- Kernel - mailweb.openeuler.org

[PATCH OLK-5.10 00/44] xfs: recent patches to fix xfs issues
by Long Li 26 Sep '23

Baokun Li (1):
  xfs: propagate the return value of xfs_log_force() to avoid soft lockup

Colin Ian King (2):
  xfs: remove redundant initializations of pointers drop_leaf and save_leaf
  xfs: remove redundant pointer lip

Darrick J. Wong (9):
  xfs: use setattr_copy to set vfs inode attributes
  xfs: remove kmem_zone typedef
  xfs: rename _zone variables to _cache
  xfs: compact deferred intent item structures
  xfs: create slab caches for frequently-used deferred items
  xfs: rename xfs_bmap_add_free to xfs_free_extent_later
  xfs: reduce the size of struct xfs_extent_free_item
  xfs: remove unused parameter from refcount code
  xfs: pass xfs_extent_free_item directly through the log intent code

Dave Chinner (19):
  xfs: don't assert fail on perag references on teardown
  xfs: set prealloc flag in xfs_alloc_file_space()
  xfs: validity check agbnos on the AGFL
  xfs: validate block number being freed before adding to xefi
  xfs: don't reverse order of items in bulk AIL insertion
  xfs: use deferred frees for btree block freeing
  xfs: pass alloc flags through to xfs_extent_busy_flush()
  xfs: allow extent free intents to be retried
  xfs: don't block in busy flushing when freeing extents
  xfs: journal geometry is not properly bounds checked
  xfs: AGF length has never been bounds checked
  xfs: fix bounds check in xfs_defer_agfl_block()
  xfs: block reservation too large for minleft allocation
  xfs: punching delalloc extents on write failure is racy
  xfs: use byte ranges for write cleanup ranges
  xfs,iomap: move delalloc punching to iomap
  iomap: buffered write failure should not truncate the page cache
  xfs: xfs_bmap_punch_delalloc_range() should take a byte range
  xfs: fix off-by-one-block in xfs_discard_folio()

Gaosheng Cui (1):
  xfs: remove xfs_setattr_time() declaration

Guo Xuenan (1):
  xfs: set minleft correctly for randomly sparse inode allocations

Jiapeng Chong (1):
  xfs: Remove redundant assignment to busy

Long Li (6):
  xfs: fix dir3 block read verify fail during log recover
  Revert "[Huawei] xfs: propagate the return value of xfs_log_force() to avoid soft lockup"
  xfs: xfs_trans_cancel() path must check for log shutdown
  xfs: don't verify agf length when log recovery
  xfs: shutdown to ensure submits buffers on LSN boundaries
  xfs: update the last_sync_lsn with ctx start lsn

yangerkun (4):
  xfs: keep growfs sb log item active until ail flush success
  xfs: fix xfs shutdown since we reserve more blocks in agfl fixup
  xfs: longest free extent no need consider postalloc
  xfs: shutdown xfs once inode double free

 fs/xfs/kmem.h                      |   4 -
 fs/xfs/libxfs/xfs_alloc.c          | 390 +++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_alloc.h          |  51 +++-
 fs/xfs/libxfs/xfs_alloc_btree.c    |   2 +-
 fs/xfs/libxfs/xfs_attr_leaf.c      |   2 -
 fs/xfs/libxfs/xfs_bmap.c           |  90 +++----
 fs/xfs/libxfs/xfs_bmap.h           |  37 +--
 fs/xfs/libxfs/xfs_bmap_btree.c     |  27 +-
 fs/xfs/libxfs/xfs_btree.c          |   4 +-
 fs/xfs/libxfs/xfs_btree.h          |   2 +-
 fs/xfs/libxfs/xfs_da_btree.c       |   6 +-
 fs/xfs/libxfs/xfs_da_btree.h       |   3 +-
 fs/xfs/libxfs/xfs_defer.c          |  70 +++++-
 fs/xfs/libxfs/xfs_defer.h          |   3 +
 fs/xfs/libxfs/xfs_ialloc.c         |  32 ++-
 fs/xfs/libxfs/xfs_ialloc_btree.c   |   8 +-
 fs/xfs/libxfs/xfs_inode_fork.c     |   4 +-
 fs/xfs/libxfs/xfs_inode_fork.h     |   2 +-
 fs/xfs/libxfs/xfs_refcount.c       |  56 +++--
 fs/xfs/libxfs/xfs_refcount.h       |   7 +-
 fs/xfs/libxfs/xfs_refcount_btree.c |  11 +-
 fs/xfs/libxfs/xfs_rmap.c           |  21 +-
 fs/xfs/libxfs/xfs_rmap.h           |   7 +-
 fs/xfs/libxfs/xfs_rmap_btree.c     |   2 +-
 fs/xfs/libxfs/xfs_sb.c             |  56 ++++-
 fs/xfs/libxfs/xfs_types.c          |  23 ++
 fs/xfs/libxfs/xfs_types.h          |   2 +
 fs/xfs/xfs_aops.c                  |  32 +--
 fs/xfs/xfs_bmap_item.c             |  16 +-
 fs/xfs/xfs_bmap_item.h             |   6 +-
 fs/xfs/xfs_bmap_util.c             |  19 +-
 fs/xfs/xfs_bmap_util.h             |   2 +-
 fs/xfs/xfs_buf.c                   |  16 +-
 fs/xfs/xfs_buf_item.c              |  10 +-
 fs/xfs/xfs_buf_item.h              |  11 +-
 fs/xfs/xfs_buf_item_recover.c      |   9 +-
 fs/xfs/xfs_dquot.c                 |  26 +-
 fs/xfs/xfs_extent_busy.c           |  36 ++-
 fs/xfs/xfs_extent_busy.h           |   6 +-
 fs/xfs/xfs_extfree_item.c          | 137 +++++++---
 fs/xfs/xfs_extfree_item.h          |   6 +-
 fs/xfs/xfs_file.c                  |   8 -
 fs/xfs/xfs_icache.c                |   8 +-
 fs/xfs/xfs_icreate_item.c          |   6 +-
 fs/xfs/xfs_icreate_item.h          |   2 +-
 fs/xfs/xfs_inode.c                 |   2 +-
 fs/xfs/xfs_inode.h                 |   2 +-
 fs/xfs/xfs_inode_item.c            |   6 +-
 fs/xfs/xfs_inode_item.h            |   2 +-
 fs/xfs/xfs_iomap.c                 | 292 ++++++++++++++++++---
 fs/xfs/xfs_iops.c                  |  56 +----
 fs/xfs/xfs_iops.h                  |   1 -
 fs/xfs/xfs_log.c                   |  72 +++---
 fs/xfs/xfs_log_priv.h              |   2 +-
 fs/xfs/xfs_log_recover.c           |   6 +-
 fs/xfs/xfs_mount.c                 |  12 +-
 fs/xfs/xfs_mru_cache.c             |   2 +-
 fs/xfs/xfs_pnfs.c                  |   3 +-
 fs/xfs/xfs_qm.h                    |   2 +-
 fs/xfs/xfs_refcount_item.c         |  16 +-
 fs/xfs/xfs_refcount_item.h         |   6 +-
 fs/xfs/xfs_reflink.c               |   7 +-
 fs/xfs/xfs_rmap_item.c             |  16 +-
 fs/xfs/xfs_rmap_item.h             |   6 +-
 fs/xfs/xfs_super.c                 | 233 ++++++++---------
 fs/xfs/xfs_trans.c                 |  24 +-
 fs/xfs/xfs_trans.h                 |   2 +-
 fs/xfs/xfs_trans_ail.c             |   5 +-
 fs/xfs/xfs_trans_dquot.c           |   4 +-
 69 files changed, 1357 insertions(+), 700 deletions(-)

-- 
2.31.1

[PATCH openEuler-22.03-LTS-SP2] scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
by Yong Hu 26 Sep '23

From: Shuchang Li <lishuchang(a)hust.edu.cn>

stable inclusion
from stable-v5.10.180
commit bab8dc38b1a0a12bc064fc064269033bdcf5b88e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZCDZ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

--------------------------------

[ Upstream commit 91a0c0c1413239d0548b5aac4c82f38f6d53a91e ]

When if_type equals zero and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns false, drbl_regs_memmap_p is not remapped. This passes a NULL
pointer to iounmap(), which can trigger a WARN() on certain arches.

When if_type equals six and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns true, drbl_regs_memmap_p may have been remapped and
ctrl_regs_memmap_p is not remapped. This is a resource leak and passes a
NULL pointer to iounmap().

To fix these issues, we need to add null checks before iounmap(), and
change some goto labels.

Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Shuchang Li <lishuchang(a)hust.edu.cn>
Link: https://lore.kernel.org/r/20230404072133.1022-1-lishuchang@hust.edu.cn
Reviewed-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yong Hu <yong.hu(a)windriver.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 17200b453cbb..1bb3c96a04bd 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -10477,7 +10477,7 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 				goto out_iounmap_all;
 		} else {
 			error = -ENOMEM;
-			goto out_iounmap_all;
+			goto out_iounmap_ctrl;
 		}
 	}
 
@@ -10495,7 +10495,7 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 				dev_err(&pdev->dev,
 				   "ioremap failed for SLI4 HBA dpp registers.\n");
 				error = -ENOMEM;
-				goto out_iounmap_ctrl;
+				goto out_iounmap_all;
 			}
 			phba->pci_bar4_memmap_p = phba->sli4_hba.dpp_regs_memmap_p;
 		}
@@ -10520,9 +10520,11 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 	return 0;
 
 out_iounmap_all:
-	iounmap(phba->sli4_hba.drbl_regs_memmap_p);
+	if (phba->sli4_hba.drbl_regs_memmap_p)
+		iounmap(phba->sli4_hba.drbl_regs_memmap_p);
 out_iounmap_ctrl:
-	iounmap(phba->sli4_hba.ctrl_regs_memmap_p);
+	if (phba->sli4_hba.ctrl_regs_memmap_p)
+		iounmap(phba->sli4_hba.ctrl_regs_memmap_p);
 out_iounmap_conf:
 	iounmap(phba->sli4_hba.conf_regs_memmap_p);
 
-- 
2.34.1
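
The unwind pattern this patch repairs can be sketched in user space. This is only an illustration under stated assumptions, not the lpfc code: fake_map(), fake_unmap(), struct regs and regs_setup() are hypothetical names, malloc() stands in for ioremap(), and the want_ctrl/want_drbl flags stand in for the if_type and BAR checks that decide whether a region is mapped at all. The point it shows is that a region which is only conditionally mapped needs a NULL check at its unwind label.

```c
#include <assert.h>
#include <stdlib.h>

/* Count of live mappings, so leaks and bad releases become observable. */
int live_mappings;

/* Hypothetical stand-in for ioremap(): returns NULL on failure. */
void *fake_map(void)
{
	void *p = malloc(1);

	if (p)
		live_mappings++;
	return p;
}

/* Hypothetical stand-in for iounmap(): the caller must pass a live mapping. */
void fake_unmap(void *p)
{
	assert(p != NULL);
	free(p);
	live_mappings--;
}

struct regs {
	void *conf;	/* always mapped */
	void *ctrl;	/* mapped only when want_ctrl is set */
	void *drbl;	/* mapped only when want_drbl is set */
};

/*
 * Map the regions in order. On failure, jump to the label that releases
 * what has been mapped so far. Returns 0 on success, -1 on failure.
 */
int regs_setup(struct regs *r, int want_ctrl, int want_drbl, int fail_late)
{
	r->conf = r->ctrl = r->drbl = NULL;

	r->conf = fake_map();
	if (!r->conf)
		return -1;

	if (want_ctrl) {
		r->ctrl = fake_map();
		if (!r->ctrl)
			goto out_unmap_conf;
	}

	if (want_drbl) {
		r->drbl = fake_map();
		if (!r->drbl)
			goto out_unmap_ctrl;
	}

	if (fail_late)		/* simulate a failure after all mapping steps */
		goto out_unmap_all;

	return 0;

out_unmap_all:
	if (r->drbl)		/* NULL when want_drbl was 0 */
		fake_unmap(r->drbl);
out_unmap_ctrl:
	if (r->ctrl)		/* NULL when want_ctrl was 0 */
		fake_unmap(r->ctrl);
out_unmap_conf:
	fake_unmap(r->conf);
	r->conf = r->ctrl = r->drbl = NULL;
	return -1;
}
```

Calling regs_setup(&r, 0, 0, 1) takes the out_unmap_all path with neither optional region mapped. Without the two NULL checks, fake_unmap() would be handed a NULL pointer, which is the situation the commit message describes for iounmap().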

[PATCH openEuler-22.03-LTS] scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
by Yong Hu 26 Sep '23

From: Shuchang Li <lishuchang(a)hust.edu.cn>

stable inclusion
from stable-v5.10.180
commit bab8dc38b1a0a12bc064fc064269033bdcf5b88e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZCDZ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

--------------------------------

[ Upstream commit 91a0c0c1413239d0548b5aac4c82f38f6d53a91e ]

When if_type equals zero and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns false, drbl_regs_memmap_p is not remapped. This passes a NULL
pointer to iounmap(), which can trigger a WARN() on certain arches.

When if_type equals six and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns true, drbl_regs_memmap_p may have been remapped and
ctrl_regs_memmap_p is not remapped. This is a resource leak and passes a
NULL pointer to iounmap().

To fix these issues, we need to add null checks before iounmap(), and
change some goto labels.

Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Shuchang Li <lishuchang(a)hust.edu.cn>
Link: https://lore.kernel.org/r/20230404072133.1022-1-lishuchang@hust.edu.cn
Reviewed-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yong Hu <yong.hu(a)windriver.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 134e4ee5dc48..2f7a17e96e25 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -10474,7 +10474,7 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 				goto out_iounmap_all;
 		} else {
 			error = -ENOMEM;
-			goto out_iounmap_all;
+			goto out_iounmap_ctrl;
 		}
 	}
 
@@ -10492,7 +10492,7 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 				dev_err(&pdev->dev,
 				   "ioremap failed for SLI4 HBA dpp registers.\n");
 				error = -ENOMEM;
-				goto out_iounmap_ctrl;
+				goto out_iounmap_all;
 			}
 			phba->pci_bar4_memmap_p = phba->sli4_hba.dpp_regs_memmap_p;
 		}
@@ -10517,9 +10517,11 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 	return 0;
 
 out_iounmap_all:
-	iounmap(phba->sli4_hba.drbl_regs_memmap_p);
+	if (phba->sli4_hba.drbl_regs_memmap_p)
+		iounmap(phba->sli4_hba.drbl_regs_memmap_p);
 out_iounmap_ctrl:
-	iounmap(phba->sli4_hba.ctrl_regs_memmap_p);
+	if (phba->sli4_hba.ctrl_regs_memmap_p)
+		iounmap(phba->sli4_hba.ctrl_regs_memmap_p);
 out_iounmap_conf:
 	iounmap(phba->sli4_hba.conf_regs_memmap_p);
 
-- 
2.34.1

[PATCH openEuler-1.0-LTS] netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
by Lu Wei 26 Sep '23

From: Kyle Zeng <zengyhkyle(a)gmail.com>

mainline inclusion
from mainline-v6.6-rc1
commit 050d91c03b28ca479df13dfb02bcd2c60dd6a878
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I83QCZ
CVE: CVE-2023-42753

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

---------------------------

The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.

Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle(a)gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Lu Wei <luwei32(a)huawei.com>
---
 net/netfilter/ipset/ip_set_hash_netportnet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
index 613e18e720a4..9290a4d7b862 100644
--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
@@ -39,6 +39,7 @@ MODULE_ALIAS("ip_set_hash:net,port,net");
 #define IP_SET_HASH_WITH_PROTO
 #define IP_SET_HASH_WITH_NETS
 #define IPSET_NET_COUNT 2
+#define IP_SET_HASH_WITH_NET0
 
 /* IPv4 variant */
 
-- 
2.34.1
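
The offset problem described in this commit message can be illustrated with a small stand-alone sketch. It rests on an assumption about the shared ipset hash header that the patch itself does not show: that the stored value c is the prefix length plus one, that CIDR_POS(c) is c - 1 when IP_SET_HASH_WITH_NET0 is defined and c - 2 otherwise, and that the per-prefix array has HOST_MASK + 1 or HOST_MASK slots respectively. All function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define HOST_MASK 32	/* longest IPv4 prefix */

/* Assumed index mappings; c is the stored value, prefix length plus one. */
int cidr_pos_with_net0(int c)    { return c - 1; }	/* macro defined */
int cidr_pos_without_net0(int c) { return c - 2; }	/* macro missing */

/* Assumed number of per-prefix slots in each build. */
int nlen_with_net0(void)    { return HOST_MASK + 1; }
int nlen_without_net0(void) { return HOST_MASK; }

/* True when pos is a valid index into an array of nlen slots. */
bool slot_in_bounds(int pos, int nlen)
{
	return pos >= 0 && pos < nlen;
}

/* True when the slot computed for this prefix length lies inside the array. */
bool prefix_slot_ok(int cidr, bool with_net0)
{
	int c = cidr + 1;

	assert(cidr >= 0 && cidr <= HOST_MASK);
	if (with_net0)
		return slot_in_bounds(cidr_pos_with_net0(c), nlen_with_net0());
	return slot_in_bounds(cidr_pos_without_net0(c), nlen_without_net0());
}
```

Under these assumptions a /0 prefix maps to slot -1 when the macro is missing, so prefix_slot_ok(0, false) is false while prefix_slot_ok(0, true) is true. That matches the patch's one-line fix: the set type accepts CIDR 0, so it has to be built with the mapping that reserves a slot for it.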

[PATCH openEuler-1.0-LTS] netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
by Lu Wei 26 Sep '23

From: Kyle Zeng <zengyhkyle(a)gmail.com>

mainline inclusion
from mainline-v6.6-rc1
commit 050d91c03b28ca479df13dfb02bcd2c60dd6a878
category: bugfix
bugzilla: 189250
CVE: CVE-2023-42753

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

---------------------------

The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.

Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle(a)gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Lu Wei <luwei32(a)huawei.com>
---
 net/netfilter/ipset/ip_set_hash_netportnet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
index 613e18e720a4..9290a4d7b862 100644
--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
@@ -39,6 +39,7 @@ MODULE_ALIAS("ip_set_hash:net,port,net");
 #define IP_SET_HASH_WITH_PROTO
 #define IP_SET_HASH_WITH_NETS
 #define IPSET_NET_COUNT 2
+#define IP_SET_HASH_WITH_NET0
 
 /* IPv4 variant */
 
-- 
2.34.1

[PATCH openEuler-1.0-LTS,v2] netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
by Lu Wei 26 Sep '23

From: Kyle Zeng <zengyhkyle(a)gmail.com>

mainline inclusion
from mainline-v6.6-rc1
commit 050d91c03b28ca479df13dfb02bcd2c60dd6a878
category: bugfix
bugzilla: 189250
CVE: CVE-2023-42753

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

---------------------------

The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.

Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle(a)gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Lu Wei <luwei32(a)huawei.com>
---
 net/netfilter/ipset/ip_set_hash_netportnet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
index 613e18e720a4..9290a4d7b862 100644
--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
@@ -39,6 +39,7 @@ MODULE_ALIAS("ip_set_hash:net,port,net");
 #define IP_SET_HASH_WITH_PROTO
 #define IP_SET_HASH_WITH_NETS
 #define IPSET_NET_COUNT 2
+#define IP_SET_HASH_WITH_NET0
 
 /* IPv4 variant */
 
-- 
2.34.1

[PATCH openEuler-1.0-LTS] [just for review!!!!]Add feature: eNFS - nfs multipath to improve performance and reliability
by mingqian218472 25 Sep '23

From: 闫海涛 <yanhaitao2(a)huawei.com>

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7SVH7

---------------------------------

Currently, the NFS client can use only one server IP address at a single mount point. As a result, the hardware capability of multiple storage nodes and NICs cannot be fully utilized. In multiple financial sites, the performance cannot meet service requirements. In addition, when a single link is faulty, services are suspended. The reliability problem needs to be solved.
OpenEuler-based commercial OS vendors hope that the eNFS feature will be integrated into 20.03 SP4 to resolve performance and reliability problems.

When mounting an NFS share, the user can pass the two optional parameters localaddrs and remoteaddrs to use eNFS multipath. If these optional parameters are not used, NFS will behave as before. For example,
mount -t nfs -o [localaddrs=127.17.0.1-127.17.0.4],[remoteaddrs=127.17.1.1-127.17.1.4] xx.xx.xx.xx:/test /mnt/test

Changes in eNFS are as follows:
1. patch 0001:
At the NFS layer, the eNFS registration function is called back when the mount command parses parameters. The eNFS parses and saves the IP address list entered by users.
2. patch 0002:
At the sunrpc layer, the eNFS registration function is called back when the NFS uses sunrpc to create rpc_clnt. The eNFS combines the IP address list entered for mount to generate multiple xprts. When the I/O times out, the callback function of the eNFS is called back so that the eNFS switches to an available link for retry.
3. patch 0003:
The eNFS module registers the interface for parsing the mount command. During the mount process, the NFS invokes the eNFS interface to enable the eNFS to parse the mounting parameters of UltraPath. The eNFS module saves the mounting parameters to the context of nfs_client.
4. patch 0004:
When the NFS invokes the SunRPC to create rpc_clnt, the eNFS interface is called back. The eNFS creates multiple xprts based on the output IP address list. When NFS V3 I/Os are delivered, eNFS distributes I/Os to available links based on the link status, improving performance through load balancing.
5. patch 0005:
When sending I/Os from the SunRPC module to the NFS server times out, the SunRPC module calls back the eNFS module to reselect a link. The eNFS module distributes I/Os to other available links, preventing service interruption caused by a single link failure.
6. patch 0006:
The eNFS compilation option and makefile are added. By default, the eNFS compilation is not performed.

Signed-off-by: mingqian218472 <zhangmingqian.zhang(a)huawei.com>
---
 ...-nfs-multipath-to-improve-performanc.patch | 6148 +++++++++++++++++
 ...enfs_registe_and_handle_mount_option.patch |  757 ++
 ...nd_create_multipath_then_dispatch_IO.patch |  805 +++
 ...add_enfs_module_for_nfs_mount_option.patch | 1209 ++++
 ...dd_enfs_module_for_sunrpc_multipatch.patch | 1581 +++++
 ...le_for_sunrpc_failover_and_configure.patch | 1607 +++++
 0006-add_enfs_compile_option.patch            |   70 +
 7 files changed, 12177 insertions(+)
 create mode 100644 0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch
 create mode 100644 0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch
 create mode 100644 0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch
 create mode 100644 0003-add_enfs_module_for_nfs_mount_option.patch
 create mode 100644 0004-add_enfs_module_for_sunrpc_multipatch.patch
 create mode 100644 0005-add_enfs_module_for_sunrpc_failover_and_configure.patch
 create mode 100644 0006-add_enfs_compile_option.patch

diff --git a/0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch b/0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch
new file mode 100644
index 0000000..2974c5f
--- /dev/null
+++ b/0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch
@@ -0,0 +1,6148 @@
+From 53f616b0a649494e33d30b250d06c4049ccb88be Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?=E9=97=AB=E6=B5=B7=E6=B6=9B?= <yanhaitao2(a)huawei.com>
+Date: Mon, 25 Sep 2023 19:19:15 +0800
+Subject: [PATCH openEuler-20.03-LTS-SP3] Add feature: eNFS - nfs multipath to
+ improve performance and reliability
+
+driver inclusion
+category: feature
+bugzilla: https://gitee.com/openeuler/release-management/issues/I7U0W0
+
+---------------------------------
+
+Currently, the NFS client can use only one server IP address at a single mount point. As a result, the hardware capability of multiple storage nodes and NICs cannot be fully utilized. In multiple financial sites, the performance cannot meet service requirements. In addition, when a single link is faulty, services are suspended. The reliability problem needs to be solved.
+OpenEuler-based commercial OS vendors hope that the eNFS feature will be integrated into 20.03 SP4 to resolve performance and reliability problems.
+
+When user mount one NFS share, can input localaddrs/remoteaddrs these two optional Parameters to use eNFS multipath. If these optional parameters are not used, NFS will behave as before. For example,
+mount -t nfs -o [localaddrs=127.17.0.1-127.17.0.4],[remoteaddrs=127.17.1.1-127.17.1.4] xx.xx.xx.xx:/test /mnt/test
+
+Changes in eNFS are as follows:
+1. patch 0001:
+At the NFS layer, the eNFS registration function is called back when the mount command parses parameters. The eNFS parses and saves the IP address list entered by users.
+2. patch 0002:
+At the sunrpc layer, the eNFS registration function is called back When the NFS uses sunrpc to create rpc_clnt, the eNFS combines the IP address list entered for mount to generate multiple xprts. When the I/O times out, the callback function of the eNFS is called back so that the eNFS switches to an available link for retry.
+3. patch 0003:
+The eNFS module registers the interface for parsing the mount command. During the mount process, the NFS invokes the eNFS interface to enable the eNFS to parse the mounting parameters of UltraPath. The eNFS module saves the mounting parameters to the context of nfs_client.
+4. patch 0004:
+When the NFS invokes the SunRPC to create rpc_clnt, the eNFS interface is called back. The eNFS creates multiple xprts based on the output IP address list. When NFS V3 I/Os are delivered, eNFS distributes I/Os to available links based on the link status, improving performance through load balancing.
+5. patch 0005:
+When sending I/Os from the SunRPC module to the NFS server times out, the SunRPC module calls back the eNFS module to reselect a link. The eNFS module distributes I/Os to other available links, preventing service interruption caused by a single link failure.
+6. patch 0006:
+The eNFS compilation option and makefile are added. By default, the eNFS compilation is not performed.
+
+Signed-off-by: mingqian218472 <zhangmingqian.zhang(a)huawei.com>
+---
+ ...enfs_registe_and_handle_mount_option.patch |  757 ++++++++
+ ...nd_create_multipath_then_dispatch_IO.patch |  805 +++++++++
+ ...add_enfs_module_for_nfs_mount_option.patch | 1209 +++++++++++++
+ ...dd_enfs_module_for_sunrpc_multipatch.patch | 1581 ++++++++++++++++
+ ...le_for_sunrpc_failover_and_configure.patch | 1607 +++++++++++++++++
+ 0006-add_enfs_compile_option.patch            |   70 +
+ kernel.spec                                   |   13 +
+ 7 files changed, 6042 insertions(+)
+ create mode 100644 0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch
+ create mode 100644 0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch
+ create mode 100644 0003-add_enfs_module_for_nfs_mount_option.patch
+ create mode 100644 0004-add_enfs_module_for_sunrpc_multipatch.patch
+ create mode 100644 0005-add_enfs_module_for_sunrpc_failover_and_configure.patch
+ create mode 100644 0006-add_enfs_compile_option.patch
+
+diff --git a/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch
+new file mode 100644
+index 0000000..38e57a9
+--- /dev/null
++++ b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch
+@@ -0,0 +1,757 @@
++diff --git a/fs/nfs/client.c b/fs/nfs/client.c
++index 7d02dc52209d..50820a8a684a 100644
++--- a/fs/nfs/client.c
+++++ b/fs/nfs/client.c
++@@ -48,7 +48,7 @@
++ #include "callback.h"
++ #include "delegation.h"
++ #include "iostat.h"
++-#include "internal.h"
+++#include "enfs_adapter.h"
++ #include "fscache.h"
++ #include "pnfs.h"
++ #include "nfs.h"
++@@ -255,6 +255,7 @@ void nfs_free_client(struct nfs_client *clp)
++ 	put_nfs_version(clp->cl_nfs_mod);
++ 	kfree(clp->cl_hostname);
++ 	kfree(clp->cl_acceptor);
+++	nfs_free_multi_path_client(clp);
++ 	kfree(clp);
++ }
++ EXPORT_SYMBOL_GPL(nfs_free_client);
++@@ -330,6 +331,9 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
++ 					   sap))
++ 			continue;
++ 
+++		if (!nfs_multipath_client_match(clp, data))
+++			continue;
+++
++ 		refcount_inc(&clp->cl_count);
++ 		return clp;
++ 	}
++@@ -512,6 +516,9 @@ int nfs_create_rpc_client(struct nfs_client *clp,
++ 		.program = &nfs_program,
++ 		.version = clp->rpc_ops->version,
++ 		.authflavor = flavor,
+++#if IS_ENABLED(CONFIG_ENFS)
+++		.multipath_option = cl_init->enfs_option,
+++#endif
++ 	};
++ 
++ 	if (test_bit(NFS_CS_DISCRTRY, &clp->cl_flags))
++@@ -634,6 +641,13 @@ struct nfs_client *nfs_init_client(struct nfs_client *clp,
++ 	/* the client is already initialised */
++ 	if (clp->cl_cons_state == NFS_CS_READY)
++ 		return clp;
+++	error = nfs_create_multi_path_client(clp, cl_init);
+++	if (error < 0) {
+++		dprintk("%s: create failed.%d!\n", __func__, error);
+++		nfs_put_client(clp);
+++		clp = ERR_PTR(error);
+++		return clp;
+++	}
++ 
++ 	/*
++ 	 * Create a client RPC handle for doing FSSTAT with UNIX auth only
++@@ -666,6 +680,9 @@ static int nfs_init_server(struct nfs_server *server,
++ 		.net = data->net,
++ 		.timeparms = &timeparms,
++ 		.init_flags = (1UL << NFS_CS_REUSEPORT),
+++#if IS_ENABLED(CONFIG_ENFS)
+++		.enfs_option = data->enfs_option,
+++#endif
++ 	};
++ 	struct nfs_client *clp;
++ 	int error;
++diff --git a/fs/nfs/enfs_adapter.c b/fs/nfs/enfs_adapter.c
++new file mode 100644
++index 000000000000..7f471f2072c4
++--- /dev/null
+++++ b/fs/nfs/enfs_adapter.c
++@@ -0,0 +1,230 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Client-side ENFS adapter.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */
+++#include <linux/types.h>
+++#include <linux/sunrpc/clnt.h>
+++#include <linux/nfs.h>
+++#include <linux/nfs4.h>
+++#include <linux/nfs3.h>
+++#include <linux/nfs_fs.h>
+++#include <linux/nfs_fs_sb.h>
+++#include <linux/sunrpc/sched.h>
+++#include <linux/nfs_iostat.h>
+++#include "enfs_adapter.h"
+++#include "iostat.h"
+++
+++struct enfs_adapter_ops __rcu *enfs_adapter;
+++
+++int enfs_adapter_register(struct enfs_adapter_ops *ops)
+++{
+++	struct enfs_adapter_ops *old;
+++
+++	old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, NULL, ops);
+++	if (old == NULL || old == ops)
+++		return 0;
+++	pr_err("regist %s ops %p failed. old %p\n", __func__, ops, old);
+++	return -EPERM;
+++}
+++EXPORT_SYMBOL_GPL(enfs_adapter_register);
+++
+++int enfs_adapter_unregister(struct enfs_adapter_ops *ops)
+++{
+++	struct enfs_adapter_ops *old;
+++
+++	old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, ops, NULL);
+++	if (old == ops || old == NULL)
+++		return 0;
+++	pr_err("unregist %s ops %p failed. old %p\n", __func__, ops, old);
+++	return -EPERM;
+++}
+++EXPORT_SYMBOL_GPL(enfs_adapter_unregister);
+++
+++struct enfs_adapter_ops *nfs_multipath_router_get(void)
+++{
+++	struct enfs_adapter_ops *ops;
+++
+++	rcu_read_lock();
+++	ops = rcu_dereference(enfs_adapter);
+++	if (ops == NULL) {
+++		rcu_read_unlock();
+++		return NULL;
+++	}
+++	if (!try_module_get(ops->owner))
+++		ops = NULL;
+++	rcu_read_unlock();
+++	return ops;
+++}
+++
+++void nfs_multipath_router_put(struct enfs_adapter_ops *ops)
+++{
+++	if (ops)
+++		module_put(ops->owner);
+++}
+++
+++bool is_valid_option(enum nfsmultipathoptions option)
+++{
+++	if (option < REMOTEADDR || option >= INVALID_OPTION) {
+++		pr_warn("%s: ENFS invalid option %d\n", __func__, option);
+++		return false;
+++	}
+++
+++	return true;
+++}
+++
+++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str,
+++			     struct nfs_parsed_mount_data *mnt)
+++{
+++
+++	//parseMultiPathOptions(getNfsMultiPathOpt(token), string, mnt);
+++
+++	int rc;
+++	struct enfs_adapter_ops *ops;
+++
+++	ops = nfs_multipath_router_get();
+++	if ((ops == NULL) || (ops->parse_mount_options == NULL) ||
+++	    !is_valid_option(option)) {
+++		nfs_multipath_router_put(ops);
+++		dfprintk(MOUNT,
+++			"NFS: parsing nfs mount option enfs not load[%s]\n"
+++			, __func__);
+++		return -EOPNOTSUPP;
+++	}
+++	// nfs_multipath_parse_options
+++	dfprintk(MOUNT, "NFS: parsing nfs mount option '%s' type: %d[%s]\n"
+++		, str, option, __func__);
+++	rc = ops->parse_mount_options(option, str, &mnt->enfs_option, mnt->net);
+++	nfs_multipath_router_put(ops);
+++	return rc;
+++}
+++
+++void enfs_free_mount_options(struct nfs_parsed_mount_data *data)
+++{
+++	struct enfs_adapter_ops *ops;
+++
+++	if (data->enfs_option == NULL)
+++		return;
+++
+++	ops = nfs_multipath_router_get();
+++	if ((ops == NULL) || (ops->free_mount_options == NULL)) {
+++		nfs_multipath_router_put(ops);
+++		return;
+++	}
+++	ops->free_mount_options((void *)&data->enfs_option);
+++	nfs_multipath_router_put(ops);
+++}
+++
+++int nfs_create_multi_path_client(struct nfs_client *client,
+++				 const struct nfs_client_initdata *cl_init)
+++{
+++	int ret = 0;
+++	struct enfs_adapter_ops *ops;
+++
+++	if (cl_init->enfs_option == NULL)
+++		return 0;
+++
+++	ops = nfs_multipath_router_get();
+++	if (ops != NULL && ops->client_info_init != NULL)
+++		ret = ops->client_info_init(
+++			(void *)&client->cl_multipath_data, cl_init);
+++	nfs_multipath_router_put(ops);
+++
+++	return ret;
+++}
+++EXPORT_SYMBOL_GPL(nfs_create_multi_path_client);
+++
+++void nfs_free_multi_path_client(struct nfs_client *clp)
+++{
+++	struct enfs_adapter_ops *ops;
+++
+++	if (clp->cl_multipath_data == NULL)
+++		return;
+++
+++	ops = nfs_multipath_router_get();
+++	if (ops != NULL && ops->client_info_free != NULL)
+++		ops->client_info_free(clp->cl_multipath_data);
+++	nfs_multipath_router_put(ops);
+++}
+++
+++int nfs_multipath_client_match(struct nfs_client *clp,
+++			       const struct nfs_client_initdata *sap)
+++{
+++	int ret = true;
+++	struct enfs_adapter_ops *ops;
+++
+++	pr_info("%s src %p dst %p\n.", __func__,
+++		clp->cl_multipath_data, sap->enfs_option);
+++
+++	if (clp->cl_multipath_data == NULL && sap->enfs_option == NULL)
+++		return true;
+++
+++	if ((clp->cl_multipath_data == NULL && sap->enfs_option) ||
+++	    (clp->cl_multipath_data && sap->enfs_option == NULL)) {
+++		pr_err("not match client src %p dst %p\n.",
+++		       clp->cl_multipath_data, sap->enfs_option);
+++		return false;
+++	}
+++
+++	ops = nfs_multipath_router_get();
+++	if (ops != NULL && ops->client_info_match != NULL)
+++		ret = ops->client_info_match(clp->cl_multipath_data,
+++					     sap->enfs_option);
+++	nfs_multipath_router_put(ops);
+++
+++	return ret;
+++}
+++
+++int nfs4_multipath_client_match(struct nfs_client *src, struct nfs_client *dst)
+++{
+++	int ret = true;
+++	struct enfs_adapter_ops *ops;
+++
+++	if (src->cl_multipath_data == NULL && dst->cl_multipath_data == NULL)
+++		return true;
+++
+++	if (src->cl_multipath_data == NULL || dst->cl_multipath_data == NULL)
+++		return false;
+++
+++	ops = nfs_multipath_router_get();
+++	if (ops != NULL && ops->nfs4_client_info_match != NULL)
+++		ret = ops->nfs4_client_info_match(src->cl_multipath_data,
+++						  src->cl_multipath_data);
+++	nfs_multipath_router_put(ops);
+++
+++	return ret;
+++}
+++EXPORT_SYMBOL_GPL(nfs4_multipath_client_match);
+++
+++void nfs_multipath_show_client_info(struct seq_file *mount_option,
+++				    struct nfs_server *server)
+++{
+++	struct enfs_adapter_ops *ops;
+++
+++	if (mount_option == NULL || server == NULL ||
+++	    server->client == NULL ||
+++	    server->nfs_client->cl_multipath_data == NULL)
+++		return;
+++
+++	ops = nfs_multipath_router_get();
+++	if (ops != NULL && ops->client_info_show != NULL)
+++		ops->client_info_show(mount_option, server);
+++	nfs_multipath_router_put(ops);
+++}
+++
+++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option)
+++{
+++	int ret = 0;
+++	struct enfs_adapter_ops *ops;
+++
+++	if (nfs_client == NULL || nfs_client->cl_rpcclient == NULL)
+++		return 0;
+++
+++	ops = nfs_multipath_router_get();
+++	if (ops != NULL && ops->remount_ip_list != NULL)
+++		ret = ops->remount_ip_list(nfs_client, enfs_option);
+++	nfs_multipath_router_put(ops);
+++	return ret;
+++}
+++EXPORT_SYMBOL_GPL(nfs_remount_iplist);
++diff --git a/fs/nfs/enfs_adapter.h b/fs/nfs/enfs_adapter.h
++new file mode 100644
++index 000000000000..752544e18056
++--- /dev/null
+++++ b/fs/nfs/enfs_adapter.h
++@@ -0,0 +1,101 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Client-side ENFS adapt header.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */
+++#ifndef _NFS_MULTIPATH_H_
+++#define _NFS_MULTIPATH_H_
+++
+++#include "internal.h"
+++
+++#if IS_ENABLED(CONFIG_ENFS)
+++enum nfsmultipathoptions {
+++	REMOTEADDR,
+++	LOCALADDR,
+++	REMOTEDNSNAME,
+++	REMOUNTREMOTEADDR,
+++	REMOUNTLOCALADDR,
+++	INVALID_OPTION
+++};
+++
+++
+++struct enfs_adapter_ops {
+++	const char *name;
+++	struct module *owner;
+++	int (*parse_mount_options)(enum nfsmultipathoptions option,
+++			char *str, void **enfs_option, struct net *net_ns);
+++
+++	void (*free_mount_options)(void **data);
+++
+++	int (*client_info_init)(void **data,
+++			const struct nfs_client_initdata *cl_init);
+++	void (*client_info_free)(void *data);
+++	int (*client_info_match)(void *src, void *dst);
+++	int (*nfs4_client_info_match)(void *src, void *dst);
+++	void (*client_info_show)(struct seq_file *mount_option, void *data);
+++	int (*remount_ip_list)(struct nfs_client *nfs_client,
+++			void *enfs_option);
+++};
+++
+++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str,
+++			     struct nfs_parsed_mount_data *mnt);
+++void enfs_free_mount_options(struct nfs_parsed_mount_data *data);
+++int nfs_create_multi_path_client(struct nfs_client *client,
+++				 const struct nfs_client_initdata *cl_init);
+++void nfs_free_multi_path_client(struct nfs_client *clp);
+++int nfs_multipath_client_match(struct nfs_client *clp,
+++			       const struct nfs_client_initdata *sap);
+++int nfs4_multipath_client_match(struct nfs_client *src, struct nfs_client *dst);
+++void nfs_multipath_show_client_info(struct seq_file *mount_option,
+++				    struct nfs_server *server);
+++int enfs_adapter_register(struct enfs_adapter_ops *ops);
+++int enfs_adapter_unregister(struct enfs_adapter_ops *ops);
+++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option);
+++int nfs4_create_multi_path(struct nfs_server *server,
+++			   struct nfs_parsed_mount_data *data,
+++			   const struct rpc_timeout *timeparms);
+++
+++#else
+++static inline
+++void nfs_free_multi_path_client(struct nfs_client *clp)
+++{
+++
+++}
+++
+++static inline
+++int nfs_multipath_client_match(struct nfs_client *clp,
+++			       const struct nfs_client_initdata *sap)
+++{
+++	return 1;
+++}
+++
+++static inline
+++int nfs_create_multi_path_client(struct nfs_client *client,
+++				 const struct nfs_client_initdata *cl_init)
+++{
+++	return 0;
+++}
+++
+++static inline
+++void nfs_multipath_show_client_info(struct seq_file *mount_option,
+++				    struct nfs_server *server)
+++{
+++
+++}
+++
+++static inline
+++int nfs4_multipath_client_match(struct nfs_client *src,
+++				struct nfs_client *dst)
+++{
+++	return 1;
+++}
+++
+++static inline
+++void enfs_free_mount_options(struct nfs_parsed_mount_data *data)
+++{
+++
+++}
+++
+++#endif // CONFIG_ENFS
+++#endif // _NFS_MULTIPATH_H_
++diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
++index 0ce5a90640c4..c696693edc7b 100644
++--- a/fs/nfs/internal.h
+++++ b/fs/nfs/internal.h
++@@ -93,6 +93,9 @@ struct nfs_client_initdata {
++ 	u32 minorversion;
++ 	struct net *net;
++ 	const struct rpc_timeout *timeparms;
+++#if IS_ENABLED(CONFIG_ENFS)
+++	void *enfs_option; /* struct multipath_mount_options * */
+++#endif
++ };
++ 
++ /*
++@@ -135,6 +138,9 @@ struct nfs_parsed_mount_data {
++ 
++ 	struct security_mnt_opts lsm_opts;
++ 	struct net *net;
+++#if IS_ENABLED(CONFIG_ENFS)
+++	void *enfs_option; /* struct multipath_mount_options * */
+++#endif
++ };
++ 
++ /* mount_clnt.c */
++diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
++index 1350ea673672..4aa6e1f961f7 100644
++--- a/fs/nfs/nfs4client.c
+++++ b/fs/nfs/nfs4client.c
++@@ -10,7 +10,7 @@
++ #include <linux/sunrpc/xprt.h>
++ #include <linux/sunrpc/bc_xprt.h>
++ #include <linux/sunrpc/rpc_pipe_fs.h>
++-#include "internal.h"
+++#include "enfs_adapter.h"
++ #include "callback.h"
++ #include "delegation.h"
++ #include "nfs4session.h"
++@@ -225,6 +225,16 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
++ 	__set_bit(NFS_CS_DISCRTRY, &clp->cl_flags);
++ 
__set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags); ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ err = nfs_create_multi_path_client(clp, cl_init); +++ if (err < 0) { +++ dprintk("%s: create failed.%d\n", __func__, err); +++ nfs_put_client(clp); +++ clp = ERR_PTR(err); +++ return clp; +++ } +++#endif +++ ++ /* ++ * Set up the connection to the server before we add add to the ++ * global list. ++@@ -529,6 +539,9 @@ static int nfs4_match_client(struct nfs_client *pos, struct nfs_client *new, ++ if (!nfs4_match_client_owner_id(pos, new)) ++ return 1; ++ +++ if (!nfs4_multipath_client_match(pos, new)) +++ return 1; +++ ++ return 0; ++ } ++ ++@@ -860,7 +873,7 @@ static int nfs4_set_client(struct nfs_server *server, ++ const size_t addrlen, ++ const char *ip_addr, ++ int proto, const struct rpc_timeout *timeparms, ++- u32 minorversion, struct net *net) +++ u32 minorversion, struct net *net, void *enfs_option) ++ { ++ struct nfs_client_initdata cl_init = { ++ .hostname = hostname, ++@@ -872,6 +885,9 @@ static int nfs4_set_client(struct nfs_server *server, ++ .minorversion = minorversion, ++ .net = net, ++ .timeparms = timeparms, +++#if IS_ENABLED(CONFIG_ENFS) +++ .enfs_option = enfs_option, +++#endif ++ }; ++ struct nfs_client *clp; ++ ++@@ -1042,6 +1058,30 @@ static int nfs4_server_common_setup(struct nfs_server *server, ++ return error; ++ } ++ +++int nfs4_create_multi_path(struct nfs_server *server, +++ struct nfs_parsed_mount_data *data, +++ const struct rpc_timeout *timeparms) +++{ +++ struct nfs_client_initdata cl_init = { +++ .hostname = data->nfs_server.hostname, +++ .addr = (const struct sockaddr *)&data->nfs_server.address, +++ .addrlen = data->nfs_server.addrlen, +++ .ip_addr = data->client_address, +++ .nfs_mod = &nfs_v4, +++ .proto = data->nfs_server.protocol, +++ .minorversion = data->minorversion, +++ .net = data->net, +++ .timeparms = timeparms, +++#if IS_ENABLED(CONFIG_ENFS) +++ .enfs_option = data->enfs_option, +++#endif // CONFIG_ENFS +++ }; +++ +++ return 
nfs_create_multi_path_client(server->nfs_client, &cl_init); +++ +++} +++EXPORT_SYMBOL_GPL(nfs4_create_multi_path); +++ ++ /* ++ * Create a version 4 volume record ++ */ ++@@ -1050,6 +1090,7 @@ static int nfs4_init_server(struct nfs_server *server, ++ { ++ struct rpc_timeout timeparms; ++ int error; +++ void *enfs_option = NULL; ++ ++ nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, ++ data->timeo, data->retrans); ++@@ -1067,6 +1108,10 @@ static int nfs4_init_server(struct nfs_server *server, ++ else ++ data->selected_flavor = RPC_AUTH_UNIX; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ enfs_option = data->enfs_option; +++#endif +++ ++ /* Get a client record */ ++ error = nfs4_set_client(server, ++ data->nfs_server.hostname, ++@@ -1076,7 +1121,7 @@ static int nfs4_init_server(struct nfs_server *server, ++ data->nfs_server.protocol, ++ &timeparms, ++ data->minorversion, ++- data->net); +++ data->net, enfs_option); ++ if (error < 0) ++ return error; ++ ++@@ -1161,7 +1206,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, ++ XPRT_TRANSPORT_RDMA, ++ parent_server->client->cl_timeout, ++ parent_client->cl_mvops->minor_version, ++- parent_client->cl_net); +++ parent_client->cl_net, NULL); ++ if (!error) ++ goto init_server; ++ #endif /* IS_ENABLED(CONFIG_SUNRPC_XPRT_RDMA) */ ++@@ -1174,7 +1219,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, ++ XPRT_TRANSPORT_TCP, ++ parent_server->client->cl_timeout, ++ parent_client->cl_mvops->minor_version, ++- parent_client->cl_net); +++ parent_client->cl_net, NULL); ++ if (error < 0) ++ goto error; ++ ++@@ -1269,7 +1314,7 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname, ++ set_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status); ++ error = nfs4_set_client(server, hostname, sap, salen, buf, ++ clp->cl_proto, clnt->cl_timeout, ++- clp->cl_minorversion, net); +++ clp->cl_minorversion, net, NULL); ++ clear_bit(NFS_MIG_TSM_POSSIBLE, 
&server->mig_status); ++ if (error != 0) { ++ nfs_server_insert_lists(server); ++diff --git a/fs/nfs/super.c b/fs/nfs/super.c ++index a05e1eb2c3fd..83cd294aca15 100644 ++--- a/fs/nfs/super.c +++++ b/fs/nfs/super.c ++@@ -61,7 +61,7 @@ ++ #include "callback.h" ++ #include "delegation.h" ++ #include "iostat.h" ++-#include "internal.h" +++#include "enfs_adapter.h" ++ #include "fscache.h" ++ #include "nfs4session.h" ++ #include "pnfs.h" ++@@ -113,6 +113,12 @@ enum { ++ ++ /* Special mount options */ ++ Opt_userspace, Opt_deprecated, Opt_sloppy, +++#if IS_ENABLED(CONFIG_ENFS) +++ Opt_remote_iplist, +++ Opt_local_iplist, +++ Opt_remote_dnslist, +++ Opt_enfs_info, +++#endif ++ ++ Opt_err ++ }; ++@@ -183,6 +189,13 @@ static const match_table_t nfs_mount_option_tokens = { ++ { Opt_fscache_uniq, "fsc=%s" }, ++ { Opt_local_lock, "local_lock=%s" }, ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ { Opt_remote_iplist, "remoteaddrs=%s" }, +++ { Opt_local_iplist, "localaddrs=%s" }, +++ { Opt_remote_dnslist, "remotednsname=%s" }, +++ { Opt_enfs_info, "enfs_info=%s" }, +++#endif +++ ++ /* The following needs to be listed after all other options */ ++ { Opt_nfsvers, "v%s" }, ++ ++@@ -365,6 +378,21 @@ static struct shrinker acl_shrinker = { ++ .seeks = DEFAULT_SEEKS, ++ }; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++enum nfsmultipathoptions getNfsMultiPathOpt(int token) +++{ +++ switch (token) { +++ case Opt_remote_iplist: +++ return REMOUNTREMOTEADDR; +++ case Opt_local_iplist: +++ return REMOUNTLOCALADDR; +++ case Opt_remote_dnslist: +++ return REMOTEDNSNAME; +++ } +++ return INVALID_OPTION; +++} +++#endif +++ ++ /* ++ * Register the NFS filesystems ++ */ ++@@ -758,6 +786,9 @@ int nfs_show_options(struct seq_file *m, struct dentry *root) ++ seq_printf(m, ",addr=%s", ++ rpc_peeraddr2str(nfss->nfs_client->cl_rpcclient, ++ RPC_DISPLAY_ADDR)); +++ +++ nfs_multipath_show_client_info(m, nfss); +++ ++ rcu_read_unlock(); ++ ++ return 0; ++@@ -853,6 +884,8 @@ int nfs_show_stats(struct seq_file *m, struct 
dentry *root) ++ seq_puts(m, root->d_sb->s_flags & SB_NODIRATIME ? ",nodiratime" : ""); ++ nfs_show_mount_options(m, nfss, 1); ++ +++ nfs_multipath_show_client_info(m, nfss); +++ ++ seq_printf(m, "\n\tage:\t%lu", (jiffies - nfss->mount_time) / HZ); ++ ++ show_implementation_id(m, nfss); ++@@ -977,6 +1010,7 @@ static void nfs_free_parsed_mount_data(struct nfs_parsed_mount_data *data) ++ kfree(data->nfs_server.export_path); ++ kfree(data->nfs_server.hostname); ++ kfree(data->fscache_uniq); +++ enfs_free_mount_options(data); ++ security_free_mnt_opts(&data->lsm_opts); ++ kfree(data); ++ } ++@@ -1641,7 +1675,34 @@ static int nfs_parse_mount_options(char *raw, ++ return 0; ++ }; ++ break; ++- +++#if IS_ENABLED(CONFIG_ENFS) +++ case Opt_remote_iplist: +++ case Opt_local_iplist: +++ case Opt_remote_dnslist: +++ string = match_strdup(args); +++ if (string == NULL) +++ goto out_nomem; +++ rc = enfs_parse_mount_options(getNfsMultiPathOpt(token), +++ string, mnt); +++ kfree(string); +++ switch (rc) { +++ case 0: +++ break; +++ case -ENOMEM: +++ goto out_nomem; +++ case -ENOSPC: +++ goto out_limit; +++ case -EINVAL: +++ goto out_invalid_address; +++ case -ENOTSUPP: +++ goto out_invalid_address; +++ case -EOPNOTSUPP: +++ goto out_invalid_address; +++ } +++ break; +++ case Opt_enfs_info: +++ break; +++#endif ++ /* ++ * Special options ++ */ ++@@ -1720,6 +1781,11 @@ static int nfs_parse_mount_options(char *raw, ++ free_secdata(secdata); ++ printk(KERN_INFO "NFS: security options invalid: %d\n", rc); ++ return 0; +++#if IS_ENABLED(CONFIG_ENFS) +++out_limit: +++ dprintk("NFS: param is more than supported limit: %d\n", rc); +++ return 0; +++#endif ++ } ++ ++ /* ++@@ -2335,6 +2401,14 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) ++ if (!nfs_parse_mount_options((char *)options, data)) ++ goto out; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ if (data->enfs_option) { +++ error = nfs_remount_iplist(nfss->nfs_client, data->enfs_option); +++ if (error) +++ goto out; +++ } 
+++#endif +++ ++ /* ++ * noac is a special case. It implies -o sync, but that's not ++ * necessarily reflected in the mtab options. do_remount_sb ++@@ -2347,6 +2421,11 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) ++ /* compare new mount options with old ones */ ++ error = nfs_compare_remount_data(nfss, data); ++ out: +++#if IS_ENABLED(CONFIG_ENFS) +++ /* release remount option member */ +++ if (data->enfs_option) +++ enfs_free_mount_options(data); +++#endif ++ nfs_free_parsed_mount_data(data); ++ return error; ++ } ++diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h ++index 7023ae64e3d7..2c19678afe8d 100644 ++--- a/include/linux/nfs_fs_sb.h +++++ b/include/linux/nfs_fs_sb.h ++@@ -123,6 +123,11 @@ struct nfs_client { ++ ++ struct net *cl_net; ++ struct list_head pending_cb_stateids; +++ +++#if IS_ENABLED(CONFIG_ENFS) +++ /* multi path private structure (struct multipath_client_info *) */ +++ void *cl_multipath_data; +++#endif ++ }; ++ ++ /* +diff --git a/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch +new file mode 100644 +index 0000000..540a2ce +--- /dev/null ++++ b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch +@@ -0,0 +1,805 @@ ++diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h ++index 8aa865bce4f6..89178f78de8c 100644 ++--- a/include/linux/sunrpc/clnt.h +++++ b/include/linux/sunrpc/clnt.h ++@@ -70,6 +70,10 @@ struct rpc_clnt { ++ struct dentry *cl_debugfs; /* debugfs directory */ ++ #endif ++ struct rpc_xprt_iter cl_xpi; +++ +++#if IS_ENABLED(CONFIG_ENFS) +++ bool cl_enfs; +++#endif ++ }; ++ ++ /* ++@@ -124,6 +128,9 @@ struct rpc_create_args { ++ unsigned long flags; ++ char *client_name; ++ struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */ +++#if IS_ENABLED(CONFIG_ENFS) +++ void *multipath_option; +++#endif ++ }; ++ ++ struct 
rpc_add_xprt_test { ++@@ -221,6 +228,12 @@ bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt, ++ const struct sockaddr *sap); ++ void rpc_cleanup_clids(void); ++ +++#if IS_ENABLED(CONFIG_ENFS) +++int +++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, +++ const struct rpc_call_ops *ops, void *data, int flags); +++#endif /* CONFIG_ENFS */ +++ ++ static inline int rpc_reply_expected(struct rpc_task *task) ++ { ++ return (task->tk_msg.rpc_proc != NULL) && ++diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h ++index ad2e243f3f03..124f5a0faf3e 100644 ++--- a/include/linux/sunrpc/sched.h +++++ b/include/linux/sunrpc/sched.h ++@@ -90,6 +90,9 @@ struct rpc_task { ++ tk_garb_retry : 2, ++ tk_cred_retry : 2, ++ tk_rebind_retry : 2; +++#if IS_ENABLED(CONFIG_ENFS) +++ unsigned long tk_major_timeo; /* major timeout ticks */ +++#endif ++ }; ++ ++ typedef void (*rpc_action)(struct rpc_task *); ++@@ -118,6 +121,9 @@ struct rpc_task_setup { ++ */ ++ #define RPC_TASK_ASYNC 0x0001 /* is an async task */ ++ #define RPC_TASK_SWAPPER 0x0002 /* is swapping in/out */ +++#if IS_ENABLED(CONFIG_ENFS) +++#define RPC_TASK_FIXED 0x0004 /* detect xprt status task */ +++#endif ++ #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ ++ #define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ ++ #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ ++@@ -257,6 +263,9 @@ void rpc_destroy_mempool(void); ++ extern struct workqueue_struct *rpciod_workqueue; ++ extern struct workqueue_struct *xprtiod_workqueue; ++ void rpc_prepare_task(struct rpc_task *task); +++#if IS_ENABLED(CONFIG_ENFS) +++void rpc_init_task_retry_counters(struct rpc_task *task); +++#endif ++ ++ static inline int rpc_wait_for_completion_task(struct rpc_task *task) ++ { ++diff --git a/include/linux/sunrpc/sunrpc_enfs_adapter.h b/include/linux/sunrpc/sunrpc_enfs_adapter.h ++new file mode 100644 ++index 000000000000..28abedcf5cf6 ++--- /dev/null +++++ 
b/include/linux/sunrpc/sunrpc_enfs_adapter.h ++@@ -0,0 +1,128 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* Client-side SUNRPC ENFS adapter header. +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#ifndef _SUNRPC_ENFS_ADAPTER_H_ +++#define _SUNRPC_ENFS_ADAPTER_H_ +++#include <linux/sunrpc/clnt.h> +++ +++#if IS_ENABLED(CONFIG_ENFS) +++ +++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) +++{ +++ xps->xps_nactive++; +++} +++ +++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) +++{ +++ xps->xps_nactive--; +++} +++ +++struct rpc_xprt *rpc_task_get_xprt +++(struct rpc_clnt *clnt, struct rpc_xprt *xprt); +++ +++struct rpc_multipath_ops { +++ struct module *owner; +++ void (*create_clnt)(struct rpc_create_args *args, +++ struct rpc_clnt *clnt); +++ void (*releas_clnt)(struct rpc_clnt *clnt); +++ void (*create_xprt)(struct rpc_xprt *xprt); +++ void (*destroy_xprt)(struct rpc_xprt *xprt); +++ void (*xprt_iostat)(struct rpc_task *task); +++ void (*failover_handle)(struct rpc_task *task); +++ bool (*task_need_call_start_again)(struct rpc_task *task); +++ void (*adjust_task_timeout)(struct rpc_task *task, void *condition); +++ void (*init_task_req)(struct rpc_task *task, struct rpc_rqst *req); +++ bool (*prepare_transmit)(struct rpc_task *task); +++}; +++ +++extern struct rpc_multipath_ops __rcu *multipath_ops; +++void rpc_init_task_retry_counters(struct rpc_task *task); +++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops); +++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops); +++struct rpc_multipath_ops *rpc_multipath_ops_get(void); +++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops); +++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt); +++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, +++ struct rpc_clnt *clnt); +++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt); +++bool
rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt); +++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt); +++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task); +++void rpc_multipath_ops_failover_handle(struct rpc_task *task); +++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task); +++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, +++ void *condition); +++void rpc_multipath_ops_init_task_req(struct rpc_task *task, +++ struct rpc_rqst *req); +++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task); +++ +++#else +++static inline struct rpc_xprt *rpc_task_get_xprt(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt) +++{ +++ return NULL; +++} +++ +++static inline void rpc_task_release_xprt(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt) +++{ +++} +++ +++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) +++{ +++} +++ +++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) +++{ +++} +++ +++static inline void rpc_multipath_ops_create_clnt +++(struct rpc_create_args *args, struct rpc_clnt *clnt) +++{ +++} +++ +++static inline void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) +++{ +++} +++ +++static inline bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) +++{ +++ return false; +++} +++ +++static inline void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt) +++{ +++} +++ +++static inline void rpc_multipath_ops_xprt_iostat(struct rpc_task *task) +++{ +++} +++ +++static inline void rpc_multipath_ops_failover_handle(struct rpc_task *task) +++{ +++} +++ +++static inline +++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task) +++{ +++ return false; +++} +++ +++static inline void +++rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, void *condition) +++{ +++} +++ +++static inline void +++rpc_multipath_ops_init_task_req(struct rpc_task *task, struct rpc_rqst *req) +++{ +++} +++ +++static inline bool 
rpc_multipath_ops_prepare_transmit(struct rpc_task *task) +++{ +++ return false; +++} +++ +++#endif +++#endif // _SUNRPC_ENFS_ADAPTER_H_ ++diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h ++index ccfacca1eba9..2e47b3577947 100644 ++--- a/include/linux/sunrpc/xprt.h +++++ b/include/linux/sunrpc/xprt.h ++@@ -279,6 +279,10 @@ struct rpc_xprt { ++ atomic_t inject_disconnect; ++ #endif ++ struct rcu_head rcu; +++#if IS_ENABLED(CONFIG_ENFS) +++ atomic_long_t queuelen; +++ void *multipath_context; +++#endif ++ }; ++ ++ #if defined(CONFIG_SUNRPC_BACKCHANNEL) ++diff --git a/include/linux/sunrpc/xprtmultipath.h b/include/linux/sunrpc/xprtmultipath.h ++index af1257c030d2..d54e4dbbbf34 100644 ++--- a/include/linux/sunrpc/xprtmultipath.h +++++ b/include/linux/sunrpc/xprtmultipath.h ++@@ -22,6 +22,10 @@ struct rpc_xprt_switch { ++ const struct rpc_xprt_iter_ops *xps_iter_ops; ++ ++ struct rcu_head xps_rcu; +++#if IS_ENABLED(CONFIG_ENFS) +++ unsigned int xps_nactive; +++ atomic_long_t xps_queuelen; +++#endif ++ }; ++ ++ struct rpc_xprt_iter { ++@@ -69,4 +73,8 @@ extern struct rpc_xprt *xprt_iter_get_next(struct rpc_xprt_iter *xpi); ++ ++ extern bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps, ++ const struct sockaddr *sap); +++#if IS_ENABLED(CONFIG_ENFS) +++extern void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, +++ struct rpc_xprt *xprt); +++#endif ++ #endif ++diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c ++index 0fc540b0d183..d7ffee637148 100644 ++--- a/net/sunrpc/clnt.c +++++ b/net/sunrpc/clnt.c ++@@ -37,6 +37,7 @@ ++ #include <linux/sunrpc/rpc_pipe_fs.h> ++ #include <linux/sunrpc/metrics.h> ++ #include <linux/sunrpc/bc_xprt.h> +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ #include <trace/events/sunrpc.h> ++ ++ #include "sunrpc.h" ++@@ -490,6 +491,8 @@ static struct rpc_clnt *rpc_create_xprt(struct rpc_create_args *args, ++ } ++ } ++ +++ rpc_multipath_ops_create_clnt(args, clnt); +++ ++ clnt->cl_softrtry = 1; ++ if 
(args->flags & RPC_CLNT_CREATE_HARDRTRY) ++ clnt->cl_softrtry = 0; ++@@ -869,6 +872,8 @@ void rpc_shutdown_client(struct rpc_clnt *clnt) ++ list_empty(&clnt->cl_tasks), 1*HZ); ++ } ++ +++ rpc_multipath_ops_releas_clnt(clnt); +++ ++ rpc_release_client(clnt); ++ } ++ EXPORT_SYMBOL_GPL(rpc_shutdown_client); ++@@ -981,7 +986,13 @@ void rpc_task_release_transport(struct rpc_task *task) ++ ++ if (xprt) { ++ task->tk_xprt = NULL; ++- xprt_put(xprt); +++#if IS_ENABLED(CONFIG_ENFS) +++ if (task->tk_client) { +++ rpc_task_release_xprt(task->tk_client, xprt); +++ return; +++ } +++#endif +++ xprt_put(xprt); ++ } ++ } ++ EXPORT_SYMBOL_GPL(rpc_task_release_transport); ++@@ -990,6 +1001,10 @@ void rpc_task_release_client(struct rpc_task *task) ++ { ++ struct rpc_clnt *clnt = task->tk_client; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ rpc_task_release_transport(task); +++#endif +++ ++ if (clnt != NULL) { ++ /* Remove from client task list */ ++ spin_lock(&clnt->cl_lock); ++@@ -999,14 +1014,29 @@ void rpc_task_release_client(struct rpc_task *task) ++ ++ rpc_release_client(clnt); ++ } +++#if IS_ENABLED(CONFIG_ENFS) +++#else ++ rpc_task_release_transport(task); +++#endif ++ } ++ +++#if IS_ENABLED(CONFIG_ENFS) +++static struct rpc_xprt * +++rpc_task_get_next_xprt(struct rpc_clnt *clnt) +++{ +++ return rpc_task_get_xprt(clnt, xprt_iter_get_next(&clnt->cl_xpi)); +++} +++#endif +++ ++ static ++ void rpc_task_set_transport(struct rpc_task *task, struct rpc_clnt *clnt) ++ { ++ if (!task->tk_xprt) +++#if IS_ENABLED(CONFIG_ENFS) +++ task->tk_xprt = rpc_task_get_next_xprt(clnt); +++#else ++ task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi); +++#endif ++ } ++ ++ static ++@@ -1597,6 +1627,14 @@ call_reserveresult(struct rpc_task *task) ++ return; ++ case -EIO: /* probably a shutdown */ ++ break; +++#if IS_ENABLED(CONFIG_ENFS) +++ case -ETIMEDOUT: /* woken up; restart */ +++ if (rpc_multipath_ops_task_need_call_start_again(task)) { +++ rpc_task_release_transport(task); +++ task->tk_action = 
call_start; +++ return; +++ } +++#endif ++ default: ++ printk(KERN_ERR "%s: unrecognized error %d, exiting\n", ++ __func__, status); ++@@ -1962,6 +2000,10 @@ call_transmit(struct rpc_task *task) ++ return; ++ if (!xprt_prepare_transmit(task)) ++ return; +++ +++ if (rpc_multipath_ops_prepare_transmit(task)) +++ return; +++ ++ task->tk_action = call_transmit_status; ++ /* Encode here so that rpcsec_gss can use correct sequence number. */ ++ if (rpc_task_need_encode(task)) { ++@@ -2277,6 +2319,9 @@ call_timeout(struct rpc_task *task) ++ ++ retry: ++ task->tk_action = call_bind; +++#if IS_ENABLED(CONFIG_ENFS) +++ rpc_multipath_ops_failover_handle(task); +++#endif ++ task->tk_status = 0; ++ } ++ ++@@ -2961,3 +3006,30 @@ rpc_clnt_swap_deactivate(struct rpc_clnt *clnt) ++ } ++ EXPORT_SYMBOL_GPL(rpc_clnt_swap_deactivate); ++ #endif /* CONFIG_SUNRPC_SWAP */ +++ +++#if IS_ENABLED(CONFIG_ENFS) +++/* rpc_clnt_test_xprt - Test and add a new transport to a rpc_clnt +++ * @clnt: pointer to struct rpc_clnt +++ * @xprt: pointer struct rpc_xprt +++ * @ops: async operation +++ */ +++int +++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, +++ const struct rpc_call_ops *ops, void *data, int flags) +++{ +++ struct rpc_cred *cred; +++ struct rpc_task *task; +++ +++ cred = authnull_ops.lookup_cred(NULL, NULL, 0); +++ task = rpc_call_null_helper(clnt, xprt, cred, +++ RPC_TASK_SOFT | RPC_TASK_SOFTCONN | flags, +++ ops, data); +++ put_rpccred(cred); +++ if (IS_ERR(task)) +++ return PTR_ERR(task); +++ +++ rpc_put_task(task); +++ return 1; +++} +++EXPORT_SYMBOL_GPL(rpc_clnt_test_xprt); +++#endif ++diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c ++index a873c92a4898..2254fea0e863 100644 ++--- a/net/sunrpc/sched.c +++++ b/net/sunrpc/sched.c ++@@ -20,7 +20,7 @@ ++ #include <linux/mutex.h> ++ #include <linux/freezer.h> ++ ++-#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ ++ #include "sunrpc.h" ++ ++@@ -962,7 +962,12 @@ static void 
rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta ++ /* Initialize workqueue for async tasks */ ++ task->tk_workqueue = task_setup_data->workqueue; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ task->tk_xprt = rpc_task_get_xprt(task_setup_data->rpc_client, +++ xprt_get(task_setup_data->rpc_xprt)); +++#else ++ task->tk_xprt = xprt_get(task_setup_data->rpc_xprt); +++#endif ++ ++ if (task->tk_ops->rpc_call_prepare != NULL) ++ task->tk_action = rpc_prepare_task; ++diff --git a/net/sunrpc/sunrpc_enfs_adapter.c b/net/sunrpc/sunrpc_enfs_adapter.c ++new file mode 100644 ++index 000000000000..c1543545c6de ++--- /dev/null +++++ b/net/sunrpc/sunrpc_enfs_adapter.c ++@@ -0,0 +1,214 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* Client-side SUNRPC ENFS adapter header. +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> +++ +++struct rpc_multipath_ops __rcu *multipath_ops; +++ +++void rpc_init_task_retry_counters(struct rpc_task *task) +++{ +++ /* Initialize retry counters */ +++ task->tk_garb_retry = 2; +++ task->tk_cred_retry = 2; +++ task->tk_rebind_retry = 2; +++} +++EXPORT_SYMBOL_GPL(rpc_init_task_retry_counters); +++ +++struct rpc_xprt * +++rpc_task_get_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) +++{ +++ struct rpc_xprt_switch *xps; +++ +++ if (!xprt) +++ return NULL; +++ rcu_read_lock(); +++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); +++ atomic_long_inc(&xps->xps_queuelen); +++ rcu_read_unlock(); +++ atomic_long_inc(&xprt->queuelen); +++ +++ return xprt; +++} +++ +++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops) +++{ +++ struct rpc_multipath_ops *old; +++ +++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, NULL, ops); +++ if (!old || old == ops) +++ return 0; +++ pr_err("regist rpc_multipath ops %p fail. 
old %p\n", ops, old); +++ return -EPERM; +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_register); +++ +++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops) +++{ +++ struct rpc_multipath_ops *old; +++ +++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, ops, NULL); +++ if (!old || old == ops) +++ return 0; +++ pr_err("unregister rpc_multipath ops %p fail. old %p\n", ops, old); +++ return -EPERM; +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_unregister); +++ +++struct rpc_multipath_ops *rpc_multipath_ops_get(void) +++{ +++ struct rpc_multipath_ops *ops; +++ +++ rcu_read_lock(); +++ ops = rcu_dereference(multipath_ops); +++ if (!ops) { +++ rcu_read_unlock(); +++ return NULL; +++ } +++ if (!try_module_get(ops->owner)) +++ ops = NULL; +++ rcu_read_unlock(); +++ return ops; +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_get); +++ +++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops) +++{ +++ if (ops) +++ module_put(ops->owner); +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_put); +++ +++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) +++{ +++ struct rpc_xprt_switch *xps; +++ +++ atomic_long_dec(&xprt->queuelen); +++ rcu_read_lock(); +++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); +++ atomic_long_dec(&xps->xps_queuelen); +++ rcu_read_unlock(); +++ +++ xprt_put(xprt); +++} +++ +++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, +++ struct rpc_clnt *clnt) +++{ +++ struct rpc_multipath_ops *mops; +++ +++ if (args->multipath_option) { +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->create_clnt) +++ mops->create_clnt(args, clnt); +++ rpc_multipath_ops_put(mops); +++ } +++} +++ +++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) +++{ +++ struct rpc_multipath_ops *mops; +++ +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->releas_clnt) +++ mops->releas_clnt(clnt); +++ +++ rpc_multipath_ops_put(mops); +++} +++ +++bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) +++{ +++ struct
rpc_multipath_ops *mops = NULL;
+++
+++	mops = rpc_multipath_ops_get();
+++	if (mops && mops->create_xprt) {
+++		mops->create_xprt(xprt);
+++		if (!xprt->multipath_context) {
+++			rpc_multipath_ops_put(mops);
+++			return true;
+++		}
+++	}
+++	rpc_multipath_ops_put(mops);
+++	return false;
+++}
+++
+++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt)
+++{
+++	struct rpc_multipath_ops *mops;
+++
+++	if (xprt->multipath_context) {
+++		mops = rpc_multipath_ops_get();
+++		if (mops && mops->destroy_xprt)
+++			mops->destroy_xprt(xprt);
+++		rpc_multipath_ops_put(mops);
+++	}
+++}
+++
+++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task)
+++{
+++	struct rpc_multipath_ops *mops;
+++
+++	mops = rpc_multipath_ops_get();
+++	if (task->tk_client && mops && mops->xprt_iostat)
+++		mops->xprt_iostat(task);
+++	rpc_multipath_ops_put(mops);
+++}
+++
+++void rpc_multipath_ops_failover_handle(struct rpc_task *task)
+++{
+++	struct rpc_multipath_ops *mpath_ops = NULL;
+++
+++	mpath_ops = rpc_multipath_ops_get();
+++	if (mpath_ops && mpath_ops->failover_handle)
+++		mpath_ops->failover_handle(task);
+++	rpc_multipath_ops_put(mpath_ops);
+++}
+++
+++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task)
+++{
+++	struct rpc_multipath_ops *mpath_ops = NULL;
+++	bool ret = false;
+++
+++	mpath_ops = rpc_multipath_ops_get();
+++	if (mpath_ops && mpath_ops->task_need_call_start_again)
+++		ret = mpath_ops->task_need_call_start_again(task);
+++	rpc_multipath_ops_put(mpath_ops);
+++	return ret;
+++}
+++
+++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task,
+++				void *condition)
+++{
+++	struct rpc_multipath_ops *mops = NULL;
+++
+++	mops = rpc_multipath_ops_get();
+++	if (mops && mops->adjust_task_timeout)
+++		mops->adjust_task_timeout(task, NULL);
+++	rpc_multipath_ops_put(mops);
+++}
+++
+++void rpc_multipath_ops_init_task_req(struct rpc_task *task,
+++				struct rpc_rqst *req)
+++{
+++	struct rpc_multipath_ops *mops = NULL;
+++
+++	mops = rpc_multipath_ops_get();
+++	if (mops && mops->init_task_req)
+++		mops->init_task_req(task, req);
+++	rpc_multipath_ops_put(mops);
+++}
+++
+++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task)
+++{
+++	struct rpc_multipath_ops *mops = NULL;
+++
+++	mops = rpc_multipath_ops_get();
+++	if (mops && mops->prepare_transmit) {
+++		if (!(mops->prepare_transmit(task))) {
+++			rpc_multipath_ops_put(mops);
+++			return true;
+++		}
+++	}
+++	rpc_multipath_ops_put(mops);
+++	return false;
+++}
++diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
++index c912bf20faa2..c2b63b3d5217 100644
++--- a/net/sunrpc/xprt.c
+++++ b/net/sunrpc/xprt.c
++@@ -48,6 +48,7 @@
++ #include <linux/sunrpc/clnt.h>
++ #include <linux/sunrpc/metrics.h>
++ #include <linux/sunrpc/bc_xprt.h>
+++#include <linux/sunrpc/sunrpc_enfs_adapter.h>
++ #include <linux/rcupdate.h>
++
++ #include <trace/events/sunrpc.h>
++@@ -259,6 +260,9 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
++ 	dprintk("RPC: %5u failed to lock transport %p\n",
++ 			task->tk_pid, xprt);
++ 	task->tk_timeout = 0;
+++
+++	rpc_multipath_ops_adjust_task_timeout(task, NULL);
+++
++ 	task->tk_status = -EAGAIN;
++ 	if (req == NULL)
++ 		priority = RPC_PRIORITY_LOW;
++@@ -560,6 +564,9 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action)
++ 	struct rpc_xprt *xprt = req->rq_xprt;
++
++ 	task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
+++
+++	rpc_multipath_ops_adjust_task_timeout(task, NULL);
+++
++ 	rpc_sleep_on(&xprt->pending, task, action);
++ }
++ EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
++@@ -1347,6 +1354,9 @@ xprt_request_init(struct rpc_task *task)
++ 	req->rq_rcv_buf.buflen = 0;
++ 	req->rq_release_snd_buf = NULL;
++ 	xprt_reset_majortimeo(req);
+++
+++	rpc_multipath_ops_init_task_req(task, req);
+++
++ 	dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid,
++ 			req, ntohl(req->rq_xid));
++ }
++@@ -1427,6 +1437,9 @@ void xprt_release(struct rpc_task *task)
++ 		task->tk_ops->rpc_count_stats(task, task->tk_calldata);
++ 	else if (task->tk_client)
++ 		rpc_count_iostats(task, task->tk_client->cl_metrics);
+++
+++	rpc_multipath_ops_xprt_iostat(task);
+++
++ 	spin_lock(&xprt->recv_lock);
++ 	if (!list_empty(&req->rq_list)) {
++ 		list_del_init(&req->rq_list);
++@@ -1455,6 +1468,7 @@ void xprt_release(struct rpc_task *task)
++ 	else
++ 		xprt_free_bc_request(req);
++ }
+++EXPORT_SYMBOL_GPL(xprt_release);
++
++ static void xprt_init(struct rpc_xprt *xprt, struct net *net)
++ {
++@@ -1528,6 +1542,10 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args)
++ 		return ERR_PTR(-ENOMEM);
++ 	}
++
+++if (rpc_multipath_ops_create_xprt(xprt)) {
+++	xprt_destroy(xprt);
+++	return ERR_PTR(-ENOMEM);
+++}
++ 	rpc_xprt_debugfs_register(xprt);
++
++ 	dprintk("RPC: created transport %p with %u slots\n", xprt,
++@@ -1547,6 +1565,9 @@ static void xprt_destroy_cb(struct work_struct *work)
++ 	rpc_destroy_wait_queue(&xprt->sending);
++ 	rpc_destroy_wait_queue(&xprt->backlog);
++ 	kfree(xprt->servername);
+++
+++	rpc_multipath_ops_destroy_xprt(xprt);
+++
++ 	/*
++ 	 * Tear down transport state and free the rpc_xprt
++ 	 */
++diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c
++index 6ebaa58b4eff..6202a0be1327 100644
++--- a/net/sunrpc/xprtmultipath.c
+++++ b/net/sunrpc/xprtmultipath.c
++@@ -18,6 +18,7 @@
++ #include <linux/sunrpc/xprt.h>
++ #include <linux/sunrpc/addr.h>
++ #include <linux/sunrpc/xprtmultipath.h>
+++#include <linux/sunrpc/sunrpc_enfs_adapter.h>
++
++ typedef struct rpc_xprt *(*xprt_switch_find_xprt_t)(struct list_head *head,
++ 		const struct rpc_xprt *cur);
++@@ -26,8 +27,8 @@ static const struct rpc_xprt_iter_ops rpc_xprt_iter_singular;
++ static const struct rpc_xprt_iter_ops rpc_xprt_iter_roundrobin;
++ static const struct rpc_xprt_iter_ops rpc_xprt_iter_listall;
++
++-static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps,
++-		struct rpc_xprt *xprt)
+++void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps,
+++		struct rpc_xprt *xprt)
++ {
++ 	if (unlikely(xprt_get(xprt) == NULL))
++ 		return;
++@@ -36,7 +37,9 @@ static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps,
++ 	if (xps->xps_nxprts == 0)
++ 		xps->xps_net = xprt->xprt_net;
++ 	xps->xps_nxprts++;
+++	rpc_xps_nactive_add_one(xps);
++ }
+++EXPORT_SYMBOL(xprt_switch_add_xprt_locked);
++
++ /**
++  * rpc_xprt_switch_add_xprt - Add a new rpc_xprt to an rpc_xprt_switch
++@@ -63,6 +66,7 @@ static void xprt_switch_remove_xprt_locked(struct rpc_xprt_switch *xps,
++ 	if (unlikely(xprt == NULL))
++ 		return;
++ 	xps->xps_nxprts--;
+++	rpc_xps_nactive_sub_one(xps);
++ 	if (xps->xps_nxprts == 0)
++ 		xps->xps_net = NULL;
++ 	smp_wmb();
++@@ -84,7 +88,7 @@ void rpc_xprt_switch_remove_xprt(struct rpc_xprt_switch *xps,
++ 	spin_unlock(&xps->xps_lock);
++ 	xprt_put(xprt);
++ }
++-
+++EXPORT_SYMBOL(rpc_xprt_switch_remove_xprt);
++ /**
++  * xprt_switch_alloc - Allocate a new struct rpc_xprt_switch
++  * @xprt: pointer to struct rpc_xprt
++@@ -102,7 +106,13 @@ struct rpc_xprt_switch *xprt_switch_alloc(struct rpc_xprt *xprt,
++ 	if (xps != NULL) {
++ 		spin_lock_init(&xps->xps_lock);
++ 		kref_init(&xps->xps_kref);
+++#if IS_ENABLED(CONFIG_ENFS)
+++		xps->xps_nxprts = 0;
+++		xps->xps_nactive = 0;
+++		atomic_long_set(&xps->xps_queuelen, 0);
+++#else
++ 		xps->xps_nxprts = 0;
+++#endif
++ 		INIT_LIST_HEAD(&xps->xps_xprt_list);
++ 		xps->xps_iter_ops = &rpc_xprt_iter_singular;
++ 		xprt_switch_add_xprt_locked(xps, xprt);
++@@ -148,6 +158,7 @@ struct rpc_xprt_switch *xprt_switch_get(struct rpc_xprt_switch *xps)
++ 		return xps;
++ 	return NULL;
++ }
+++EXPORT_SYMBOL(xprt_switch_get);
++
++ /**
++  * xprt_switch_put - Release a reference to a rpc_xprt_switch
++@@ -160,6 +171,7 @@ void xprt_switch_put(struct rpc_xprt_switch *xps)
++ 	if (xps != NULL)
++ 		kref_put(&xps->xps_kref, xprt_switch_free);
++ }
+++EXPORT_SYMBOL(xprt_switch_put);
++
++ /**
++  * rpc_xprt_switch_set_roundrobin - Set a round-robin policy on rpc_xprt_switch
+diff --git a/0003-add_enfs_module_for_nfs_mount_option.patch b/0003-add_enfs_module_for_nfs_mount_option.patch
+new file mode 100644
+index 0000000..70753b5
+--- /dev/null
++++ b/0003-add_enfs_module_for_nfs_mount_option.patch
+@@ -0,0 +1,1209 @@
++diff --git a/fs/nfs/enfs/Makefile b/fs/nfs/enfs/Makefile
++new file mode 100644
++index 000000000000..6e83eb23c668
++--- /dev/null
+++++ b/fs/nfs/enfs/Makefile
++@@ -0,0 +1,18 @@
+++obj-m += enfs.o
+++
+++#EXTRA_CFLAGS += -I$(PWD)/..
+++
+++enfs-y := enfs_init.o
+++enfs-y += enfs_config.o
+++enfs-y += mgmt_init.o
+++enfs-y += enfs_multipath_client.o
+++enfs-y += enfs_multipath_parse.o
+++enfs-y += failover_path.o
+++enfs-y += failover_time.o
+++enfs-y += enfs_roundrobin.o
+++enfs-y += enfs_multipath.o
+++enfs-y += enfs_path.o
+++enfs-y += enfs_proc.o
+++enfs-y += enfs_remount.o
+++enfs-y += pm_ping.o
+++enfs-y += pm_state.o
++diff --git a/fs/nfs/enfs/enfs.h b/fs/nfs/enfs/enfs.h
++new file mode 100644
++index 000000000000..be3d95220088
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs.h
++@@ -0,0 +1,62 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Client-side ENFS multipath adapt header.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */
+++
+++#ifndef _ENFS_H_
+++#define _ENFS_H_
+++#include <linux/atomic.h>
+++#include <linux/nfs.h>
+++#include <linux/nfs4.h>
+++#include <linux/nfs3.h>
+++#include <linux/nfs_fs.h>
+++#include <linux/nfs_fs_sb.h>
+++#include "../enfs_adapter.h"
+++
+++#define IP_ADDRESS_LEN_MAX 64
+++#define MAX_IP_PAIR_PER_MOUNT 8
+++#define MAX_IP_INDEX (MAX_IP_PAIR_PER_MOUNT)
+++#define MAX_SUPPORTED_LOCAL_IP_COUNT 8
+++#define MAX_SUPPORTED_REMOTE_IP_COUNT 32
+++#define MAX_DNS_NAME_LEN 512
+++#define MAX_DNS_SUPPORTED 2
+++#define EXTEND_CMD_MAX_BUF_LEN 65356
+++
+++
+++struct nfs_ip_list {
+++	int count;
+++	struct sockaddr_storage address[MAX_SUPPORTED_REMOTE_IP_COUNT];
+++	size_t addrlen[MAX_SUPPORTED_REMOTE_IP_COUNT];
+++};
+++
+++struct NFS_ROUTE_DNS_S {
+++	char dnsname[MAX_DNS_NAME_LEN]; // valid only if dnsExist is true
+++};
+++
+++struct NFS_ROUTE_DNS_INFO_S {
+++	int dnsNameCount; // Count of DNS name in the list
+++	// valid only if dnsExist is true
+++	struct NFS_ROUTE_DNS_S routeRemoteDnsList[MAX_DNS_SUPPORTED];
+++};
+++
+++struct rpc_iostats;
+++struct enfs_xprt_context {
+++	struct sockaddr_storage srcaddr;
+++	struct rpc_iostats *stats;
+++	bool main;
+++	atomic_t path_state;
+++	atomic_t path_check_state;
+++};
+++
+++static inline bool enfs_is_main_xprt(struct rpc_xprt *xprt)
+++{
+++	struct enfs_xprt_context *ctx = xprt->multipath_context;
+++
+++	if (!ctx)
+++		return false;
+++	return ctx->main;
+++}
+++
+++#endif
++diff --git a/fs/nfs/enfs/enfs_init.c b/fs/nfs/enfs/enfs_init.c
++new file mode 100644
++index 000000000000..4b55608191a7
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_init.c
++@@ -0,0 +1,98 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Client-side ENFS adapter.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */
+++#include <linux/module.h>
+++#include <linux/sunrpc/sched.h>
+++#include <linux/sunrpc/clnt.h>
+++#include <linux/nfs.h>
+++#include <linux/nfs4.h>
+++#include <linux/nfs3.h>
+++#include <linux/nfs_fs.h>
+++#include <linux/nfs_fs_sb.h>
+++#include "enfs.h"
+++#include "enfs_multipath_parse.h"
+++#include "enfs_multipath_client.h"
+++#include "enfs_remount.h"
+++#include "init.h"
+++#include "enfs_log.h"
+++#include "enfs_multipath.h"
+++#include "mgmt_init.h"
+++
+++struct enfs_adapter_ops enfs_adapter = {
+++	.name = "enfs",
+++	.owner = THIS_MODULE,
+++	.parse_mount_options = nfs_multipath_parse_options,
+++	.free_mount_options = nfs_multipath_free_options,
+++	.client_info_init = nfs_multipath_client_info_init,
+++	.client_info_free = nfs_multipath_client_info_free,
+++	.client_info_match = nfs_multipath_client_info_match,
+++	.client_info_show = nfs_multipath_client_info_show,
+++	.remount_ip_list = enfs_remount_iplist,
+++};
+++
+++int32_t enfs_init(void)
+++{
+++	int err;
+++
+++	err = enfs_multipath_init();
+++	if (err) {
+++		enfs_log_error("init multipath failed.\n");
+++		goto out;
+++	}
+++
+++	err = mgmt_init();
+++	if (err != 0) {
+++		enfs_log_error("init mgmt failed.\n");
+++		goto out_tp_exit;
+++	}
+++
+++	return 0;
+++
+++out_tp_exit:
+++	enfs_multipath_exit();
+++out:
+++	return err;
+++}
+++
+++void enfs_fini(void)
+++{
+++	mgmt_fini();
+++
+++	enfs_multipath_exit();
+++}
+++
+++static int __init init_enfs(void)
+++{
+++	int ret;
+++
+++	ret = enfs_adapter_register(&enfs_adapter);
+++	if (ret) {
+++		pr_err("regist enfs_adapter fail. ret %d\n", ret);
+++		return -1;
+++	}
+++
+++	ret = enfs_init();
+++	if (ret) {
+++		enfs_adapter_unregister(&enfs_adapter);
+++		return -1;
+++	}
+++
+++	return 0;
+++}
+++
+++static void __exit exit_enfs(void)
+++{
+++	enfs_fini();
+++	enfs_adapter_unregister(&enfs_adapter);
+++}
+++
+++MODULE_LICENSE("GPL");
+++MODULE_AUTHOR("Huawei Tech. Co., Ltd.");
+++MODULE_DESCRIPTION("Nfs client router");
+++MODULE_VERSION("1.0");
+++
+++module_init(init_enfs);
+++module_exit(exit_enfs);
++diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c
++new file mode 100644
++index 000000000000..63c02898a42c
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_multipath_client.c
++@@ -0,0 +1,340 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Client-side ENFS adapter.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */
+++#include <linux/types.h>
+++#include <linux/nfs.h>
+++#include <linux/nfs4.h>
+++#include <linux/nfs_fs.h>
+++#include <linux/nfs_fs_sb.h>
+++#include <linux/proc_fs.h>
+++#include <linux/seq_file.h>
+++#include <linux/sunrpc/clnt.h>
+++#include <linux/sunrpc/addr.h>
+++#include "enfs_multipath_client.h"
+++#include "enfs_multipath_parse.h"
+++
+++int
+++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info,
+++	const struct nfs_client_initdata *client_init_data)
+++{
+++	struct multipath_mount_options *mount_options =
+++		(struct multipath_mount_options *)client_init_data->enfs_option;
+++
+++	if (mount_options->local_ip_list) {
+++		client_info->local_ip_list =
+++			kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL);
+++
+++		if (!client_info->local_ip_list)
+++			return -ENOMEM;
+++
+++		memcpy(client_info->local_ip_list, mount_options->local_ip_list,
+++			sizeof(struct nfs_ip_list));
+++	}
+++
+++	if (mount_options->remote_ip_list) {
+++
+++		client_info->remote_ip_list =
+++			kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL);
+++
+++		if (!client_info->remote_ip_list) {
+++			kfree(client_info->local_ip_list);
+++			client_info->local_ip_list = NULL;
+++			return -ENOMEM;
+++		}
+++		memcpy(client_info->remote_ip_list,
+++			mount_options->remote_ip_list,
+++			sizeof(struct nfs_ip_list));
+++	}
+++
+++	if (mount_options->pRemoteDnsInfo) {
+++		client_info->pRemoteDnsInfo =
+++			kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL);
+++
+++		if (!client_info->pRemoteDnsInfo) {
+++			kfree(client_info->local_ip_list);
+++			client_info->local_ip_list = NULL;
+++			kfree(client_info->remote_ip_list);
+++			client_info->remote_ip_list = NULL;
+++			return -ENOMEM;
+++		}
+++		memcpy(client_info->pRemoteDnsInfo,
+++			mount_options->pRemoteDnsInfo,
+++			sizeof(struct NFS_ROUTE_DNS_INFO_S));
+++	}
+++	return 0;
+++}
+++
+++void nfs_multipath_client_info_free_work(struct work_struct *work)
+++{
+++
+++	struct multipath_client_info *clp_info;
+++
+++	if (work == NULL)
+++		return;
+++
+++	clp_info = container_of(work, struct multipath_client_info, work);
+++
+++	if (clp_info->local_ip_list != NULL) {
+++		kfree(clp_info->local_ip_list);
+++		clp_info->local_ip_list = NULL;
+++	}
+++	if (clp_info->remote_ip_list != NULL) {
+++		kfree(clp_info->remote_ip_list);
+++		clp_info->remote_ip_list = NULL;
+++	}
+++	kfree(clp_info);
+++}
+++
+++void nfs_multipath_client_info_free(void *data)
+++{
+++	struct multipath_client_info *clp_info =
+++		(struct multipath_client_info *)data;
+++
+++	if (clp_info == NULL)
+++		return;
+++	pr_info("free client info %p.\n", clp_info);
+++	INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work);
+++	schedule_work(&clp_info->work);
+++}
+++
+++int nfs_multipath_client_info_init(void **data,
+++	const struct nfs_client_initdata *cl_init)
+++{
+++	int rc;
+++	struct multipath_client_info *info;
+++	struct multipath_client_info **enfs_info;
+++	/* no multi path info, no need do multipath init */
+++	if (cl_init->enfs_option == NULL)
+++		return 0;
+++	enfs_info = (struct multipath_client_info **)data;
+++	if (enfs_info == NULL)
+++		return -EINVAL;
+++
+++	if (*enfs_info == NULL)
+++		*enfs_info = kzalloc(sizeof(struct multipath_client_info),
+++			GFP_KERNEL);
+++
+++	if (*enfs_info == NULL)
+++		return -ENOMEM;
+++
+++	info = (struct multipath_client_info *)*enfs_info;
+++	pr_info("init client info %p.\n", info);
+++	rc = nfs_multipath_client_mount_info_init(info, cl_init);
+++	if (rc) {
+++		nfs_multipath_client_info_free((void *)info);
+++		return rc;
+++	}
+++	return rc;
+++}
+++
+++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src,
+++	const struct nfs_ip_list *ip_list_dst)
+++{
+++	int i;
+++	int j;
+++	bool is_find;
+++	/* if both are equal or NULL, then return true. */
+++	if (ip_list_src == ip_list_dst)
+++		return true;
+++
+++	if ((ip_list_src == NULL || ip_list_dst == NULL))
+++		return false;
+++
+++	if (ip_list_src->count != ip_list_dst->count)
+++		return false;
+++
+++	for (i = 0; i < ip_list_src->count; i++) {
+++		is_find = false;
+++		for (j = 0; j < ip_list_src->count; j++) {
+++			if (rpc_cmp_addr_port(
+++				(const struct sockaddr *)
+++				&ip_list_src->address[i],
+++				(const struct sockaddr *)
+++				&ip_list_dst->address[j])
+++			) {
+++				is_find = true;
+++				break;
+++			}
+++		}
+++		if (is_find == false)
+++			return false;
+++	}
+++	return true;
+++}
+++
+++int
+++nfs_multipath_dns_list_info_match(
+++	const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc,
+++	const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst)
+++{
+++	int i;
+++
+++	/* if both are equal or NULL, then return true. */
+++	if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst)
+++		return true;
+++
+++	if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL))
+++		return false;
+++
+++	if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount)
+++		return false;
+++
+++	for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) {
+++		if (!strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname,
+++			pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname))
+++			return false;
+++	}
+++	return true;
+++}
+++
+++int nfs_multipath_client_info_match(void *src, void *dst)
+++{
+++	int ret = true;
+++
+++	struct multipath_client_info *src_info;
+++	struct multipath_mount_options *dst_info;
+++
+++	src_info = (struct multipath_client_info *)src;
+++	dst_info = (struct multipath_mount_options *)dst;
+++	pr_info("try match client .\n");
+++	ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list,
+++		dst_info->local_ip_list);
+++	if (ret == false) {
+++		pr_err("local_ip not match.\n");
+++		return ret;
+++	}
+++
+++	ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list,
+++		dst_info->remote_ip_list);
+++	if (ret == false) {
+++		pr_err("remote_ip not match.\n");
+++		return ret;
+++	}
+++
+++	ret = nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo,
+++		dst_info->pRemoteDnsInfo);
+++	if (ret == false) {
+++		pr_err("dns not match.\n");
+++		return ret;
+++	}
+++	pr_info("try match client ret %d.\n", ret);
+++	return ret;
+++}
+++
+++void nfs_multipath_print_ip_info(struct seq_file *mount_option,
+++	struct nfs_ip_list *ip_list,
+++	const char *type)
+++{
+++	char buf[IP_ADDRESS_LEN_MAX + 1];
+++	int len = 0;
+++	int i = 0;
+++
+++	seq_printf(mount_option, ",%s=", type);
+++	for (i = 0; i < ip_list->count; i++) {
+++		len = rpc_ntop((struct sockaddr *)&ip_list->address[i],
+++			buf, IP_ADDRESS_LEN_MAX);
+++		if (len > 0 && len < IP_ADDRESS_LEN_MAX)
+++			buf[len] = '\0';
+++
+++		if (i == 0)
+++			seq_printf(mount_option, "%s", buf);
+++		else
+++			seq_printf(mount_option, "~%s", buf);
+++		dfprintk(MOUNT,
+++			"NFS: show nfs mount option type:%s %s [%s]\n",
+++			type, buf, __func__);
+++	}
+++}
+++
+++void nfs_multipath_print_dns_info(struct seq_file *mount_option,
+++	struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo,
+++	const char *type)
+++{
+++	int i = 0;
+++
+++	seq_printf(mount_option, ",%s=", type);
+++	for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) {
+++		if (i == 0)
+++			seq_printf(mount_option,
+++			"[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname);
+++		else if (i == pRemoteDnsInfo->dnsNameCount - 1)
+++			seq_printf(mount_option, ",%s]",
+++			pRemoteDnsInfo->routeRemoteDnsList[i].dnsname);
+++		else
+++			seq_printf(mount_option,
+++			",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname);
+++	}
+++}
+++
+++
+++static void multipath_print_sockaddr(struct seq_file *seq,
+++	struct sockaddr *addr)
+++{
+++	switch (addr->sa_family) {
+++	case AF_INET: {
+++		struct sockaddr_in *sin = (struct sockaddr_in *)addr;
+++
+++		seq_printf(seq, "%pI4", &sin->sin_addr);
+++		return;
+++	}
+++	case AF_INET6: {
+++		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
+++
+++		seq_printf(seq, "%pI6", &sin6->sin6_addr);
+++		return;
+++	}
+++	default:
+++		break;
+++	}
+++	pr_err("unsupport family:%d\n", addr->sa_family);
+++}
+++
+++static void multipath_print_enfs_info(struct seq_file *seq,
+++	struct nfs_server *server)
+++{
+++	struct sockaddr_storage peeraddr;
+++	struct rpc_clnt *next = server->client;
+++
+++	rpc_peeraddr(server->client,
+++		(struct sockaddr *)&peeraddr, sizeof(peeraddr));
+++	seq_puts(seq, ",enfs_info=");
+++	multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr);
+++
+++	while (next->cl_parent) {
+++		if (next == next->cl_parent)
+++			break;
+++		next = next->cl_parent;
+++	}
+++	seq_printf(seq, "_%u", next->cl_clid);
+++}
+++
+++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data)
+++{
+++	struct nfs_server *server = data;
+++	struct multipath_client_info *client_info =
+++		server->nfs_client->cl_multipath_data;
+++
+++	dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__);
+++	if ((client_info->remote_ip_list) &&
+++		(client_info->remote_ip_list->count > 0))
+++		nfs_multipath_print_ip_info(mount_option,
+++			client_info->remote_ip_list,
+++			"remoteaddrs");
+++
+++	if ((client_info->local_ip_list) &&
+++		(client_info->local_ip_list->count > 0))
+++		nfs_multipath_print_ip_info(mount_option,
+++			client_info->local_ip_list,
+++			"localaddrs");
+++
+++	if ((client_info->pRemoteDnsInfo) &&
+++		(client_info->pRemoteDnsInfo->dnsNameCount > 0))
+++		nfs_multipath_print_dns_info(mount_option,
+++			client_info->pRemoteDnsInfo,
+++			"remotednsname");
+++
+++	multipath_print_enfs_info(mount_option, server);
+++}
++diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h
++new file mode 100644
++index 000000000000..208f7260690d
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_multipath_client.h
++@@ -0,0 +1,26 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Client-side ENFS adapter.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */
+++#ifndef _ENFS_MULTIPATH_CLIENT_H_
+++#define _ENFS_MULTIPATH_CLIENT_H_
+++
+++#include "enfs.h"
+++
+++struct multipath_client_info {
+++	struct work_struct work;
+++	struct nfs_ip_list *remote_ip_list;
+++	struct nfs_ip_list *local_ip_list;
+++	struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo;
+++	s64 client_id;
+++};
+++
+++int nfs_multipath_client_info_init(void **data,
+++	const struct nfs_client_initdata *cl_init);
+++void nfs_multipath_client_info_free(void *data);
+++int nfs_multipath_client_info_match(void *src, void *dst);
+++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data);
+++
+++#endif
++diff --git a/fs/nfs/enfs/enfs_multipath_parse.c b/fs/nfs/enfs/enfs_multipath_parse.c
++new file mode 100644
++index 000000000000..9c4c6c1880b6
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_multipath_parse.c
++@@ -0,0 +1,601 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Client-side ENFS adapter.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */
+++#include <linux/types.h>
+++#include <linux/nfs.h>
+++#include <linux/nfs4.h>
+++#include <linux/nfs_fs.h>
+++#include <linux/nfs_fs_sb.h>
+++#include <linux/parser.h>
+++#include <linux/kern_levels.h>
+++#include <linux/sunrpc/addr.h>
+++#include "enfs_multipath_parse.h"
+++#include "enfs_log.h"
+++
+++#define NFSDBG_FACILITY NFSDBG_CLIENT
+++
+++void nfs_multipath_parse_ip_ipv6_add(struct sockaddr_in6 *sin6, int add_num)
+++{
+++	int i;
+++
+++	pr_info("NFS: before %08x%08x%08x%08x add_num: %d[%s]\n",
+++		ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]),
+++		ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]),
+++		ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]),
+++		ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]),
+++		add_num, __func__);
+++	for (i = 0; i < add_num; i++) {
+++		sin6->sin6_addr.in6_u.u6_addr32[3] =
+++			htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]) + 1);
+++
+++		if (sin6->sin6_addr.in6_u.u6_addr32[3] != 0)
+++			continue;
+++
+++		sin6->sin6_addr.in6_u.u6_addr32[2] =
+++			htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]) + 1);
+++
+++		if (sin6->sin6_addr.in6_u.u6_addr32[2] != 0)
+++			continue;
+++
+++		sin6->sin6_addr.in6_u.u6_addr32[1] =
+++			htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]) + 1);
+++
+++		if (sin6->sin6_addr.in6_u.u6_addr32[1] != 0)
+++			continue;
+++
+++		sin6->sin6_addr.in6_u.u6_addr32[0] =
+++			htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]) + 1);
+++
+++		if (sin6->sin6_addr.in6_u.u6_addr32[0] != 0)
+++			continue;
+++	}
+++
+++	return;
+++
+++}
+++
+++static int nfs_multipath_parse_ip_range(struct net *net_ns, const char *cursor,
+++	struct nfs_ip_list *ip_list, enum nfsmultipathoptions type)
+++{
+++	struct sockaddr_storage addr;
+++	struct sockaddr_storage tmp_addr;
+++	int i;
+++	size_t len;
+++	int add_num = 1;
+++	bool duplicate_flag = false;
+++	bool is_complete = false;
+++	struct sockaddr_in *sin4;
+++	struct sockaddr_in6 *sin6;
+++
+++	pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n",
+++		cursor, type, __func__);
+++	len = rpc_pton(net_ns, cursor, strlen(cursor),
+++		(struct sockaddr *)&addr, sizeof(addr));
+++	if (!len)
+++		return -EINVAL;
+++
+++	if (addr.ss_family != ip_list->address[ip_list->count - 1].ss_family) {
+++		pr_info("NFS: %s parsing nfs mount option type: %d fail.\n",
+++			__func__, type);
+++		return -EINVAL;
+++	}
+++
+++	if (rpc_cmp_addr((const struct sockaddr *)
+++		&ip_list->address[ip_list->count - 1],
+++		(const struct sockaddr *)&addr)) {
+++
+++		pr_info("range ip is same ip.\n");
+++		return 0;
+++
+++	}
+++
+++	while (true) {
+++
+++		tmp_addr = ip_list->address[ip_list->count - 1];
+++
+++		switch (addr.ss_family) {
+++		case AF_INET:
+++			sin4 = (struct sockaddr_in *)&tmp_addr;
+++
+++			sin4->sin_addr.s_addr =
+++				htonl(ntohl(sin4->sin_addr.s_addr) + add_num);
+++
+++			pr_info("NFS: mount option ip%08x type: %d ipcont %d [%s]\n",
+++				ntohl(sin4->sin_addr.s_addr),
+++				type, ip_list->count, __func__);
+++			break;
+++		case AF_INET6:
+++			sin6 = (struct sockaddr_in6 *)&tmp_addr;
+++			nfs_multipath_parse_ip_ipv6_add(sin6, add_num);
+++			pr_info("NFS: mount option ip %08x%08x%08x%08x type: %d ipcont %d [%s]\n",
+++				ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]),
+++				ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]),
+++				ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]),
+++				ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]),
+++				type, ip_list->count, __func__);
+++			break;
+++			// return -EOPNOTSUPP;
+++		default:
+++			return -EOPNOTSUPP;
+++		}
+++
+++		if (rpc_cmp_addr((const struct sockaddr *)&tmp_addr,
+++			(const struct sockaddr *)&addr)) {
+++			is_complete = true;
+++		}
+++		// delete duplicate ip, continuosly repeat, skip it
+++		for (i = 0; i < ip_list->count; i++) {
+++			duplicate_flag = false;
+++			if (rpc_cmp_addr((const struct sockaddr *)
+++				&ip_list->address[i],
+++				(const struct sockaddr *)&tmp_addr)) {
+++				add_num++;
+++				duplicate_flag = true;
+++				break;
+++			}
+++		}
+++
+++		if (duplicate_flag == false) {
+++			pr_info("this ip not duplicate;");
+++			add_num = 1;
+++			// if not repeat but omit limit return false
+++			if ((type == LOCALADDR &&
+++				ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) ||
+++				(type == REMOTEADDR &&
+++				ip_list->count >= MAX_SUPPORTED_REMOTE_IP_COUNT)) {
+++
+++				pr_info("[MULTIPATH:%s] iplist for type %d reached %d, more than supported limit %d\n",
+++					__func__, type, ip_list->count,
+++					type == LOCALADDR ?
+++					MAX_SUPPORTED_LOCAL_IP_COUNT :
+++					MAX_SUPPORTED_REMOTE_IP_COUNT);
+++				ip_list->count = 0;
+++				return -ENOSPC;
+++			}
+++			ip_list->address[ip_list->count] = tmp_addr;
+++
+++			ip_list->addrlen[ip_list->count] =
+++				ip_list->addrlen[ip_list->count - 1];
+++
+++			ip_list->count += 1;
+++		}
+++		if (is_complete == true)
+++			break;
+++	}
+++	return 0;
+++}
+++
+++int nfs_multipath_parse_ip_list_inter(struct nfs_ip_list *ip_list,
+++	struct net *net_ns,
+++	char *cursor, enum nfsmultipathoptions type)
+++{
+++	int i = 0;
+++	struct sockaddr_storage addr;
+++	struct sockaddr_storage swap;
+++	int len;
+++
+++	pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n",
+++		cursor, type, __func__);
+++
+++	len = rpc_pton(net_ns, cursor,
+++		strlen(cursor),
+++		(struct sockaddr *)&addr, sizeof(addr));
+++	if (!len)
+++		return -EINVAL;
+++
+++	// check repeated ip
+++	for (i = 0; i < ip_list->count; i++) {
+++		if (rpc_cmp_addr((const struct sockaddr *)
+++			&ip_list->address[i],
+++			(const struct sockaddr *)&addr)) {
+++
+++			pr_info("NFS: mount option '%s' type:%d index %d same as before index %d [%s]\n",
+++				cursor, type, ip_list->count, i, __func__);
+++			// prevent this ip is beginning
+++			// if repeated take it to the end of list
+++			swap = ip_list->address[i];
+++
+++			ip_list->address[i] =
+++				ip_list->address[ip_list->count-1];
+++
+++			ip_list->address[ip_list->count-1] = swap;
+++			return 0;
+++		}
+++	}
+++	// if not repeated, check exceed limit
+++	if ((type == LOCALADDR &&
+++		ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) ||
+++		(type == REMOTEADDR &&
+++		ip_list->count >= MAX_SUPPORTED_REMOTE_IP_COUNT)) {
+++
+++		pr_info("[MULTIPATH:%s] iplist for type %d reached %d, more than supported limit %d\n",
+++			__func__, type, ip_list->count,
+++			type == LOCALADDR ?
+++			MAX_SUPPORTED_LOCAL_IP_COUNT :
+++			MAX_SUPPORTED_REMOTE_IP_COUNT);
+++
+++		ip_list->count = 0;
+++		return -ENOSPC;
+++	}
+++	ip_list->address[ip_list->count] = addr;
+++	ip_list->addrlen[ip_list->count] = len;
+++	ip_list->count++;
+++
+++	return 0;
+++}
+++
+++char *nfs_multipath_parse_ip_list_get_cursor(char **buf_to_parse, bool *single)
+++{
+++	char *cursor = NULL;
+++	const char *single_sep = strchr(*buf_to_parse, '~');
+++	const char *range_sep = strchr(*buf_to_parse, '-');
+++
+++	*single = true;
+++	if (range_sep) {
+++		if (range_sep > single_sep) { // A-B or A~B-C
+++			if (single_sep == NULL) { // A-B
+++				cursor = strsep(buf_to_parse, "-");
+++				if (cursor)
+++					*single = false;
+++			} else// A~B-C
+++				cursor = strsep(buf_to_parse, "~");
+++		} else { // A-B~C
+++			cursor = strsep(buf_to_parse, "-");
+++			if (cursor)
+++				*single = false;
+++		}
+++	} else { // A~B~C
+++		cursor = strsep(buf_to_parse, "~");
+++	}
+++	return cursor;
+++}
+++
+++bool nfs_multipath_parse_param_check(enum nfsmultipathoptions type,
+++	struct multipath_mount_options *options)
+++{
+++	if (type == REMOUNTREMOTEADDR && options->remote_ip_list->count != 0) {
+++		memset(options->remote_ip_list, 0, sizeof(struct nfs_ip_list));
+++		return true;
+++	}
+++	if (type == REMOUNTLOCALADDR && options->local_ip_list->count != 0) {
+++		memset(options->local_ip_list, 0, sizeof(struct nfs_ip_list));
+++		return true;
+++	}
+++	if ((type == REMOTEADDR || type == REMOTEDNSNAME) &&
+++		options->pRemoteDnsInfo->dnsNameCount != 0) {
+++
+++		pr_info("[MULTIPATH:%s] parse for %d ,already have dns\n",
+++			__func__, type);
+++		return false;
+++	} else if ((type == REMOTEADDR || type == REMOTEDNSNAME) &&
+++		options->remote_ip_list->count != 0) {
+++
+++		pr_info("[MULTIPATH:%s] parse for %d ,already have iplist\n",
+++			__func__, type);
+++		return false;
+++	}
+++	return true;
+++}
+++
+++int nfs_multipath_parse_ip_list(char *buffer, struct net *net_ns,
+++	struct multipath_mount_options *options,
+++	enum nfsmultipathoptions type)
+++{
+++	char *buf_to_parse = NULL;
+++	bool prev_range = false;
+++	int ret = 0;
+++	char *cursor = NULL;
+++	bool single = true;
+++	struct nfs_ip_list *ip_list_tmp = NULL;
+++
+++	if (!nfs_multipath_parse_param_check(type, options))
+++		return -ENOTSUPP;
+++
+++	if (type == REMOUNTREMOTEADDR)
+++		type = REMOTEADDR;
+++
+++	if (type == REMOUNTLOCALADDR)
+++		type = LOCALADDR;
+++
+++	if (type == LOCALADDR)
+++		ip_list_tmp = options->local_ip_list;
+++	else
+++		ip_list_tmp = options->remote_ip_list;
+++
+++	pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n",
+++		buffer, type, __func__);
+++
+++	buf_to_parse = buffer;
+++	while (buf_to_parse != NULL) {
+++		cursor =
+++			nfs_multipath_parse_ip_list_get_cursor(&buf_to_parse, &single);
+++		if (!cursor)
+++			break;
+++
+++		if (single == false && prev_range == true) {
+++			pr_info("NFS: mount option type: %d fail. Multiple Range.[%s]\n",
+++				type, __func__);
+++
+++			ret = -EINVAL;
+++			goto out;
+++		}
+++
+++		if (prev_range == false) {
+++			ret = nfs_multipath_parse_ip_list_inter(ip_list_tmp,
+++				net_ns, cursor, type);
+++			if (ret)
+++				goto out;
+++			if (single == false)
+++				prev_range = true;
+++		} else {
+++			ret = nfs_multipath_parse_ip_range(net_ns, cursor,
+++				ip_list_tmp, type);
+++			if (ret != 0)
+++				goto out;
+++			prev_range = false;
+++		}
+++	}
+++
+++out:
+++	if (ret)
+++		memset(ip_list_tmp, 0, sizeof(struct nfs_ip_list));
+++
+++	return ret;
+++}
+++
+++int nfs_multipath_parse_dns_list(char *buffer, struct net *net_ns,
+++	struct multipath_mount_options *options)
+++{
+++	struct NFS_ROUTE_DNS_INFO_S *dns_name_list_tmp = NULL;
+++	char *cursor = NULL;
+++	char *bufToParse;
+++
+++	if (!nfs_multipath_parse_param_check(REMOTEDNSNAME, options))
+++		return -ENOTSUPP;
+++
+++	pr_info("[MULTIPATH:%s] buffer %s\n", __func__, buffer);
+++	// freed in nfs_free_parsed_mount_data
+++	dns_name_list_tmp = kmalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S),
+++		GFP_KERNEL);
+++	if (!dns_name_list_tmp)
+++		return -ENOMEM;
+++
+++	dns_name_list_tmp->dnsNameCount = 0;
+++	bufToParse = buffer;
+++	while (bufToParse) {
+++		if (dns_name_list_tmp->dnsNameCount >= MAX_DNS_SUPPORTED) {
+++			pr_err("%s: dnsname for %s reached %d,more than supported limit %d\n",
+++				__func__, cursor,
+++				dns_name_list_tmp->dnsNameCount,
+++				MAX_DNS_SUPPORTED);
+++			dns_name_list_tmp->dnsNameCount = 0;
+++			return -ENOSPC;
+++		}
+++		cursor = strsep(&bufToParse, "~");
+++		if (!cursor)
+++			break;
+++
+++		strcpy(dns_name_list_tmp->routeRemoteDnsList
+++			[dns_name_list_tmp->dnsNameCount].dnsname,
+++			cursor);
+++		dns_name_list_tmp->dnsNameCount++;
+++	}
+++	if (dns_name_list_tmp->dnsNameCount == 0)
+++		return -EINVAL;
+++	options->pRemoteDnsInfo = dns_name_list_tmp;
+++	return 0;
+++}
+++
+++int nfs_multipath_parse_options_check_ipv4_valid(struct sockaddr_in *addr)
+++{
+++	if (addr->sin_addr.s_addr == 0 || addr->sin_addr.s_addr == 0xffffffff)
+++		return -EINVAL;
+++	return 0;
+++}
+++
+++int nfs_multipath_parse_options_check_ipv6_valid(struct sockaddr_in6 *addr)
+++{
+++	if (addr->sin6_addr.in6_u.u6_addr32[0] == 0 &&
+++		addr->sin6_addr.in6_u.u6_addr32[1] == 0 &&
+++		addr->sin6_addr.in6_u.u6_addr32[2] == 0 &&
+++		addr->sin6_addr.in6_u.u6_addr32[3] == 0)
+++		return -EINVAL;
+++
+++	if (addr->sin6_addr.in6_u.u6_addr32[0] == 0xffffffff &&
+++		addr->sin6_addr.in6_u.u6_addr32[1] == 0xffffffff &&
+++		addr->sin6_addr.in6_u.u6_addr32[2] == 0xffffffff &&
+++		addr->sin6_addr.in6_u.u6_addr32[3] == 0xffffffff)
+++		return -EINVAL;
+++	return 0;
+++}
+++
+++int nfs_multipath_parse_options_check_ip_valid(struct sockaddr_storage *address)
+++{
+++	int rc = 0;
+++
+++	if (address->ss_family == AF_INET)
+++		rc = nfs_multipath_parse_options_check_ipv4_valid(
+++			(struct sockaddr_in *)address);
+++	else if (address->ss_family == AF_INET6)
+++		rc = nfs_multipath_parse_options_check_ipv6_valid(
+++			(struct sockaddr_in6 *)address);
+++	else
+++		rc = -EINVAL;
+++
+++	return rc;
+++}
+++
+++int nfs_multipath_parse_options_check_valid(
+++	struct multipath_mount_options *options)
+++{
+++	int rc;
+++	int i;
+++
+++	if (options == NULL)
+++		return 0;
+++
+++	for (i = 0; i < options->local_ip_list->count; i++) {
+++		rc = nfs_multipath_parse_options_check_ip_valid(
+++			&options->local_ip_list->address[i]);
+++		if (rc != 0)
+++			return rc;
+++	}
+++
+++	for (i = 0; i < options->remote_ip_list->count; i++) {
+++		rc = nfs_multipath_parse_options_check_ip_valid(
+++			&options->remote_ip_list->address[i]);
+++		if (rc != 0)
+++			return rc;
+++	}
+++
+++	return 0;
+++}
+++int nfs_multipath_parse_options_check_duplicate(
+++	struct multipath_mount_options *options)
+++{
+++	int i;
+++	int j;
+++
+++	if (options == NULL ||
+++		options->local_ip_list->count == 0 ||
+++		options->remote_ip_list->count == 0)
+++
+++		return 0;
+++
+++	for (i = 0; i < options->local_ip_list->count; i++) {
+++		for (j = 0; j < options->remote_ip_list->count; j++) {
+++			if (rpc_cmp_addr((const struct sockaddr *)
+++				&options->local_ip_list->address[i],
+++				(const struct sockaddr *)
+++				&options->remote_ip_list->address[j]))
+++				return -ENOTSUPP;
+++		}
+++	}
+++	return 0;
+++}
+++
+++int nfs_multipath_parse_options_check(struct multipath_mount_options *options)
+++{
+++	int rc = 0;
+++
+++	rc = nfs_multipath_parse_options_check_valid(options);
+++
+++	if (rc != 0) {
+++		pr_err("has invaild ip.\n");
+++		return rc;
+++	}
+++
+++	rc = nfs_multipath_parse_options_check_duplicate(options);
+++	if (rc != 0)
+++		return rc;
+++	return rc;
+++}
+++
+++int nfs_multipath_alloc_options(void **enfs_option)
+++{
+++	struct multipath_mount_options *options = NULL;
+++
+++	options = kzalloc(sizeof(struct multipath_mount_options), GFP_KERNEL);
+++
+++	if (options == NULL)
+++		return -ENOMEM;
+++
+++	options->local_ip_list =
+++		kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL);
+++	if (options->local_ip_list == NULL) {
+++		kfree(options);
+++		return -ENOMEM;
+++	}
+++
+++	options->remote_ip_list =
+++		kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL);
+++	if (options->remote_ip_list == NULL) {
+++		kfree(options->local_ip_list);
+++		kfree(options);
+++		return -ENOMEM;
+++	}
+++
+++	options->pRemoteDnsInfo = kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S),
+++		GFP_KERNEL);
+++	if (options->pRemoteDnsInfo == NULL) {
+++		kfree(options->remote_ip_list);
+++		kfree(options->local_ip_list);
+++		kfree(options);
+++		return -ENOMEM;
+++	}
+++
+++	*enfs_option = options;
+++	return 0;
+++}
+++
+++int nfs_multipath_parse_options(enum nfsmultipathoptions type,
+++	char *str, void **enfs_option, struct net *net_ns)
+++{
+++	int rc;
+++	struct multipath_mount_options *options = NULL;
+++
+++	if ((str == NULL) || (enfs_option == NULL) || (net_ns == NULL))
+++		return -EINVAL;
+++
+++	if (*enfs_option == NULL) {
+++		rc = nfs_multipath_alloc_options(enfs_option);
+++		if (rc != 0) {
+++			enfs_log_error(
+++				"alloc enfs_options failed!
errno:%d\n", rc); +++ return rc; +++ } +++ } +++ +++ options = (struct multipath_mount_options *)*enfs_option; +++ +++ if (type == LOCALADDR || type == REMOUNTLOCALADDR || +++ type == REMOTEADDR || type == REMOUNTREMOTEADDR) { +++ rc = nfs_multipath_parse_ip_list(str, net_ns, options, type); +++ } else if (type == REMOTEDNSNAME) { +++ /* alloc and release need to modify */ +++ rc = nfs_multipath_parse_dns_list(str, net_ns, options); +++ } else { +++ rc = -EOPNOTSUPP; +++ } +++ +++ // after parsing cmd, need checking local and remote +++ // IP is same. if not means illegal cmd +++ if (rc == 0) +++ rc = nfs_multipath_parse_options_check_duplicate(options); +++ +++ if (rc == 0) +++ rc = nfs_multipath_parse_options_check(options); +++ +++ return rc; +++} +++ +++void nfs_multipath_free_options(void **enfs_option) +++{ +++ struct multipath_mount_options *options; +++ +++ if (enfs_option == NULL || *enfs_option == NULL) +++ return; +++ +++ options = (struct multipath_mount_options *)*enfs_option; +++ +++ if (options->remote_ip_list != NULL) { +++ kfree(options->remote_ip_list); +++ options->remote_ip_list = NULL; +++ } +++ +++ if (options->local_ip_list != NULL) { +++ kfree(options->local_ip_list); +++ options->local_ip_list = NULL; +++ } +++ +++ if (options->pRemoteDnsInfo != NULL) { +++ kfree(options->pRemoteDnsInfo); +++ options->pRemoteDnsInfo = NULL; +++ } +++ +++ kfree(options); +++ *enfs_option = NULL; +++} ++diff --git a/fs/nfs/enfs/enfs_multipath_parse.h b/fs/nfs/enfs/enfs_multipath_parse.h ++new file mode 100644 ++index 000000000000..6f3e8703e3e2 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_parse.h ++@@ -0,0 +1,22 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */
+++#ifndef _ENFS_MULTIPATH_PARSE_H_
+++#define _ENFS_MULTIPATH_PARSE_H_
+++
+++#include "enfs.h"
+++
+++struct multipath_mount_options {
+++	struct nfs_ip_list *remote_ip_list;
+++	struct nfs_ip_list *local_ip_list;
+++	struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo;
+++};
+++
+++int nfs_multipath_parse_options(enum nfsmultipathoptions type,
+++	char *str, void **enfs_option, struct net *net_ns);
+++void nfs_multipath_free_options(void **enfs_option);
+++
+++#endif
+diff --git a/0004-add_enfs_module_for_sunrpc_multipatch.patch b/0004-add_enfs_module_for_sunrpc_multipatch.patch
+new file mode 100644
+index 0000000..2c0fcc7
+--- /dev/null
++++ b/0004-add_enfs_module_for_sunrpc_multipatch.patch
+@@ -0,0 +1,1581 @@
++diff --git a/fs/nfs/enfs/enfs_multipath.h b/fs/nfs/enfs/enfs_multipath.h
++new file mode 100644
++index 000000000000..e064c2929ced
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_multipath.h
++@@ -0,0 +1,24 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: enfs multipath
+++ * Author:
+++ * Create: 2023-07-31
+++ */
+++
+++#ifndef ENFS_MULTIPATH_H
+++#define ENFS_MULTIPATH_H
+++#include <linux/sunrpc/clnt.h>
+++
+++#define MAX_XPRT_NUM_PER_CLIENT 32
+++
+++int enfs_multipath_init(void);
+++void enfs_multipath_exit(void);
+++void enfs_xprt_ippair_create(struct xprt_create *xprtargs,
+++	struct rpc_clnt *clnt, void *data);
+++int enfs_config_xprt_create_args(struct xprt_create *xprtargs,
+++	struct rpc_create_args *args,
+++	char *servername, size_t length);
+++void print_enfs_multipath_addr(struct sockaddr *local, struct sockaddr *remote);
+++
+++#endif // ENFS_MULTIPATH_H
++diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c
++new file mode 100644
++index 000000000000..63c02898a42c
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_multipath_client.c
++@@ -0,0 +1,340 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Client-side ENFS adapter.
+++ *
+++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+++ */ +++#include <linux/types.h> +++#include <linux/nfs.h> +++#include <linux/nfs4.h> +++#include <linux/nfs_fs.h> +++#include <linux/nfs_fs_sb.h> +++#include <linux/proc_fs.h> +++#include <linux/seq_file.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/addr.h> +++#include "enfs_multipath_client.h" +++#include "enfs_multipath_parse.h" +++ +++int +++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info, +++ const struct nfs_client_initdata *client_init_data) +++{ +++ struct multipath_mount_options *mount_options = +++ (struct multipath_mount_options *)client_init_data->enfs_option; +++ +++ if (mount_options->local_ip_list) { +++ client_info->local_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ +++ if (!client_info->local_ip_list) +++ return -ENOMEM; +++ +++ memcpy(client_info->local_ip_list, mount_options->local_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (mount_options->remote_ip_list) { +++ +++ client_info->remote_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ +++ if (!client_info->remote_ip_list) { +++ kfree(client_info->local_ip_list); +++ client_info->local_ip_list = NULL; +++ return -ENOMEM; +++ } +++ memcpy(client_info->remote_ip_list, +++ mount_options->remote_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (mount_options->pRemoteDnsInfo) { +++ client_info->pRemoteDnsInfo = +++ kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL); +++ +++ if (!client_info->pRemoteDnsInfo) { +++ kfree(client_info->local_ip_list); +++ client_info->local_ip_list = NULL; +++ kfree(client_info->remote_ip_list); +++ client_info->remote_ip_list = NULL; +++ return -ENOMEM; +++ } +++ memcpy(client_info->pRemoteDnsInfo, +++ mount_options->pRemoteDnsInfo, +++ sizeof(struct NFS_ROUTE_DNS_INFO_S)); +++ } +++ return 0; +++} +++ +++void nfs_multipath_client_info_free_work(struct work_struct *work) +++{ +++ +++ struct multipath_client_info *clp_info; +++ +++ if (work == NULL) +++ 
return; +++ +++ clp_info = container_of(work, struct multipath_client_info, work); +++ +++ if (clp_info->local_ip_list != NULL) { +++ kfree(clp_info->local_ip_list); +++ clp_info->local_ip_list = NULL; +++ } +++ if (clp_info->remote_ip_list != NULL) { +++ kfree(clp_info->remote_ip_list); +++ clp_info->remote_ip_list = NULL; +++ } +++ kfree(clp_info); +++} +++ +++void nfs_multipath_client_info_free(void *data) +++{ +++ struct multipath_client_info *clp_info = +++ (struct multipath_client_info *)data; +++ +++ if (clp_info == NULL) +++ return; +++ pr_info("free client info %p.\n", clp_info); +++ INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work); +++ schedule_work(&clp_info->work); +++} +++ +++int nfs_multipath_client_info_init(void **data, +++ const struct nfs_client_initdata *cl_init) +++{ +++ int rc; +++ struct multipath_client_info *info; +++ struct multipath_client_info **enfs_info; +++ /* no multi path info, no need do multipath init */ +++ if (cl_init->enfs_option == NULL) +++ return 0; +++ enfs_info = (struct multipath_client_info **)data; +++ if (enfs_info == NULL) +++ return -EINVAL; +++ +++ if (*enfs_info == NULL) +++ *enfs_info = kzalloc(sizeof(struct multipath_client_info), +++ GFP_KERNEL); +++ +++ if (*enfs_info == NULL) +++ return -ENOMEM; +++ +++ info = (struct multipath_client_info *)*enfs_info; +++ pr_info("init client info %p.\n", info); +++ rc = nfs_multipath_client_mount_info_init(info, cl_init); +++ if (rc) { +++ nfs_multipath_client_info_free((void *)info); +++ return rc; +++ } +++ return rc; +++} +++ +++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src, +++ const struct nfs_ip_list *ip_list_dst) +++{ +++ int i; +++ int j; +++ bool is_find; +++ /* if both are equal or NULL, then return true. 
*/ +++ if (ip_list_src == ip_list_dst) +++ return true; +++ +++ if ((ip_list_src == NULL || ip_list_dst == NULL)) +++ return false; +++ +++ if (ip_list_src->count != ip_list_dst->count) +++ return false; +++ +++ for (i = 0; i < ip_list_src->count; i++) { +++ is_find = false; +++ for (j = 0; j < ip_list_src->count; j++) { +++ if (rpc_cmp_addr_port( +++ (const struct sockaddr *) +++ &ip_list_src->address[i], +++ (const struct sockaddr *) +++ &ip_list_dst->address[j]) +++ ) { +++ is_find = true; +++ break; +++ } +++ } +++ if (is_find == false) +++ return false; +++ } +++ return true; +++} +++ +++int +++nfs_multipath_dns_list_info_match( +++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc, +++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst) +++{ +++ int i; +++ +++ /* if both are equal or NULL, then return true. */ +++ if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst) +++ return true; +++ +++ if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL)) +++ return false; +++ +++ if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount) +++ return false; +++ +++ for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) { +++ if (!strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname, +++ pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname)) +++ return false; +++ } +++ return true; +++} +++ +++int nfs_multipath_client_info_match(void *src, void *dst) +++{ +++ int ret = true; +++ +++ struct multipath_client_info *src_info; +++ struct multipath_mount_options *dst_info; +++ +++ src_info = (struct multipath_client_info *)src; +++ dst_info = (struct multipath_mount_options *)dst; +++ pr_info("try match client .\n"); +++ ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list, +++ dst_info->local_ip_list); +++ if (ret == false) { +++ pr_err("local_ip not match.\n"); +++ return ret; +++ } +++ +++ ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list, +++ dst_info->remote_ip_list); +++ if (ret == false) { +++ pr_err("remote_ip not match.\n"); +++ 
return ret; +++ } +++ +++ ret = nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo, +++ dst_info->pRemoteDnsInfo); +++ if (ret == false) { +++ pr_err("dns not match.\n"); +++ return ret; +++ } +++ pr_info("try match client ret %d.\n", ret); +++ return ret; +++} +++ +++void nfs_multipath_print_ip_info(struct seq_file *mount_option, +++ struct nfs_ip_list *ip_list, +++ const char *type) +++{ +++ char buf[IP_ADDRESS_LEN_MAX + 1]; +++ int len = 0; +++ int i = 0; +++ +++ seq_printf(mount_option, ",%s=", type); +++ for (i = 0; i < ip_list->count; i++) { +++ len = rpc_ntop((struct sockaddr *)&ip_list->address[i], +++ buf, IP_ADDRESS_LEN_MAX); +++ if (len > 0 && len < IP_ADDRESS_LEN_MAX) +++ buf[len] = '\0'; +++ +++ if (i == 0) +++ seq_printf(mount_option, "%s", buf); +++ else +++ seq_printf(mount_option, "~%s", buf); +++ dfprintk(MOUNT, +++ "NFS: show nfs mount option type:%s %s [%s]\n", +++ type, buf, __func__); +++ } +++} +++ +++void nfs_multipath_print_dns_info(struct seq_file *mount_option, +++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo, +++ const char *type) +++{ +++ int i = 0; +++ +++ seq_printf(mount_option, ",%s=", type); +++ for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) { +++ if (i == 0) +++ seq_printf(mount_option, +++ "[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ else if (i == pRemoteDnsInfo->dnsNameCount - 1) +++ seq_printf(mount_option, ",%s]", +++ pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ else +++ seq_printf(mount_option, +++ ",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ } +++} +++ +++ +++static void multipath_print_sockaddr(struct seq_file *seq, +++ struct sockaddr *addr) +++{ +++ switch (addr->sa_family) { +++ case AF_INET: { +++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; +++ +++ seq_printf(seq, "%pI4", &sin->sin_addr); +++ return; +++ } +++ case AF_INET6: { +++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; +++ +++ seq_printf(seq, "%pI6", &sin6->sin6_addr); +++ return; +++ } +++ 
default: +++ break; +++ } +++ pr_err("unsupport family:%d\n", addr->sa_family); +++} +++ +++static void multipath_print_enfs_info(struct seq_file *seq, +++ struct nfs_server *server) +++{ +++ struct sockaddr_storage peeraddr; +++ struct rpc_clnt *next = server->client; +++ +++ rpc_peeraddr(server->client, +++ (struct sockaddr *)&peeraddr, sizeof(peeraddr)); +++ seq_puts(seq, ",enfs_info="); +++ multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr); +++ +++ while (next->cl_parent) { +++ if (next == next->cl_parent) +++ break; +++ next = next->cl_parent; +++ } +++ seq_printf(seq, "_%u", next->cl_clid); +++} +++ +++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data) +++{ +++ struct nfs_server *server = data; +++ struct multipath_client_info *client_info = +++ server->nfs_client->cl_multipath_data; +++ +++ dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__); +++ if ((client_info->remote_ip_list) && +++ (client_info->remote_ip_list->count > 0)) +++ nfs_multipath_print_ip_info(mount_option, +++ client_info->remote_ip_list, +++ "remoteaddrs"); +++ +++ if ((client_info->local_ip_list) && +++ (client_info->local_ip_list->count > 0)) +++ nfs_multipath_print_ip_info(mount_option, +++ client_info->local_ip_list, +++ "localaddrs"); +++ +++ if ((client_info->pRemoteDnsInfo) && +++ (client_info->pRemoteDnsInfo->dnsNameCount > 0)) +++ nfs_multipath_print_dns_info(mount_option, +++ client_info->pRemoteDnsInfo, +++ "remotednsname"); +++ +++ multipath_print_enfs_info(mount_option, server); +++} ++diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h ++new file mode 100644 ++index 000000000000..208f7260690d ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_client.h ++@@ -0,0 +1,26 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */
+++#ifndef _ENFS_MULTIPATH_CLIENT_H_
+++#define _ENFS_MULTIPATH_CLIENT_H_
+++
+++#include "enfs.h"
+++
+++struct multipath_client_info {
+++	struct work_struct work;
+++	struct nfs_ip_list *remote_ip_list;
+++	struct nfs_ip_list *local_ip_list;
+++	struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo;
+++	s64 client_id;
+++};
+++
+++int nfs_multipath_client_info_init(void **data,
+++	const struct nfs_client_initdata *cl_init);
+++void nfs_multipath_client_info_free(void *data);
+++int nfs_multipath_client_info_match(void *src, void *dst);
+++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data);
+++
+++#endif
++diff --git a/fs/nfs/enfs/enfs_path.c b/fs/nfs/enfs/enfs_path.c
++new file mode 100644
++index 000000000000..7355f8c2f672
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_path.c
++@@ -0,0 +1,47 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ */
+++
+++#include <linux/sunrpc/metrics.h>
+++#include <linux/sunrpc/xprt.h>
+++
+++#include "enfs.h"
+++#include "enfs_log.h"
+++#include "enfs_path.h"
+++
+++// only create ctx in this function
+++// alloc iostat memory in create_clnt
+++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt)
+++{
+++	struct enfs_xprt_context *ctx;
+++
+++	if (!xprt) {
+++		enfs_log_error("invalid xprt pointer.\n");
+++		return -EINVAL;
+++	}
+++
+++	ctx = kzalloc(sizeof(struct enfs_xprt_context), GFP_KERNEL);
+++	if (!ctx) {
+++		enfs_log_error("add xprt test failed.\n");
+++		return -ENOMEM;
+++	}
+++
+++	xprt->multipath_context = (void *)ctx;
+++	return 0;
+++}
+++
+++// free multi_context and iostat memory
+++void enfs_free_xprt_ctx(struct rpc_xprt *xprt)
+++{
+++	struct enfs_xprt_context *ctx = xprt->multipath_context;
+++
+++	if (ctx) {
+++		if (ctx->stats) {
+++			rpc_free_iostats(ctx->stats);
+++			ctx->stats = NULL;
+++		}
+++		kfree(xprt->multipath_context);
+++		xprt->multipath_context = NULL;
+++	}
+++}
++diff --git a/fs/nfs/enfs/enfs_path.h
b/fs/nfs/enfs/enfs_path.h
++new file mode 100644
++index 000000000000..97b1ef3730b8
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_path.h
++@@ -0,0 +1,12 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ */
+++
+++#ifndef ENFS_PATH_H
+++#define ENFS_PATH_H
+++
+++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt);
+++void enfs_free_xprt_ctx(struct rpc_xprt *xprt);
+++
+++#endif // ENFS_PATH_H
++diff --git a/fs/nfs/enfs/enfs_proc.c b/fs/nfs/enfs/enfs_proc.c
++new file mode 100644
++index 000000000000..53fa1a07642f
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_proc.c
++@@ -0,0 +1,545 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ */
+++#include <linux/module.h>
+++#include <linux/proc_fs.h>
+++#include <linux/seq_file.h>
+++#include <linux/spinlock.h>
+++#include <linux/sunrpc/clnt.h>
+++#include <linux/sunrpc/metrics.h>
+++#include <linux/sunrpc/xprtsock.h>
+++#include <net/netns/generic.h>
+++
+++#include "../../../net/sunrpc/netns.h"
+++
+++#include "enfs.h"
+++#include "enfs_log.h"
+++#include "enfs_proc.h"
+++#include "enfs_multipath.h"
+++#include "pm_state.h"
+++
+++#define ENFS_PROC_DIR "enfs"
+++#define ENFS_PROC_PATH_STATUS_LEN 256
+++
+++static struct proc_dir_entry *enfs_proc_parent;
+++
+++void
+++enfs_iterate_each_rpc_clnt(int (*fn)(struct rpc_clnt *clnt, void *data),
+++	void *data)
+++{
+++	struct net *net;
+++	struct sunrpc_net *sn;
+++	struct rpc_clnt *clnt;
+++
+++	rcu_read_lock();
+++	for_each_net_rcu(net) {
+++		sn = net_generic(net, sunrpc_net_id);
+++		if (sn == NULL)
+++			continue;
+++		spin_lock(&sn->rpc_client_lock);
+++		list_for_each_entry(clnt, &sn->all_clients, cl_clients) {
+++			fn(clnt, data);
+++		}
+++		spin_unlock(&sn->rpc_client_lock);
+++	}
+++	rcu_read_unlock();
+++}
+++
+++struct proc_dir_entry *enfs_get_proc_parent(void)
+++{
+++	return enfs_proc_parent;
+++}
+++
+++static int
sockaddr_ip_to_str(struct sockaddr *addr, char *buf, int len) +++{ +++ switch (addr->sa_family) { +++ case AF_INET: { +++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; +++ +++ snprintf(buf, len, "%pI4", &sin->sin_addr); +++ return 0; +++ } +++ case AF_INET6: { +++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; +++ +++ snprintf(buf, len, "%pI6", &sin6->sin6_addr); +++ return 0; +++ } +++ default: +++ break; +++ } +++ return 1; +++} +++ +++static bool should_print(const char *name) +++{ +++ int i; +++ static const char * const proc_names[] = { +++ "READ", +++ "WRITE", +++ }; +++ +++ if (name == NULL) +++ return false; +++ +++ for (i = 0; i < ARRAY_SIZE(proc_names); i++) { +++ if (strcmp(name, proc_names[i]) == 0) +++ return true; +++ } +++ return false; +++} +++ +++struct enfs_xprt_iter { +++ unsigned int id; +++ struct seq_file *seq; +++ unsigned int max_addrs_length; +++}; +++ +++static int debug_show_xprt(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ +++ if (xprt->multipath_context) +++ ctx = xprt->multipath_context; +++ +++ pr_info(" xprt:%p ctx:%p main:%d queue_len:%lu.\n", xprt, +++ xprt->multipath_context, +++ ctx ? 
ctx->main : false, +++ atomic_long_read(&xprt->queuelen)); +++ return 0; +++} +++ +++static int debug_show_clnt(struct rpc_clnt *clnt, void *data) +++{ +++ pr_info(" clnt %d addr:%p enfs:%d\n", +++ clnt->cl_clid, clnt, +++ clnt->cl_enfs); +++ rpc_clnt_iterate_for_each_xprt(clnt, debug_show_xprt, NULL); +++ return 0; +++} +++ +++static void debug_print_all_xprt(void) +++{ +++ enfs_iterate_each_rpc_clnt(debug_show_clnt, NULL); +++} +++ +++static +++void enfs_proc_format_xprt_addr_display(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ char *local_name_buf, +++ int local_name_buf_len, +++ char *remote_name_buf, +++ int remote_name_buf_len) +++{ +++ int err; +++ struct sockaddr_storage srcaddr; +++ struct enfs_xprt_context *ctx; +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ sockaddr_ip_to_str((struct sockaddr *)&xprt->addr, +++ remote_name_buf, remote_name_buf_len); +++ +++ // get local address depend one main or not +++ if (enfs_is_main_xprt(xprt)) { +++ err = rpc_localaddr(clnt, (struct sockaddr *)&srcaddr, +++ sizeof(srcaddr)); +++ if (err != 0) +++ (void)snprintf(local_name_buf, +++ local_name_buf_len, "Unknown"); +++ else +++ sockaddr_ip_to_str((struct sockaddr *)&srcaddr, +++ local_name_buf, +++ local_name_buf_len); +++ } else { +++ sockaddr_ip_to_str((struct sockaddr *)&ctx->srcaddr, +++ local_name_buf, +++ local_name_buf_len); +++ } +++} +++ +++static int enfs_show_xprt_stats(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ unsigned int op; +++ unsigned int maxproc = clnt->cl_maxproc; +++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; +++ struct enfs_xprt_context *ctx; +++ char local_name[INET6_ADDRSTRLEN]; +++ char remote_name[INET6_ADDRSTRLEN]; +++ +++ if (!xprt->multipath_context) +++ return 0; +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ enfs_proc_format_xprt_addr_display(clnt, xprt, local_name, +++ sizeof(local_name), +++ remote_name, 
sizeof(remote_name)); +++ +++ seq_printf(iter->seq, "%-6u%-*s%-*s", iter->id, +++ iter->max_addrs_length + 4, +++ local_name, +++ iter->max_addrs_length + 4, +++ remote_name); +++ +++ iter->id++; +++ +++ for (op = 0; op < maxproc; op++) { +++ if (!should_print(clnt->cl_procinfo[op].p_name)) +++ continue; +++ +++ seq_printf(iter->seq, "%-22lu%-22Lu%-22Lu", +++ ctx->stats[op].om_ops, +++ ctx->stats[op].om_ops == 0 ? 0 : +++ ktime_to_ms(ctx->stats[op].om_rtt) / +++ ctx->stats[op].om_ops, +++ ctx->stats[op].om_ops == 0 ? 0 : +++ ktime_to_ms(ctx->stats[op].om_execute) / +++ ctx->stats[op].om_ops); +++ } +++ seq_puts(iter->seq, "\n"); +++ return 0; +++} +++ +++static int rpc_proc_show_path_status(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; +++ struct enfs_xprt_context *ctx = NULL; +++ char local_name[INET6_ADDRSTRLEN] = {0}; +++ char remote_name[INET6_ADDRSTRLEN] = {0}; +++ char multiapth_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; +++ char xprt_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; +++ +++ if (!xprt->multipath_context) { +++ enfs_log_debug("multipath_context is null.\n"); +++ return 0; +++ } +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ enfs_proc_format_xprt_addr_display(clnt, xprt, +++ local_name, +++ sizeof(local_name), +++ remote_name, sizeof(remote_name)); +++ +++ pm_get_path_state_desc(xprt, +++ multiapth_status, +++ ENFS_PROC_PATH_STATUS_LEN); +++ +++ pm_get_xprt_state_desc(xprt, +++ xprt_status, +++ ENFS_PROC_PATH_STATUS_LEN); +++ +++ seq_printf(iter->seq, "%-6u%-*s%-*s%-12s%-12s\n", +++ iter->id, iter->max_addrs_length + 4, +++ local_name, iter->max_addrs_length + 4, +++ remote_name, multiapth_status, +++ xprt_status); +++ iter->id++; +++ return 0; +++} +++ +++static int enfs_get_max_addrs_length(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; +++ char 
local_name[INET6_ADDRSTRLEN]; +++ char remote_name[INET6_ADDRSTRLEN]; +++ +++ enfs_proc_format_xprt_addr_display(clnt, xprt, +++ local_name, sizeof(local_name), +++ remote_name, sizeof(remote_name)); +++ +++ if (iter->max_addrs_length < strlen(local_name)) +++ iter->max_addrs_length = strlen(local_name); +++ +++ if (iter->max_addrs_length < strlen(remote_name)) +++ iter->max_addrs_length = strlen(remote_name); +++ +++ return 0; +++} +++ +++static int rpc_proc_clnt_showpath(struct seq_file *seq, void *v) +++{ +++ struct rpc_clnt *clnt = seq->private; +++ struct enfs_xprt_iter iter; +++ +++ iter.seq = seq; +++ iter.id = 0; +++ iter.max_addrs_length = 0; +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ enfs_get_max_addrs_length, +++ (void *)&iter); +++ +++ seq_printf(seq, "%-6s%-*s%-*s%-12s%-12s\n", "id", +++ iter.max_addrs_length + 4, +++ "local_addr", +++ iter.max_addrs_length + 4, +++ "remote_addr", +++ "path_state", +++ "xprt_state"); +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ rpc_proc_show_path_status, +++ (void *)&iter); +++ return 0; +++} +++ +++static int enfs_rpc_proc_show(struct seq_file *seq, void *v) +++{ +++ struct rpc_clnt *clnt = seq->private; +++ struct enfs_xprt_iter iter; +++ +++ iter.seq = seq; +++ iter.id = 0; +++ iter.max_addrs_length = 0; +++ +++ debug_print_all_xprt(); +++ pr_info("enfs proc clnt:%p\n", clnt); +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ enfs_get_max_addrs_length, +++ (void *)&iter); +++ +++ seq_printf(seq, "%-6s%-*s%-*s%-22s%-22s%-22s%-22s%-22s%-22s\n", "id", +++ iter.max_addrs_length + 4, "local_addr", +++ iter.max_addrs_length + 4, +++ "remote_addr", "r_count", +++ "r_rtt", "r_exec", "w_count", "w_rtt", "w_exec"); +++ +++ // rpc_clnt_show_stats(seq, clnt); +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ enfs_show_xprt_stats, +++ (void *)&iter); +++ return 0; +++} +++ +++static int rpc_proc_open(struct inode *inode, struct file *file) +++{ +++ struct rpc_clnt *clnt = PDE_DATA(inode); +++ +++ pr_info("%s %p\n", __func__, 
clnt); +++ return single_open(file, enfs_rpc_proc_show, clnt); +++} +++ +++static int enfs_reset_xprt_stats(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ unsigned int op; +++ struct enfs_xprt_context *ctx; +++ unsigned int maxproc = clnt->cl_maxproc; +++ struct rpc_iostats stats = {0}; +++ +++ if (!xprt->multipath_context) +++ return 0; +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ for (op = 0; op < maxproc; op++) { +++ spin_lock(&ctx->stats[op].om_lock); +++ ctx->stats[op] = stats; +++ spin_unlock(&ctx->stats[op].om_lock); +++ } +++ return 0; +++} +++ +++static void trim_newline_ch(char *str, int len) +++{ +++ int i; +++ +++ for (i = 0; str[i] != '\0' && i < len; i++) { +++ if (str[i] == '\n') +++ str[i] = '\0'; +++ } +++} +++ +++static ssize_t enfs_proc_write(struct file *file, +++ const char __user *user_buf, +++ size_t len, +++ loff_t *offset) +++{ +++ char buffer[128]; +++ struct rpc_clnt *clnt = +++ ((struct seq_file *)file->private_data)->private; +++ +++ if (len >= sizeof(buffer)) +++ return -E2BIG; +++ +++ if (copy_from_user(buffer, user_buf, len) != 0) +++ return -EFAULT; +++ +++ buffer[len] = '\0'; +++ trim_newline_ch(buffer, len); +++ if (strcmp(buffer, "reset") != 0) +++ return -EINVAL; +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, enfs_reset_xprt_stats, NULL); +++ return len; +++} +++ +++static int rpc_proc_show_path(struct inode *inode, struct file *file) +++{ +++ struct rpc_clnt *clnt = PDE_DATA(inode); +++ +++ return single_open(file, rpc_proc_clnt_showpath, clnt); +++} +++ +++static const struct file_operations rpc_proc_fops = { +++ .owner = THIS_MODULE, +++ .open = rpc_proc_open, +++ .read = seq_read, +++ .llseek = seq_lseek, +++ .release = single_release, +++ .write = enfs_proc_write, +++}; +++ +++static const struct file_operations rpc_show_path_fops = { +++ .owner = THIS_MODULE, +++ .open = rpc_proc_show_path, +++ .read = seq_read, +++ .llseek = seq_lseek, +++ .release = single_release, 
+++}; +++ +++static int clnt_proc_name(struct rpc_clnt *clnt, char *buf, int len) +++{ +++ int ret; +++ +++ ret = snprintf(buf, len, "%s_%u", +++ rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR), +++ clnt->cl_clid); +++ if (ret > len) +++ return -E2BIG; +++ return 0; +++} +++ +++static int enfs_proc_create_file(struct rpc_clnt *clnt) +++{ +++ int err; +++ char buf[128]; +++ +++ struct proc_dir_entry *clnt_entry; +++ struct proc_dir_entry *stat_entry; +++ +++ err = clnt_proc_name(clnt, buf, sizeof(buf)); +++ if (err) +++ return err; +++ +++ clnt_entry = proc_mkdir(buf, enfs_proc_parent); +++ if (clnt_entry == NULL) +++ return -EINVAL; +++ +++ stat_entry = proc_create_data("stat", +++ 0, clnt_entry, +++ &rpc_proc_fops, clnt); +++ +++ if (stat_entry == NULL) +++ return -EINVAL; +++ +++ stat_entry = proc_create_data("path", +++ 0, clnt_entry, +++ &rpc_show_path_fops, clnt); +++ +++ if (stat_entry == NULL) +++ return -EINVAL; +++ +++ return 0; +++} +++ +++void enfs_count_iostat(struct rpc_task *task) +++{ +++ struct enfs_xprt_context *ctx = task->tk_xprt->multipath_context; +++ +++ if (!ctx || !ctx->stats) +++ return; +++ rpc_count_iostats(task, ctx->stats); +++} +++ +++static void enfs_proc_delete_file(struct rpc_clnt *clnt) +++{ +++ int err; +++ char buf[128]; +++ +++ err = clnt_proc_name(clnt, buf, sizeof(buf)); +++ if (err) { +++ pr_err("gen clnt name failed.\n"); +++ return; +++ } +++ remove_proc_subtree(buf, enfs_proc_parent); +++} +++ +++// create proc file "/porc/enfs/[mount_ip]_[id]/stat" +++int enfs_proc_create_clnt(struct rpc_clnt *clnt) +++{ +++ int err; +++ +++ err = enfs_proc_create_file(clnt); +++ if (err) { +++ pr_err("create client %d\n", err); +++ return err; +++ } +++ +++ return 0; +++} +++ +++void enfs_proc_delete_clnt(struct rpc_clnt *clnt) +++{ +++ if (clnt->cl_enfs) +++ enfs_proc_delete_file(clnt); +++} +++ +++static int enfs_proc_create_parent(void) +++{ +++ enfs_proc_parent = proc_mkdir(ENFS_PROC_DIR, NULL); +++ +++ if (enfs_proc_parent == NULL) { +++ 
pr_err("Enfs create proc dir err\n"); +++ return -ENOMEM; +++ } +++ return 0; +++} +++ +++static void enfs_proc_delete_parent(void) +++{ +++ remove_proc_entry(ENFS_PROC_DIR, NULL); +++} +++ +++static int enfs_proc_init_create_clnt(struct rpc_clnt *clnt, void *data) +++{ +++ if (clnt->cl_enfs) +++ enfs_proc_create_file(clnt); +++ return 0; +++} +++ +++static int enfs_proc_destroy_clnt(struct rpc_clnt *clnt, void *data) +++{ +++ if (clnt->cl_enfs) +++ enfs_proc_delete_file(clnt); +++ return 0; +++} +++ +++int enfs_proc_init(void) +++{ +++ int err; +++ +++ err = enfs_proc_create_parent(); +++ if (err) +++ return err; +++ +++ enfs_iterate_each_rpc_clnt(enfs_proc_init_create_clnt, NULL); +++ return 0; +++} +++ +++void enfs_proc_exit(void) +++{ +++ enfs_iterate_each_rpc_clnt(enfs_proc_destroy_clnt, NULL); +++ enfs_proc_delete_parent(); +++} ++diff --git a/fs/nfs/enfs/enfs_proc.h b/fs/nfs/enfs/enfs_proc.h ++new file mode 100644 ++index 000000000000..321951031c2e ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_proc.h ++@@ -0,0 +1,21 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS PROC. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#ifndef ENFS_PROC_H +++#define ENFS_PROC_H +++ +++struct rpc_clnt; +++struct rpc_task; +++struct proc_dir_entry; +++ +++int enfs_proc_init(void); +++void enfs_proc_exit(void); +++struct proc_dir_entry *enfs_get_proc_parent(void); +++int enfs_proc_create_clnt(struct rpc_clnt *clnt); +++void enfs_proc_delete_clnt(struct rpc_clnt *clnt); +++void enfs_count_iostat(struct rpc_task *task); +++ +++#endif ++diff --git a/fs/nfs/enfs/enfs_remount.c b/fs/nfs/enfs/enfs_remount.c ++new file mode 100644 ++index 000000000000..2c3fe125c735 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_remount.c ++@@ -0,0 +1,221 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: remount ip source file +++ * Author: y00583252 +++ * Create: 2023-08-12 +++ */ +++#include "enfs_remount.h" +++ +++#include <linux/string.h> +++#include <linux/in.h> +++#include <linux/in6.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/spinlock.h> +++#include <linux/sunrpc/addr.h> +++#include <linux/sunrpc/metrics.h> +++#include <linux/sunrpc/xprtmultipath.h> +++#include <linux/sunrpc/xprtsock.h> +++#include <linux/sunrpc/xprt.h> +++#include <linux/smp.h> +++#include <linux/delay.h> +++ +++#include "enfs.h" +++#include "enfs_log.h" +++#include "enfs_multipath.h" +++#include "enfs_multipath_parse.h" +++#include "enfs_path.h" +++#include "enfs_proc.h" +++#include "enfs_multipath_client.h" +++ +++static bool enfs_rpc_xprt_switch_need_delete_addr( +++ struct multipath_mount_options *enfs_option, +++ struct sockaddr *dstaddr, struct sockaddr *srcaddr) +++{ +++ int i; +++ bool find_same_ip = false; +++ int32_t local_total; +++ int32_t remote_total; +++ +++ local_total = enfs_option->local_ip_list->count; +++ remote_total = enfs_option->remote_ip_list->count; +++ if (local_total == 0 || remote_total == 0) { +++ pr_err("no ip list is present.\n"); +++ return false; +++ } +++ +++ for (i = 0; i < local_total; i++) { +++ find_same_ip = +++ rpc_cmp_addr((struct sockaddr *) +++ &enfs_option->local_ip_list->address[i], +++ srcaddr); +++ if (find_same_ip) +++ break; +++ } +++ +++ if (find_same_ip == false) +++ return true; +++ +++ find_same_ip = false; +++ for (i = 0; i < remote_total; i++) { +++ find_same_ip = +++ rpc_cmp_addr((struct sockaddr *) +++ &enfs_option->remote_ip_list->address[i], +++ dstaddr); +++ if (find_same_ip) +++ break; +++ } +++ +++ if (find_same_ip == false) +++ return true; +++ +++ return false; +++} +++ +++// Used in rcu_lock +++static bool enfs_delete_xprt_from_switch(struct rpc_xprt *xprt, +++ void *enfs_option, +++ struct rpc_xprt_switch *xps) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ struct multipath_mount_options 
*mopt = +++ (struct multipath_mount_options *)enfs_option; +++ +++ if (enfs_is_main_xprt(xprt)) +++ return true; +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ if (enfs_rpc_xprt_switch_need_delete_addr(mopt, +++ (struct sockaddr *)&xprt->addr, +++ (struct sockaddr *)&ctx->srcaddr)) { +++ +++ print_enfs_multipath_addr((struct sockaddr *)&ctx->srcaddr, +++ (struct sockaddr *)&xprt->addr); +++ rpc_xprt_switch_remove_xprt(xps, xprt); +++ return true; +++ } +++ +++ return false; +++} +++ +++void enfs_clnt_delete_obsolete_xprts(struct nfs_client *nfs_client, +++ void *enfs_option) +++{ +++ int xprt_count = 0; +++ struct rpc_xprt *pos = NULL; +++ struct rpc_xprt_switch *xps = NULL; +++ +++ rcu_read_lock(); +++ xps = xprt_switch_get( +++ rcu_dereference( +++ nfs_client->cl_rpcclient->cl_xpi.xpi_xpswitch)); +++ if (xps == NULL) { +++ rcu_read_unlock(); +++ xprt_switch_put(xps); +++ return; +++ } +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (xprt_count < MAX_XPRT_NUM_PER_CLIENT) { +++ if (enfs_delete_xprt_from_switch( +++ pos, enfs_option, xps) == false) +++ xprt_count++; +++ } else +++ rpc_xprt_switch_remove_xprt(xps, pos); +++ } +++ rcu_read_unlock(); +++ xprt_switch_put(xps); +++} +++ +++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option) +++{ +++ int errno = 0; +++ char servername[48]; +++ struct multipath_mount_options *remount_lists = +++ (struct multipath_mount_options *)enfs_option; +++ struct multipath_client_info *client_info = +++ (struct multipath_client_info *)nfs_client->cl_multipath_data; +++ struct xprt_create xprtargs; +++ struct rpc_create_args args = { +++ .protocol = nfs_client->cl_proto, +++ .net = nfs_client->cl_net, +++ .addrsize = nfs_client->cl_addrlen, +++ .servername = nfs_client->cl_hostname, +++ }; +++ +++ memset(&xprtargs, 0, sizeof(struct xprt_create)); +++ +++ //mount is not use multipath +++ if (client_info == NULL || enfs_option == NULL) { +++ enfs_log_error( +++ 
"mount information or remount information is empty.\n"); +++ return -EINVAL; +++ } +++ +++ //remount : localaddrs and remoteaddrs are empty +++ if (remount_lists->local_ip_list->count == 0 && +++ remount_lists->remote_ip_list->count == 0) { +++ enfs_log_info("remount local_ip_list and remote_ip_list are NULL\n"); +++ return 0; +++ } +++ +++ errno = enfs_config_xprt_create_args(&xprtargs, +++ &args, servername, sizeof(servername)); +++ +++ if (errno) { +++ enfs_log_error("config_xprt_create failed! errno:%d\n", errno); +++ return errno; +++ } +++ +++ if (remount_lists->local_ip_list->count == 0) { +++ if (client_info->local_ip_list->count == 0) { +++ errno = rpc_localaddr(nfs_client->cl_rpcclient, +++ (struct sockaddr *) +++ &remount_lists->local_ip_list->address[0], +++ sizeof(struct sockaddr_storage)); +++ if (errno) { +++ enfs_log_error("get clnt srcaddr errno:%d\n", +++ errno); +++ return errno; +++ } +++ remount_lists->local_ip_list->count = 1; +++ } else +++ memcpy(remount_lists->local_ip_list, +++ client_info->local_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (remount_lists->remote_ip_list->count == 0) { +++ if (client_info->remote_ip_list->count == 0) { +++ errno = rpc_peeraddr(nfs_client->cl_rpcclient, +++ (struct sockaddr *) +++ &remount_lists->remote_ip_list->address[0], +++ sizeof(struct sockaddr_storage)); +++ if (errno == 0) { +++ enfs_log_error("get clnt dstaddr errno:%d\n", +++ errno); +++ return errno; +++ } +++ remount_lists->remote_ip_list->count = 1; +++ } else +++ memcpy(remount_lists->remote_ip_list, +++ client_info->remote_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ enfs_log_info("Remount creating new links...\n"); +++ enfs_xprt_ippair_create(&xprtargs, +++ nfs_client->cl_rpcclient, +++ remount_lists); +++ +++ enfs_log_info("Remount deleting obsolete links...\n"); +++ enfs_clnt_delete_obsolete_xprts(nfs_client, remount_lists); +++ +++ memcpy(client_info->local_ip_list, +++ remount_lists->local_ip_list, +++ 
sizeof(struct nfs_ip_list)); +++ memcpy(client_info->remote_ip_list, +++ remount_lists->remote_ip_list, +++ sizeof(struct nfs_ip_list)); +++ +++ return 0; +++} ++diff --git a/fs/nfs/enfs/enfs_remount.h b/fs/nfs/enfs/enfs_remount.h ++new file mode 100644 ++index 000000000000..a663ed257004 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_remount.h ++@@ -0,0 +1,15 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: remount ip header file +++ * Author: y00583252 +++ * Create: 2023-08-12 +++ */ +++#ifndef _ENFS_REMOUNT_ +++#define _ENFS_REMOUNT_ +++#include <linux/string.h> +++#include "enfs.h" +++ +++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option); +++ +++#endif ++diff --git a/fs/nfs/enfs/enfs_roundrobin.c b/fs/nfs/enfs/enfs_roundrobin.c ++new file mode 100644 ++index 000000000000..4e4eda784a3e ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_roundrobin.c ++@@ -0,0 +1,255 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ */ +++#include <linux/spinlock.h> +++#include <linux/module.h> +++#include <linux/printk.h> +++#include <linux/kref.h> +++#include <linux/rculist.h> +++#include <linux/types.h> +++#include <linux/sunrpc/xprt.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/xprtmultipath.h> +++#include "enfs_roundrobin.h" +++ +++#include "enfs.h" +++#include "enfs_config.h" +++#include "pm_state.h" +++ +++typedef struct rpc_xprt *(*enfs_xprt_switch_find_xprt_t)( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur); +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin; +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular; +++ +++static bool enfs_xprt_is_active(struct rpc_xprt *xprt) +++{ +++ enum pm_path_state state; +++ +++ if (kref_read(&xprt->kref) <= 0) +++ return false; +++ +++ state = pm_get_path_state(xprt); +++ if (state == PM_STATE_NORMAL) +++ return true; +++ +++ return false; +++} +++ +++static struct rpc_xprt *enfs_lb_set_cursor_xprt( +++ struct rpc_xprt_switch *xps, struct rpc_xprt **cursor, +++ enfs_xprt_switch_find_xprt_t find_next) +++{ +++ struct rpc_xprt *pos; +++ struct rpc_xprt *old; +++ +++ old = smp_load_acquire(cursor); /* read latest cursor */ +++ pos = find_next(xps, old); +++ smp_store_release(cursor, pos); /* let cursor point to pos */ +++ return pos; +++} +++ +++static +++struct rpc_xprt *enfs_lb_find_next_entry_roundrobin( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *pos; +++ struct rpc_xprt *prev = NULL; +++ bool found = false; +++ struct rpc_xprt *min_queuelen_xprt = NULL; +++ unsigned long pos_xprt_queuelen; +++ unsigned long min_xprt_queuelen = 0; +++ +++ unsigned long xps_queuelen = atomic_long_read(&xps->xps_queuelen); +++ // delete origin xprt +++ unsigned int multipath_nactive = READ_ONCE(xps->xps_nactive) - 1; +++ +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (enfs_is_main_xprt(pos) || !enfs_xprt_is_active(pos)) { +++ prev = 
pos; +++ continue; +++ } +++ +++ pos_xprt_queuelen = atomic_long_read(&pos->queuelen); +++ if (min_queuelen_xprt == NULL || +++ pos_xprt_queuelen < min_xprt_queuelen) { +++ +++ min_queuelen_xprt = pos; +++ min_xprt_queuelen = pos_xprt_queuelen; +++ } +++ +++ if (cur == prev) +++ found = true; +++ +++ if (found && pos_xprt_queuelen * +++ multipath_nactive <= xps_queuelen) +++ return pos; +++ prev = pos; +++ }; +++ +++ return min_queuelen_xprt; +++} +++ +++struct rpc_xprt *enfs_lb_switch_find_first_active_xprt( +++ struct rpc_xprt_switch *xps) +++{ +++ struct rpc_xprt *pos; +++ +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (enfs_xprt_is_active(pos)) +++ return pos; +++ }; +++ return NULL; +++} +++ +++struct rpc_xprt *enfs_lb_switch_get_main_xprt(struct rpc_xprt_switch *xps) +++{ +++ return list_first_or_null_rcu(&xps->xps_xprt_list, +++ struct rpc_xprt, xprt_switch); +++} +++ +++static struct rpc_xprt *enfs_lb_switch_get_next_xprt_roundrobin( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *xprt; +++ +++ // disable multipath +++ if (enfs_get_config_multipath_state()) +++ return enfs_lb_switch_get_main_xprt(xps); +++ +++ xprt = enfs_lb_find_next_entry_roundrobin(xps, cur); +++ if (xprt != NULL) +++ return xprt; +++ +++ return enfs_lb_switch_get_main_xprt(xps); +++} +++ +++static +++struct rpc_xprt *enfs_lb_iter_next_entry_roundrobin(struct rpc_xprt_iter *xpi) +++{ +++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); +++ +++ if (xps == NULL) +++ return NULL; +++ +++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, +++ enfs_lb_switch_get_next_xprt_roundrobin); +++} +++ +++static +++struct rpc_xprt *enfs_lb_switch_find_singular_entry( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *pos; +++ bool found = false; +++ +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (cur == pos) +++ found = true; +++ +++ if (found && 
enfs_xprt_is_active(pos)) +++ return pos; +++ } +++ return NULL; +++} +++ +++struct rpc_xprt *enfs_lb_get_singular_xprt( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *xprt; +++ +++ if (xps == NULL) +++ return NULL; +++ +++ // disable multipath +++ if (enfs_get_config_multipath_state()) +++ return enfs_lb_switch_get_main_xprt(xps); +++ +++ if (cur == NULL || xps->xps_nxprts < 2) +++ return enfs_lb_switch_find_first_active_xprt(xps); +++ +++ xprt = enfs_lb_switch_find_singular_entry(xps, cur); +++ if (!xprt) +++ return enfs_lb_switch_get_main_xprt(xps); +++ +++ return xprt; +++} +++ +++static +++struct rpc_xprt *enfs_lb_iter_next_entry_sigular(struct rpc_xprt_iter *xpi) +++{ +++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); +++ +++ if (xps == NULL) +++ return NULL; +++ +++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, +++ enfs_lb_get_singular_xprt); +++} +++ +++static void enfs_lb_iter_default_rewind(struct rpc_xprt_iter *xpi) +++{ +++ WRITE_ONCE(xpi->xpi_cursor, NULL); +++} +++ +++static void enfs_lb_switch_set_roundrobin(struct rpc_clnt *clnt) +++{ +++ struct rpc_xprt_switch *xps; +++ +++ rcu_read_lock(); +++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); +++ rcu_read_unlock(); +++ if (clnt->cl_vers == 3) { +++ +++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_roundrobin) +++ WRITE_ONCE(xps->xps_iter_ops, +++ &enfs_xprt_iter_roundrobin); +++ +++ return; +++ } +++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_singular) +++ WRITE_ONCE(xps->xps_iter_ops, &enfs_xprt_iter_singular); +++} +++ +++static +++struct rpc_xprt *enfs_lb_switch_find_current(struct list_head *head, +++ const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *pos; +++ +++ list_for_each_entry_rcu(pos, head, xprt_switch) { +++ if (cur == pos) +++ return pos; +++ } +++ return NULL; +++} +++ +++static struct rpc_xprt *enfs_lb_iter_current_entry(struct rpc_xprt_iter *xpi) +++{ +++ struct rpc_xprt_switch *xps = 
rcu_dereference(xpi->xpi_xpswitch); +++ struct list_head *head; +++ +++ if (xps == NULL) +++ return NULL; +++ head = &xps->xps_xprt_list; +++ if (xpi->xpi_cursor == NULL || xps->xps_nxprts < 2) +++ return enfs_lb_switch_get_main_xprt(xps); +++ return enfs_lb_switch_find_current(head, xpi->xpi_cursor); +++} +++ +++void enfs_lb_set_policy(struct rpc_clnt *clnt) +++{ +++ enfs_lb_switch_set_roundrobin(clnt); +++} +++ +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin = { +++ .xpi_rewind = enfs_lb_iter_default_rewind, +++ .xpi_xprt = enfs_lb_iter_current_entry, +++ .xpi_next = enfs_lb_iter_next_entry_roundrobin, +++}; +++ +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular = { +++ .xpi_rewind = enfs_lb_iter_default_rewind, +++ .xpi_xprt = enfs_lb_iter_current_entry, +++ .xpi_next = enfs_lb_iter_next_entry_sigular, +++}; ++diff --git a/fs/nfs/enfs/enfs_roundrobin.h b/fs/nfs/enfs/enfs_roundrobin.h ++new file mode 100644 ++index 000000000000..b72b088a6258 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_roundrobin.h ++@@ -0,0 +1,9 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ */ +++#ifndef ENFS_ROUNDROBIN_H +++#define ENFS_ROUNDROBIN_H +++ +++void enfs_lb_set_policy(struct rpc_clnt *clnt); +++#endif +diff --git a/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch +new file mode 100644 +index 0000000..cc6b677 +--- /dev/null ++++ b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch +@@ -0,0 +1,1607 @@ ++diff --git a/fs/nfs/enfs/enfs_config.c b/fs/nfs/enfs/enfs_config.c ++new file mode 100644 ++index 000000000000..11aa7a00385b ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_config.c ++@@ -0,0 +1,378 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ */ +++#include <linux/cdev.h> +++#include <linux/errno.h> +++#include <linux/fcntl.h> +++#include <linux/fs.h> +++#include <linux/kernel.h> +++#include <linux/kthread.h> +++#include <linux/slab.h> +++#include <linux/string.h> +++#include <linux/uaccess.h> +++#include <linux/delay.h> +++ +++#include "enfs_errcode.h" +++#include "enfs_log.h" +++#include "enfs_config.h" +++ +++#define MAX_FILE_SIZE 8192 +++#define STRING_BUF_SIZE 128 +++#define CONFIG_FILE_PATH "/etc/enfs/config.ini" +++#define ENFS_NOTIFY_FILE_PERIOD 1000UL +++ +++#define MAX_PATH_DETECT_INTERVAL 300 +++#define MIN_PATH_DETECT_INTERVAL 5 +++#define MAX_PATH_DETECT_TIMEOUT 60 +++#define MIN_PATH_DETECT_TIMEOUT 1 +++#define MAX_MULTIPATH_TIMEOUT 60 +++#define MIN_MULTIPATH_TIMEOUT 0 +++#define MAX_MULTIPATH_STATE ENFS_MULTIPATH_DISABLE +++#define MIN_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE +++ +++#define DEFAULT_PATH_DETECT_INTERVAL 10 +++#define DEFAULT_PATH_DETECT_TIMEOUT 5 +++#define DEFAULT_MULTIPATH_TIMEOUT 0 +++#define DEFAULT_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE +++#define DEFAULT_LOADBALANCE_MODE ENFS_LOADBALANCE_RR +++ +++typedef int (*check_and_assign_func)(char *, char *, int, int); +++ +++struct enfs_config_info { +++ int32_t path_detect_interval; +++ int32_t path_detect_timeout; +++ int32_t multipath_timeout; +++ int32_t loadbalance_mode; +++ int32_t multipath_state; +++}; +++ +++struct check_and_assign_value { +++ char *field_name; +++ check_and_assign_func func; +++ int min_value; +++ int max_value; +++}; +++ +++static struct enfs_config_info g_enfs_config_info; +++static struct timespec64 modify_time; +++static struct task_struct *thread; +++ +++static int enfs_check_config_value(char *value, int min_value, int max_value) +++{ +++ long num_value; +++ int ret; +++ +++ ret = kstrtol(value, 10, &num_value); +++ if (ret != 0) { +++ enfs_log_error("Failed to convert string to int\n"); +++ return -EINVAL; +++ } +++ +++ if (num_value < min_value || num_value > max_value) +++ return
-EINVAL; +++ +++ return num_value; +++} +++ +++static int32_t enfs_check_and_assign_int_value(char *field_name, char *value, +++ int min_value, int max_value) +++{ +++ int int_value = enfs_check_config_value(value, min_value, max_value); +++ +++ if (int_value < 0) +++ return -EINVAL; +++ +++ if (strcmp(field_name, "path_detect_interval") == 0) { +++ g_enfs_config_info.path_detect_interval = int_value; +++ return ENFS_RET_OK; +++ } +++ if (strcmp(field_name, "path_detect_timeout") == 0) { +++ g_enfs_config_info.path_detect_timeout = int_value; +++ return ENFS_RET_OK; +++ } +++ if (strcmp(field_name, "multipath_timeout") == 0) { +++ g_enfs_config_info.multipath_timeout = int_value; +++ return ENFS_RET_OK; +++ } +++ if (strcmp(field_name, "multipath_disable") == 0) { +++ g_enfs_config_info.multipath_state = int_value; +++ return ENFS_RET_OK; +++ } +++ return -EINVAL; +++} +++ +++static int32_t enfs_check_and_assign_loadbalance_mode(char *field_name, +++ char *value, +++ int min_value, +++ int max_value) +++{ +++ if (value == NULL) +++ return -EINVAL; +++ +++ if (strcmp(field_name, "multipath_select_policy") == 0) { +++ if (strcmp(value, "roundrobin") == 0) { +++ g_enfs_config_info.loadbalance_mode +++ = ENFS_LOADBALANCE_RR; +++ return ENFS_RET_OK; +++ } +++ } +++ return -EINVAL; +++} +++ +++static const struct check_and_assign_value g_check_and_assign_value[] = { +++ {"path_detect_interval", enfs_check_and_assign_int_value, +++ MIN_PATH_DETECT_INTERVAL, MAX_PATH_DETECT_INTERVAL}, +++ {"path_detect_timeout", enfs_check_and_assign_int_value, +++ MIN_PATH_DETECT_TIMEOUT, MAX_PATH_DETECT_TIMEOUT}, +++ {"multipath_timeout", enfs_check_and_assign_int_value, +++ MIN_MULTIPATH_TIMEOUT, MAX_MULTIPATH_TIMEOUT}, +++ {"multipath_disable", enfs_check_and_assign_int_value, +++ MIN_MULTIPATH_STATE, MAX_MULTIPATH_STATE}, +++ {"multipath_select_policy", enfs_check_and_assign_loadbalance_mode, +++ 0, 0}, +++}; +++ +++static int32_t enfs_read_config_file(char *buffer, char *file_path) 
+++{ +++ int ret; +++ struct file *filp = NULL; +++ loff_t f_pos = 0; +++ mm_segment_t fs; +++ +++ +++ filp = filp_open(file_path, O_RDONLY, 0); +++ +++ if (IS_ERR(filp)) { +++ enfs_log_error("Failed to open file %s\n", CONFIG_FILE_PATH); +++ ret = -ENOENT; +++ return ret; +++ } +++ +++ fs = get_fs(); +++ set_fs(get_ds()); +++ kernel_read(filp, buffer, MAX_FILE_SIZE, &f_pos); +++ set_fs(fs); +++ +++ ret = filp_close(filp, NULL); +++ if (ret) { +++ enfs_log_error("Close File:%s failed:%d.\n", +++ CONFIG_FILE_PATH, ret); +++ return -EINVAL; +++ } +++ return ENFS_RET_OK; +++} +++ +++static int32_t enfs_deal_with_comment_line(char *buffer) +++{ +++ int ret; +++ char *pos = strchr(buffer, '\n'); +++ +++ if (pos != NULL) +++ ret = strlen(buffer) - strlen(pos); +++ else +++ ret = strlen(buffer); +++ +++ return ret; +++} +++ +++static int32_t enfs_parse_key_value_from_config(char *buffer, char *key, +++ char *value, int keyLen, +++ int valueLen) +++{ +++ char *line; +++ char *tokenPtr; +++ int len; +++ char *tem; +++ char *pos = strchr(buffer, '\n'); +++ +++ if (pos != NULL) +++ len = strlen(buffer) - strlen(pos); +++ else +++ len = strlen(buffer); +++ +++ line = kmalloc(len + 1, GFP_KERNEL); +++ if (!line) { +++ enfs_log_error("Failed to allocate memory.\n"); +++ return -ENOMEM; +++ } +++ line[len] = '\0'; +++ strncpy(line, buffer, len); +++ +++ tem = line; +++ tokenPtr = strsep(&tem, "="); +++ if (tokenPtr == NULL || tem == NULL) { +++ kfree(line); +++ return len; +++ } +++ strncpy(key, strim(tokenPtr), keyLen); +++ strncpy(value, strim(tem), valueLen); +++ +++ kfree(line); +++ return len; +++} +++ +++static int32_t enfs_get_value_from_config_file(char *buffer, char *field_name, +++ char *value, int valueLen) +++{ +++ int ret; +++ char key[STRING_BUF_SIZE + 1] = {0}; +++ char val[STRING_BUF_SIZE + 1] = {0}; +++ +++ while (buffer[0] != '\0') { +++ if (buffer[0] == '\n') { +++ buffer++; +++ } else if (buffer[0] == '#') { +++ ret = enfs_deal_with_comment_line(buffer); +++ 
if (ret > 0) +++ buffer += ret; +++ } else { +++ ret = enfs_parse_key_value_from_config(buffer, key, val, +++ STRING_BUF_SIZE, +++ STRING_BUF_SIZE); +++ if (ret < 0) { +++ enfs_log_error("failed parse key value, %d\n" +++ , ret); +++ return ret; +++ } +++ key[STRING_BUF_SIZE] = '\0'; +++ val[STRING_BUF_SIZE] = '\0'; +++ +++ buffer += ret; +++ +++ if (strcmp(field_name, key) == 0) { +++ strncpy(value, val, valueLen); +++ return ENFS_RET_OK; +++ } +++ } +++ } +++ enfs_log_error("can not find value which matched field_name: %s.\n", +++ field_name); +++ return -EINVAL; +++} +++ +++int32_t enfs_config_load(void) +++{ +++ char value[STRING_BUF_SIZE + 1]; +++ int ret; +++ int table_len; +++ int min; +++ int max; +++ int i; +++ char *buffer; +++ +++ buffer = kmalloc(MAX_FILE_SIZE, GFP_KERNEL); +++ if (!buffer) { +++ enfs_log_error("Failed to allocate memory.\n"); +++ return -ENOMEM; +++ } +++ memset(buffer, 0, MAX_FILE_SIZE); +++ +++ g_enfs_config_info.path_detect_interval = DEFAULT_PATH_DETECT_INTERVAL; +++ g_enfs_config_info.path_detect_timeout = DEFAULT_PATH_DETECT_TIMEOUT; +++ g_enfs_config_info.multipath_timeout = DEFAULT_MULTIPATH_TIMEOUT; +++ g_enfs_config_info.multipath_state = DEFAULT_MULTIPATH_STATE; +++ g_enfs_config_info.loadbalance_mode = DEFAULT_LOADBALANCE_MODE; +++ +++ table_len = sizeof(g_check_and_assign_value) / +++ sizeof(g_check_and_assign_value[0]); +++ +++ ret = enfs_read_config_file(buffer, CONFIG_FILE_PATH); +++ if (ret != 0) { +++ kfree(buffer); +++ return ret; +++ } +++ +++ for (i = 0; i < table_len; i++) { +++ ret = enfs_get_value_from_config_file(buffer, +++ g_check_and_assign_value[i].field_name, +++ value, STRING_BUF_SIZE); +++ if (ret < 0) +++ continue; +++ +++ value[STRING_BUF_SIZE] = '\0'; +++ min = g_check_and_assign_value[i].min_value; +++ max = g_check_and_assign_value[i].max_value; +++ if (g_check_and_assign_value[i].func != NULL) +++ (*g_check_and_assign_value[i].func)( +++ g_check_and_assign_value[i].field_name, +++ value, min, max); 
+++ } +++ +++ kfree(buffer); +++ return ENFS_RET_OK; +++} +++ +++int32_t enfs_get_config_path_detect_interval(void) +++{ +++ return g_enfs_config_info.path_detect_interval; +++} +++ +++int32_t enfs_get_config_path_detect_timeout(void) +++{ +++ return g_enfs_config_info.path_detect_timeout; +++} +++ +++int32_t enfs_get_config_multipath_timeout(void) +++{ +++ return g_enfs_config_info.multipath_timeout; +++} +++ +++int32_t enfs_get_config_multipath_state(void) +++{ +++ return g_enfs_config_info.multipath_state; +++} +++ +++int32_t enfs_get_config_loadbalance_mode(void) +++{ +++ return g_enfs_config_info.loadbalance_mode; +++} +++ +++static bool enfs_file_changed(const char *filename) +++{ +++ int err; +++ struct kstat file_stat; +++ +++ err = vfs_stat(filename, &file_stat); +++ if (err) { +++ pr_err("failed to open file:%s err:%d\n", filename, err); +++ return false; +++ } +++ +++ if (timespec64_compare(&modify_time, &file_stat.mtime) == -1) { +++ modify_time = file_stat.mtime; +++ pr_info("file change: %lld %lld\n", modify_time.tv_sec, +++ file_stat.mtime.tv_sec); +++ return true; +++ } +++ +++ return false; +++} +++ +++static int enfs_thread_func(void *data) +++{ +++ while (!kthread_should_stop()) { +++ if (enfs_file_changed(CONFIG_FILE_PATH)) +++ enfs_config_load(); +++ +++ msleep(ENFS_NOTIFY_FILE_PERIOD); +++ } +++ return 0; +++} +++ +++int enfs_config_timer_init(void) +++{ +++ thread = kthread_run(enfs_thread_func, NULL, "enfs_notiy_file_thread"); +++ if (IS_ERR(thread)) { +++ pr_err("Failed to create kernel thread\n"); +++ return PTR_ERR(thread); +++ } +++ return 0; +++} +++ +++void enfs_config_timer_exit(void) +++{ +++ pr_info("enfs_notify_file_exit\n"); +++ if (thread) +++ kthread_stop(thread); +++} ++diff --git a/fs/nfs/enfs/enfs_config.h b/fs/nfs/enfs/enfs_config.h ++new file mode 100644 ++index 000000000000..752710129170 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_config.h ++@@ -0,0 +1,32 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) 
Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: nfs configuration +++ * Author: y00583252 +++ * Create: 2023-07-27 +++ */ +++ +++#ifndef ENFS_CONFIG_H +++#define ENFS_CONFIG_H +++ +++#include <linux/types.h> +++ +++enum enfs_multipath_state { +++ ENFS_MULTIPATH_ENABLE = 0, +++ ENFS_MULTIPATH_DISABLE = 1, +++}; +++ +++enum enfs_loadbalance_mode { +++ ENFS_LOADBALANCE_RR, +++}; +++ +++ +++int32_t enfs_get_config_path_detect_interval(void); +++int32_t enfs_get_config_path_detect_timeout(void); +++int32_t enfs_get_config_multipath_timeout(void); +++int32_t enfs_get_config_multipath_state(void); +++int32_t enfs_get_config_loadbalance_mode(void); +++int32_t enfs_config_load(void); +++int32_t enfs_config_timer_init(void); +++void enfs_config_timer_exit(void); +++#endif // ENFS_CONFIG_H ++diff --git a/fs/nfs/enfs/enfs_errcode.h b/fs/nfs/enfs/enfs_errcode.h ++new file mode 100644 ++index 000000000000..cca47ab9a191 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_errcode.h ++@@ -0,0 +1,17 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: nfs error code +++ * Author: y00583252 +++ * Create: 2023-07-31 +++ */ +++ +++#ifndef ENFS_ERRCODE_H +++#define ENFS_ERRCODE_H +++ +++enum { +++ ENFS_RET_OK = 0, +++ ENFS_RET_FAIL +++}; +++ +++#endif // ENFS_ERRCODE_H ++diff --git a/fs/nfs/enfs/enfs_log.h b/fs/nfs/enfs/enfs_log.h ++new file mode 100644 ++index 000000000000..177b404f05df ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_log.h ++@@ -0,0 +1,25 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: enfs log +++ * Author: y00583252 +++ * Create: 2023-07-31 +++ */ +++#ifndef ENFS_LOG_H +++#define ENFS_LOG_H +++ +++#include <linux/printk.h> +++ +++#define enfs_log_info(fmt, ...)
\ +++ pr_info("enfs:[%s]" pr_fmt(fmt), \ +++ __func__, ##__VA_ARGS__) +++ +++#define enfs_log_error(fmt, ...) \ +++ pr_err("enfs:[%s]" pr_fmt(fmt), \ +++ __func__, ##__VA_ARGS__) +++ +++#define enfs_log_debug(fmt, ...) \ +++ pr_debug("enfs:[%s]" pr_fmt(fmt), \ +++ __func__, ##__VA_ARGS__) +++ +++#endif // ENFS_LOG_H ++diff --git a/fs/nfs/enfs/failover_com.h b/fs/nfs/enfs/failover_com.h ++new file mode 100644 ++index 000000000000..c52940da232e ++--- /dev/null +++++ b/fs/nfs/enfs/failover_com.h ++@@ -0,0 +1,23 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: failover time common header file +++ * Create: 2023-08-02 +++ */ +++#ifndef FAILOVER_COMMON_H +++#define FAILOVER_COMMON_H +++ +++static inline bool failover_is_enfs_clnt(struct rpc_clnt *clnt) +++{ +++ struct rpc_clnt *next = clnt->cl_parent; +++ +++ while (next) { +++ if (next == next->cl_parent) +++ break; +++ next = next->cl_parent; +++ } +++ +++ return next != NULL ? next->cl_enfs : clnt->cl_enfs; +++} +++ +++#endif // FAILOVER_COMMON_H ++diff --git a/fs/nfs/enfs/failover_path.c b/fs/nfs/enfs/failover_path.c ++new file mode 100644 ++index 000000000000..93b454de29d1 ++--- /dev/null +++++ b/fs/nfs/enfs/failover_path.c ++@@ -0,0 +1,207 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: nfs path failover file +++ * Author: y00583252 +++ * Create: 2023-08-02 +++ */ +++ +++#include "failover_path.h" +++#include <linux/nfs.h> +++#include <linux/nfs3.h> +++#include <linux/nfs4.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/sched.h> +++#include <linux/sunrpc/xprt.h> +++#include "enfs_config.h" +++#include "enfs_log.h" +++#include "failover_com.h" +++#include "pm_state.h" +++#include "pm_ping.h" +++ +++enum failover_policy_t { +++ FAILOVER_NOACTION = 1, +++ FAILOVER_RETRY, +++ FAILOVER_RETRY_DELAY, +++}; +++ +++static void failover_retry_path(struct rpc_task *task) +++{ +++ xprt_release(task); +++ rpc_init_task_retry_counters(task); +++ rpc_task_release_transport(task); +++ rpc_restart_call(task); +++} +++ +++static void failover_retry_path_delay(struct rpc_task *task, int32_t delay) +++{ +++ failover_retry_path(task); +++ rpc_delay(task, delay); +++} +++ +++static void failover_retry_path_by_policy(struct rpc_task *task, +++ enum failover_policy_t policy) +++{ +++ if (policy == FAILOVER_RETRY) +++ failover_retry_path(task); +++ else if (policy == FAILOVER_RETRY_DELAY) +++ failover_retry_path_delay(task, 3 * HZ); // delay 3s +++} +++ +++static +++enum failover_policy_t failover_get_nfs3_retry_policy(struct rpc_task *task) +++{ +++ enum failover_policy_t policy = FAILOVER_NOACTION; +++ const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc; +++ u32 proc; +++ +++ if (unlikely(procinfo == NULL)) { +++ enfs_log_error("the task contains no valid proc.\n"); +++ return FAILOVER_NOACTION; +++ } +++ +++ proc = procinfo->p_proc; +++ +++ switch (proc) { +++ case NFS3PROC_CREATE: +++ case NFS3PROC_MKDIR: +++ case NFS3PROC_REMOVE: +++ case NFS3PROC_RMDIR: +++ case NFS3PROC_SYMLINK: +++ case NFS3PROC_LINK: +++ case NFS3PROC_SETATTR: +++ case NFS3PROC_WRITE: +++ return FAILOVER_RETRY_DELAY; +++ default: +++ policy = FAILOVER_RETRY; +++ } +++ return policy; +++} +++ +++static +++enum failover_policy_t
failover_get_nfs4_retry_policy(struct rpc_task *task) +++{ +++ enum failover_policy_t policy = FAILOVER_NOACTION; +++ const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc; +++ u32 proc_idx; +++ +++ if (unlikely(procinfo == NULL)) { +++ enfs_log_error("the task contains no valid proc.\n"); +++ return FAILOVER_NOACTION; +++ } +++ +++ proc_idx = procinfo->p_statidx; +++ +++ switch (proc_idx) { +++ case NFSPROC4_CLNT_CREATE: +++ case NFSPROC4_CLNT_REMOVE: +++ case NFSPROC4_CLNT_LINK: +++ case NFSPROC4_CLNT_SYMLINK: +++ case NFSPROC4_CLNT_SETATTR: +++ case NFSPROC4_CLNT_WRITE: +++ case NFSPROC4_CLNT_RENAME: +++ case NFSPROC4_CLNT_SETACL: +++ return FAILOVER_RETRY_DELAY; +++ default: +++ policy = FAILOVER_RETRY; +++ } +++ return policy; +++} +++ +++static enum failover_policy_t failover_get_retry_policy(struct rpc_task *task) +++{ +++ struct rpc_clnt *clnt = task->tk_client; +++ u32 version = clnt->cl_vers; +++ enum failover_policy_t policy = FAILOVER_NOACTION; +++ +++ // 1. if the task meant to send to certain xprt, take no action +++ if (task->tk_flags & RPC_TASK_FIXED) +++ return FAILOVER_NOACTION; +++ +++ // 2. get policy by different version of nfs protocol +++ if (version == 3) // nfs v3 +++ policy = failover_get_nfs3_retry_policy(task); +++ else if (version == 4) // nfs v4 +++ policy = failover_get_nfs4_retry_policy(task); +++ else +++ return FAILOVER_NOACTION; +++ +++ // 3.
if the task was not sent to the target, retry immediately +++ if (!RPC_WAS_SENT(task)) +++ policy = FAILOVER_RETRY; +++ +++ return policy; +++} +++ +++static int failover_check_task(struct rpc_task *task) +++{ +++ struct rpc_clnt *clnt = NULL; +++ int disable_mpath = enfs_get_config_multipath_state(); +++ +++ if (disable_mpath != ENFS_MULTIPATH_ENABLE) { +++ enfs_log_debug("Multipath is not enabled.\n"); +++ return -EINVAL; +++ } +++ +++ if (unlikely((task == NULL) || (task->tk_client == NULL))) { +++ enfs_log_error("The task is not valid.\n"); +++ return -EINVAL; +++ } +++ +++ clnt = task->tk_client; +++ +++ if (clnt->cl_prog != NFS_PROGRAM) { +++ enfs_log_debug("The clnt is not prog{%u} type.\n", +++ clnt->cl_prog); +++ return -EINVAL; +++ } +++ +++ if (!failover_is_enfs_clnt(clnt)) { +++ enfs_log_debug("The clnt is not a enfs-managed type.\n"); +++ return -EINVAL; +++ } +++ return 0; +++} +++ +++void failover_handle(struct rpc_task *task) +++{ +++ enum failover_policy_t policy; +++ int ret; +++ +++ ret = failover_check_task(task); +++ if (ret != 0) +++ return; +++ +++ pm_set_path_state(task->tk_xprt, PM_STATE_FAULT); +++ +++ policy = failover_get_retry_policy(task); +++ +++ failover_retry_path_by_policy(task, policy); +++} +++ +++bool failover_task_need_call_start_again(struct rpc_task *task) +++{ +++ int ret; +++ +++ ret = failover_check_task(task); +++ if (ret != 0) +++ return false; +++ +++ return true; +++} +++ +++bool failover_prepare_transmit(struct rpc_task *task) +++{ +++ if (task->tk_flags & RPC_TASK_FIXED) +++ return true; +++ +++ if (pm_ping_is_test_xprt_task(task)) +++ return true; +++ +++ if (pm_get_path_state(task->tk_xprt) == PM_STATE_FAULT) { +++ task->tk_status = -ETIMEDOUT; +++ return false; +++ } +++ +++ return true; +++} ++diff --git a/fs/nfs/enfs/failover_path.h b/fs/nfs/enfs/failover_path.h ++new file mode 100644 ++index 000000000000..6f1294829a6e ++--- /dev/null +++++ b/fs/nfs/enfs/failover_path.h ++@@ -0,0 +1,17 @@ +++/*
SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: nfs path failover header file +++ * Author: y00583252 +++ * Create: 2023-08-02 +++ */ +++ +++#ifndef FAILOVER_PATH_H +++#define FAILOVER_PATH_H +++ +++#include <linux/sunrpc/sched.h> +++ +++void failover_handle(struct rpc_task *task); +++bool failover_prepare_transmit(struct rpc_task *task); +++ +++#endif // FAILOVER_PATH_H ++diff --git a/fs/nfs/enfs/failover_time.c b/fs/nfs/enfs/failover_time.c ++new file mode 100644 ++index 000000000000..866ea82d13fc ++--- /dev/null +++++ b/fs/nfs/enfs/failover_time.c ++@@ -0,0 +1,99 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: failover time file +++ * Create: 2023-08-02 +++ */ +++ +++#include "failover_time.h" +++#include <linux/jiffies.h> +++#include <linux/sunrpc/clnt.h> +++#include "enfs_config.h" +++#include "enfs_log.h" +++#include "failover_com.h" +++#include "pm_ping.h" +++ +++static unsigned long failover_get_mulitipath_timeout(struct rpc_clnt *clnt) +++{ +++ unsigned long config_tmo = enfs_get_config_multipath_timeout() * HZ; +++ unsigned long clnt_tmo = clnt->cl_timeout->to_initval; +++ +++ if (config_tmo == 0) +++ return clnt_tmo; +++ +++ return config_tmo > clnt_tmo ? 
clnt_tmo : config_tmo; +++} +++ +++void failover_adjust_task_timeout(struct rpc_task *task, void *condition) +++{ +++ struct rpc_clnt *clnt = NULL; +++ unsigned long tmo; +++ int disable_mpath = enfs_get_config_multipath_state(); +++ +++ if (disable_mpath != ENFS_MULTIPATH_ENABLE) { +++ enfs_log_debug("Multipath is not enabled.\n"); +++ return; +++ } +++ +++ clnt = task->tk_client; +++ if (unlikely(clnt == NULL)) { +++ enfs_log_error("task associate client is NULL.\n"); +++ return; +++ } +++ +++ if (!failover_is_enfs_clnt(clnt)) { +++ enfs_log_debug("The clnt is not a enfs-managed type.\n"); +++ return; +++ } +++ +++ tmo = failover_get_mulitipath_timeout(clnt); +++ if (tmo == 0) { +++ enfs_log_debug("Multipath is not enabled.\n"); +++ return; +++ } +++ +++ if (task->tk_timeout != 0) +++ task->tk_timeout = +++ task->tk_timeout < tmo ? task->tk_timeout : tmo; +++ else +++ task->tk_timeout = tmo; +++} +++ +++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req) +++{ +++ struct rpc_clnt *clnt = NULL; +++ int disable_mpath = enfs_get_config_multipath_state(); +++ +++ if (disable_mpath != ENFS_MULTIPATH_ENABLE) { +++ enfs_log_debug("Multipath is not enabled.\n"); +++ return; +++ } +++ +++ clnt = task->tk_client; +++ if (unlikely(clnt == NULL)) { +++ enfs_log_error("task associate client is NULL.\n"); +++ return; +++ } +++ +++ if (!failover_is_enfs_clnt(clnt)) { +++ enfs_log_debug("The clnt is not a enfs-managed type.\n"); +++ return; +++ } +++ +++ if (!pm_ping_is_test_xprt_task(task)) +++ req->rq_timeout = failover_get_mulitipath_timeout(clnt); +++ else { +++ req->rq_timeout = enfs_get_config_path_detect_timeout() * HZ; +++ req->rq_majortimeo = req->rq_timeout + jiffies; +++ } +++ +++ /* +++ * when task is retried, the req is new, we lost major-timeout times, +++ * so we have to restore req major +++ * timeouts from the task, if it is stored. 
+++ */ +++ if (task->tk_major_timeo != 0) +++ req->rq_majortimeo = task->tk_major_timeo; +++ else +++ task->tk_major_timeo = req->rq_majortimeo; +++} ++diff --git a/fs/nfs/enfs/failover_time.h b/fs/nfs/enfs/failover_time.h ++new file mode 100644 ++index 000000000000..ede25b577a2a ++--- /dev/null +++++ b/fs/nfs/enfs/failover_time.h ++@@ -0,0 +1,16 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: failover time header file +++ * Create: 2023-08-02 +++ */ +++ +++#ifndef FAILOVER_TIME_H +++#define FAILOVER_TIME_H +++ +++#include <linux/sunrpc/sched.h> +++ +++void failover_adjust_task_timeout(struct rpc_task *task, void *condition); +++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req); +++ +++#endif // FAILOVER_TIME_H ++diff --git a/fs/nfs/enfs/init.h b/fs/nfs/enfs/init.h ++new file mode 100644 ++index 000000000000..fdabb9084e19 ++--- /dev/null +++++ b/fs/nfs/enfs/init.h ++@@ -0,0 +1,17 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: nfs client init +++ * Author: y00583252 +++ * Create: 2023-07-31 +++ */ +++ +++#ifndef ENFS_INIT_H +++#define ENFS_INIT_H +++ +++#include <linux/types.h> +++ +++int32_t enfs_init(void); +++void enfs_fini(void); +++ +++#endif ++diff --git a/fs/nfs/enfs/mgmt_init.c b/fs/nfs/enfs/mgmt_init.c ++new file mode 100644 ++index 000000000000..75a40c5e0f6c ++--- /dev/null +++++ b/fs/nfs/enfs/mgmt_init.c ++@@ -0,0 +1,22 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: mgmt component init +++ * Author: y00583252 +++ * Create: 2023-07-31 +++ */ +++ +++#include "mgmt_init.h" +++#include <linux/printk.h> +++#include "enfs_errcode.h" +++#include "enfs_config.h" +++ +++int32_t mgmt_init(void) +++{ +++ return enfs_config_timer_init(); +++} +++ +++void mgmt_fini(void) +++{ +++ enfs_config_timer_exit(); +++} ++diff --git a/fs/nfs/enfs/mgmt_init.h b/fs/nfs/enfs/mgmt_init.h ++new file mode 100644 ++index 000000000000..aa78303b9f01 ++--- /dev/null +++++ b/fs/nfs/enfs/mgmt_init.h ++@@ -0,0 +1,18 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: mgmt component init +++ * Author: y00583252 +++ * Create: 2023-07-31 +++ */ +++ +++#ifndef MGMT_INIT_H +++#define MGMT_INIT_H +++ +++#include <linux/types.h> +++ +++int32_t mgmt_init(void); +++void mgmt_fini(void); +++ +++ +++#endif // MGMT_INIT_H ++diff --git a/fs/nfs/enfs/pm_ping.c b/fs/nfs/enfs/pm_ping.c ++new file mode 100644 ++index 000000000000..24153cd4c7f3 ++--- /dev/null +++++ b/fs/nfs/enfs/pm_ping.c ++@@ -0,0 +1,421 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: path state header file +++ * Author: x00833432 +++ * Create: 2023-08-21 +++ */ +++ +++#include "pm_ping.h" +++#include <linux/err.h> +++#include <linux/spinlock.h> +++#include <linux/slab.h> +++#include <linux/module.h> +++#include <linux/printk.h> +++#include <linux/kthread.h> +++#include <linux/nfs.h> +++#include <linux/errno.h> +++#include <linux/rcupdate.h> +++#include <linux/workqueue.h> +++#include <net/netns/generic.h> +++#include <linux/atomic.h> +++#include <linux/sunrpc/clnt.h> +++ +++#include "../../../net/sunrpc/netns.h" +++#include "pm_state.h" +++#include "enfs.h" +++#include "enfs_log.h" +++#include "enfs_config.h" +++ +++#define SLEEP_INTERVAL 2 +++extern unsigned int sunrpc_net_id; +++ +++static struct task_struct *pm_ping_timer_thread; +++// protects ping_execute_workq +++static DEFINE_SPINLOCK(ping_execute_workq_lock); +++// workqueue that runs the xprt test work +++static struct workqueue_struct *ping_execute_workq; +++// number of xprt ping requests in flight +++static atomic_t check_xprt_count; +++ +++struct ping_xprt_work { +++ struct rpc_xprt *xprt; // use this specific xprt +++ struct rpc_clnt *clnt; // use this specific rpc_client +++ struct work_struct ping_work; +++}; +++ +++struct pm_ping_async_callback { +++ void *data; +++ void (*func)(void *data); +++}; +++ +++// set xprt's enum pm_check_state +++void pm_ping_set_path_check_state(struct rpc_xprt *xprt, +++ enum pm_check_state state) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ +++ if (IS_ERR(xprt)) { +++ enfs_log_error("The xprt ptr does not exist.\n"); +++ return; +++ } +++ +++ if (xprt == NULL) { +++ enfs_log_error("The xprt is not valid.\n"); +++ return; +++ } +++ +++ xprt_get(xprt); +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ if (ctx == NULL) { +++ enfs_log_error("The xprt multipath ctx is not valid.\n"); +++ xprt_put(xprt); +++ return; +++ } +++ +++ atomic_set(&ctx->path_check_state, state); +++ xprt_put(xprt); +++} +++ +++// get xprt's enum 
pm_check_state +++static enum pm_check_state pm_ping_get_path_check_state(struct rpc_xprt *xprt) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ enum pm_check_state state; +++ +++ if (xprt == NULL) { +++ enfs_log_error("The xprt is not valid.\n"); +++ return PM_CHECK_UNDEFINE; +++ } +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ if (ctx == NULL) { +++ enfs_log_error("The xprt multipath ctx is not valid.\n"); +++ return PM_CHECK_UNDEFINE; +++ } +++ +++ state = atomic_read(&ctx->path_check_state); +++ +++ return state; +++} +++ +++static void pm_ping_call_done_callback(void *data) +++{ +++ struct pm_ping_async_callback *callback_data = +++ (struct pm_ping_async_callback *)data; +++ +++ if (callback_data == NULL) +++ return; +++ +++ callback_data->func(callback_data->data); +++ +++ kfree(callback_data); +++} +++ +++// Default callback for async RPC calls +++static void pm_ping_call_done(struct rpc_task *task, void *data) +++{ +++ struct rpc_xprt *xprt = task->tk_xprt; +++ +++ atomic_dec(&check_xprt_count); +++ if (task->tk_status >= 0) +++ pm_set_path_state(xprt, PM_STATE_NORMAL); +++ else +++ pm_set_path_state(xprt, PM_STATE_FAULT); +++ +++ pm_ping_set_path_check_state(xprt, PM_CHECK_FINISH); +++ +++ pm_ping_call_done_callback(data); +++} +++ +++// register func to rpc_call_done +++static const struct rpc_call_ops pm_ping_set_status_ops = { +++ .rpc_call_done = pm_ping_call_done, +++}; +++ +++// execute work which in work_queue +++static void pm_ping_execute_work(struct work_struct *work) +++{ +++ int ret = 0; +++ +++ // get the work information +++ struct ping_xprt_work *work_info = +++ container_of(work, struct ping_xprt_work, ping_work); +++ +++ // if check state is pending +++ if (pm_ping_get_path_check_state(work_info->xprt) == PM_CHECK_WAITING) { +++ +++ pm_ping_set_path_check_state(work_info->xprt, +++ PM_CHECK_CHECKING); +++ +++ ret = rpc_clnt_test_xprt(work_info->clnt, +++ work_info->xprt, +++ &pm_ping_set_status_ops, +++ NULL, +++ 
RPC_TASK_ASYNC | RPC_TASK_FIXED); +++ +++ if (ret < 0) { +++ enfs_log_debug("ping xprt execute failed, ret %d\n", ret); +++ +++ pm_ping_set_path_check_state(work_info->xprt, +++ PM_CHECK_FINISH); +++ +++ } else +++ atomic_inc(&check_xprt_count); +++ +++ } +++ +++ atomic_dec(&work_info->clnt->cl_count); +++ xprt_put(work_info->xprt); +++ kfree(work_info); +++ work_info = NULL; +++} +++ +++static bool pm_ping_workqueue_queue_work(struct work_struct *work) +++{ +++ bool ret = false; +++ +++ spin_lock(&ping_execute_workq_lock); +++ +++ if (ping_execute_workq != NULL) +++ ret = queue_work(ping_execute_workq, work); +++ +++ spin_unlock(&ping_execute_workq_lock); +++ return ret; +++} +++ +++// init test work and add this work to workqueue +++static int pm_ping_add_work(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, void *data) +++{ +++ struct ping_xprt_work *work_info; +++ bool ret = false; +++ +++ if (IS_ERR(xprt) || xprt == NULL) { +++ enfs_log_error("The xprt ptr does not exist.\n"); +++ return -EINVAL; +++ } +++ +++ if (IS_ERR(clnt) || clnt == NULL) { +++ enfs_log_error("The clnt ptr does not exist.\n"); +++ return -EINVAL; +++ } +++ +++ if (!xprt->multipath_context) { +++ enfs_log_error("multipath_context is null.\n"); +++ return -EINVAL; +++ } +++ +++ // check the xprt check state: if it is FINISH or INIT, +++ // this xprt can be inserted into the work queue +++ if (pm_ping_get_path_check_state(xprt) == +++ PM_CHECK_FINISH || +++ pm_ping_get_path_check_state(xprt) == +++ PM_CHECK_INIT) { +++ +++ enfs_log_debug("find xprt pointer. 
%p\n", xprt); +++ work_info = kzalloc(sizeof(struct ping_xprt_work), GFP_ATOMIC); +++ if (work_info == NULL) +++ return -ENOMEM; +++ work_info->clnt = clnt; +++ atomic_inc(&clnt->cl_count); +++ work_info->xprt = xprt; +++ xprt_get(xprt); +++ INIT_WORK(&work_info->ping_work, pm_ping_execute_work); +++ pm_ping_set_path_check_state(xprt, PM_CHECK_WAITING); +++ +++ ret = pm_ping_workqueue_queue_work(&work_info->ping_work); +++ if (!ret) { +++ atomic_dec(&work_info->clnt->cl_count); +++ xprt_put(work_info->xprt); +++ kfree(work_info); +++ return -EINVAL; +++ } +++ } +++ return 0; +++} +++ +++// encapsulate pm_ping_add_work() +++static int pm_ping_execute_xprt_test(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, void *data) +++{ +++ pm_ping_add_work(clnt, xprt, NULL); +++ // return 0 for rpc_clnt_iterate_for_each_xprt(); +++ // because negative value will stop iterate all xprt +++ // and we need return negative value for debug +++ // Therefore, we need this function to iterate all xprt +++ return 0; +++} +++ +++// export to other module add ping work to workqueue +++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) +++{ +++ int ret; +++ +++ ret = pm_ping_add_work(clnt, xprt, NULL); +++ return ret; +++} +++ +++// iterate xprt in the client +++static void pm_ping_loop_rpclnt(struct sunrpc_net *sn) +++{ +++ struct rpc_clnt *clnt; +++ +++ spin_lock(&sn->rpc_client_lock); +++ list_for_each_entry_rcu(clnt, &sn->all_clients, cl_clients) { +++ if (clnt->cl_enfs) { +++ enfs_log_debug("find rpc_clnt. 
%p\n", clnt); +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ pm_ping_execute_xprt_test, NULL); +++ } +++ } +++ spin_unlock(&sn->rpc_client_lock); +++} +++ +++// iterate each clnt in the sunrpc_net +++static void pm_ping_loop_sunrpc_net(void) +++{ +++ struct net *net; +++ struct sunrpc_net *sn; +++ +++ rcu_read_lock(); +++ for_each_net_rcu(net) { +++ sn = net_generic(net, sunrpc_net_id); +++ if (sn == NULL) +++ continue; +++ pm_ping_loop_rpclnt(sn); +++ } +++ rcu_read_unlock(); +++} +++ +++static int pm_ping_routine(void *data) +++{ +++ while (!kthread_should_stop()) { +++ // ENFS_MULTIPATH_ENABLE means multipath is switched on +++ if (enfs_get_config_multipath_state() == +++ ENFS_MULTIPATH_ENABLE) +++ pm_ping_loop_sunrpc_net(); +++ +++ msleep((unsigned int) +++ enfs_get_config_path_detect_interval() * 1000); +++ } +++ return 0; +++} +++ +++// start the thread that pings periodically +++static int pm_ping_start(void) +++{ +++ pm_ping_timer_thread = +++ kthread_run(pm_ping_routine, NULL, "pm_ping_routine"); +++ if (IS_ERR(pm_ping_timer_thread)) { +++ enfs_log_error("Failed to create kernel thread\n"); +++ return PTR_ERR(pm_ping_timer_thread); +++ } +++ return 0; +++} +++ +++// initialize workqueue +++static int pm_ping_workqueue_init(void) +++{ +++ struct workqueue_struct *queue = NULL; +++ +++ queue = create_workqueue("pm_ping_workqueue"); +++ +++ if (queue == NULL) { +++ enfs_log_error("create workqueue failed.\n"); +++ return -ENOMEM; +++ } +++ +++ spin_lock(&ping_execute_workq_lock); +++ ping_execute_workq = queue; +++ spin_unlock(&ping_execute_workq_lock); +++ enfs_log_info("create workqueue succeeded.\n"); +++ return 0; +++} +++ +++static void pm_ping_workqueue_fini(void) +++{ +++ struct workqueue_struct *queue = NULL; +++ +++ spin_lock(&ping_execute_workq_lock); +++ queue = ping_execute_workq; +++ ping_execute_workq = NULL; +++ spin_unlock(&ping_execute_workq_lock); +++ +++ enfs_log_info("delete work queue\n"); +++ +++ if (queue != NULL) { +++ flush_workqueue(queue); +++ destroy_workqueue(queue); 
+++ } +++} +++ +++// module exit func +++void pm_ping_fini(void) +++{ +++ if (pm_ping_timer_thread) +++ kthread_stop(pm_ping_timer_thread); +++ +++ pm_ping_workqueue_fini(); +++ +++ while (atomic_read(&check_xprt_count) != 0) +++ msleep(SLEEP_INTERVAL); +++} +++ +++// module init func +++int pm_ping_init(void) +++{ +++ int ret; +++ +++ atomic_set(&check_xprt_count, 0); +++ ret = pm_ping_workqueue_init(); +++ if (ret != 0) { +++ enfs_log_error("PM_PING Module loading failed.\n"); +++ return ret; +++ } +++ ret = pm_ping_start(); +++ if (ret != 0) { +++ enfs_log_error("PM_PING Module loading failed.\n"); +++ pm_ping_workqueue_fini(); +++ return ret; +++ } +++ +++ return ret; +++} +++ +++bool pm_ping_is_test_xprt_task(struct rpc_task *task) +++{ +++ return task->tk_ops == &pm_ping_set_status_ops; +++} +++ +++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void (*func)(void *data), +++ void *data) +++{ +++ int ret; +++ +++ struct pm_ping_async_callback *callback_data = +++ kzalloc(sizeof(struct pm_ping_async_callback), GFP_KERNEL); +++ +++ if (callback_data == NULL) { +++ enfs_log_error("failed to allocate memory\n"); +++ return -ENOMEM; +++ } +++ +++ callback_data->data = data; +++ callback_data->func = func; +++ atomic_inc(&check_xprt_count); +++ ret = rpc_clnt_test_xprt(clnt, xprt, +++ &pm_ping_set_status_ops, +++ callback_data, +++ RPC_TASK_ASYNC | RPC_TASK_FIXED); +++ +++ if (ret < 0) { +++ enfs_log_debug("ping xprt execute failed, ret %d\n", ret); +++ atomic_dec(&check_xprt_count); +++ } +++ +++ return ret; +++} ++diff --git a/fs/nfs/enfs/pm_ping.h b/fs/nfs/enfs/pm_ping.h ++new file mode 100644 ++index 000000000000..6bcb94bfc836 ++--- /dev/null +++++ b/fs/nfs/enfs/pm_ping.h ++@@ -0,0 +1,33 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: nfs configuration +++ * Author: x00833432 +++ * Create: 2023-07-27 +++ */ +++ +++#ifndef PM_PING_H +++#define PM_PING_H +++ +++#include <linux/sunrpc/clnt.h> +++ +++enum pm_check_state { +++ PM_CHECK_INIT, // this xprt never been queued +++ PM_CHECK_WAITING, // this xprt waiting in the queue +++ PM_CHECK_CHECKING, // this xprt is testing +++ PM_CHECK_FINISH, // this xprt has been finished +++ PM_CHECK_UNDEFINE, // undefine multipath struct +++}; +++ +++int pm_ping_init(void); +++void pm_ping_fini(void); +++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt); +++void pm_ping_set_path_check_state(struct rpc_xprt *xprt, +++ enum pm_check_state state); +++bool pm_ping_is_test_xprt_task(struct rpc_task *task); +++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void (*func)(void *data), +++ void *data); +++ +++#endif // PM_PING_H ++diff --git a/fs/nfs/enfs/pm_state.c b/fs/nfs/enfs/pm_state.c ++new file mode 100644 ++index 000000000000..220621a207a2 ++--- /dev/null +++++ b/fs/nfs/enfs/pm_state.c ++@@ -0,0 +1,158 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: path state file +++ * Author: y00583252 +++ * Create: 2023-08-12 +++ */ +++#include "pm_state.h" +++#include <linux/sunrpc/xprt.h> +++ +++#include "enfs.h" +++#include "enfs_log.h" +++ +++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ enum pm_path_state state; +++ +++ if (xprt == NULL) { +++ enfs_log_error("The xprt is not valid.\n"); +++ return PM_STATE_UNDEFINED; +++ } +++ +++ xprt_get(xprt); +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ if (ctx == NULL) { +++ enfs_log_error("The xprt multipath ctx is not valid.\n"); +++ xprt_put(xprt); +++ return PM_STATE_UNDEFINED; +++ } +++ +++ state = atomic_read(&ctx->path_state); +++ +++ xprt_put(xprt); +++ +++ return state; +++} +++ +++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ enum pm_path_state cur_state; +++ +++ if (xprt == NULL) { +++ enfs_log_error("The xprt is not valid.\n"); +++ return; +++ } +++ +++ xprt_get(xprt); +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ if (ctx == NULL) { +++ enfs_log_error("The xprt multipath ctx is not valid.\n"); +++ xprt_put(xprt); +++ return; +++ } +++ +++ cur_state = atomic_read(&ctx->path_state); +++ if (cur_state == state) { +++ enfs_log_debug("The xprt is already {%d}.\n", state); +++ xprt_put(xprt); +++ return; +++ } +++ +++ atomic_set(&ctx->path_state, state); +++ enfs_log_info("The xprt {%p} path state change from {%d} to {%d}.\n", +++ xprt, cur_state, state); +++ +++ xprt_put(xprt); +++} +++ +++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len) +++{ +++ enum pm_path_state state; +++ +++ if (xprt == NULL) { +++ enfs_log_error("The xprt is not valid.\n"); +++ return; +++ } +++ +++ if ((buf == NULL) || (len <= 0)) { +++ enfs_log_error("Buffer is not valid, len=%d.\n", len); +++ return; +++ } +++ +++ state = pm_get_path_state(xprt); +++ +++ switch 
(state) { +++ case PM_STATE_INIT: +++ (void)snprintf(buf, len, "Init"); +++ break; +++ case PM_STATE_NORMAL: +++ (void)snprintf(buf, len, "Normal"); +++ break; +++ case PM_STATE_FAULT: +++ (void)snprintf(buf, len, "Fault"); +++ break; +++ default: +++ (void)snprintf(buf, len, "Unknown"); +++ break; +++ } +++} +++ +++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len) +++{ +++ int i; +++ unsigned long state; +++ static unsigned long xprt_mask[] = { +++ XPRT_LOCKED, XPRT_CONNECTED, +++ XPRT_CONNECTING, XPRT_CLOSE_WAIT, +++ XPRT_BOUND, XPRT_BINDING, XPRT_CLOSING, +++ XPRT_CONGESTED}; +++ +++ static const char *const xprt_state_desc[] = { +++ "LOCKED", "CONNECTED", "CONNECTING", +++ "CLOSE_WAIT", "BOUND", "BINDING", +++ "CLOSING", "CONGESTED"}; +++ int pos = 0; +++ int ret = 0; +++ +++ if (xprt == NULL) { +++ enfs_log_error("The xprt is not valid.\n"); +++ return; +++ } +++ +++ if ((buf == NULL) || (len <= 0)) { +++ enfs_log_error( +++ "Xprt state buffer is not valid, len=%d.\n", +++ len); +++ return; +++ } +++ +++ xprt_get(xprt); +++ state = READ_ONCE(xprt->state); +++ xprt_put(xprt); +++ +++ for (i = 0; i < ARRAY_SIZE(xprt_mask); ++i) { +++ if (pos >= len) +++ break; +++ +++ if (!test_bit(xprt_mask[i], &state)) +++ continue; +++ +++ if (pos == 0) +++ ret = snprintf(buf, len, "%s", xprt_state_desc[i]); +++ else +++ ret = snprintf(buf + pos, len - pos, "|%s", +++ xprt_state_desc[i]); +++ +++ if (ret < 0) { +++ enfs_log_error("format state failed, ret %d.\n", ret); +++ break; +++ } +++ +++ pos += ret; +++ } +++} ++diff --git a/fs/nfs/enfs/pm_state.h b/fs/nfs/enfs/pm_state.h ++new file mode 100644 ++index 000000000000..f5f52e5ab91d ++--- /dev/null +++++ b/fs/nfs/enfs/pm_state.h ++@@ -0,0 +1,28 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: path state header file +++ * Author: y00583252 +++ * Create: 2023-08-12 +++ */ +++ +++#ifndef PM_STATE_H +++#define PM_STATE_H +++ +++#include <linux/types.h> +++#include <linux/sunrpc/xprt.h> +++ +++enum pm_path_state { +++ PM_STATE_INIT, +++ PM_STATE_NORMAL, +++ PM_STATE_FAULT, +++ PM_STATE_UNDEFINED // xprt is not multipath xprt +++}; +++ +++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state); +++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt); +++ +++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len); +++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len); +++ +++#endif // PM_STATE_H +diff --git a/0006-add_enfs_compile_option.patch b/0006-add_enfs_compile_option.patch +new file mode 100644 +index 0000000..ff3bc0e +--- /dev/null ++++ b/0006-add_enfs_compile_option.patch +@@ -0,0 +1,70 @@ ++diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig ++index b04256636d4b..ae53510c0627 100644 ++--- a/arch/arm64/configs/openeuler_defconfig +++++ b/arch/arm64/configs/openeuler_defconfig ++@@ -5344,6 +5344,7 @@ CONFIG_LOCKD=m ++ CONFIG_LOCKD_V4=y ++ CONFIG_NFS_ACL_SUPPORT=m ++ CONFIG_NFS_COMMON=y +++# CONFIG_ENFS is not set ++ CONFIG_SUNRPC=m ++ CONFIG_SUNRPC_GSS=m ++ CONFIG_SUNRPC_BACKCHANNEL=y ++diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig ++index 59baeb2973af..ccc317f7fdb2 100644 ++--- a/arch/x86/configs/openeuler_defconfig +++++ b/arch/x86/configs/openeuler_defconfig ++@@ -6825,6 +6825,7 @@ CONFIG_LOCKD=m ++ CONFIG_LOCKD_V4=y ++ CONFIG_NFS_ACL_SUPPORT=m ++ CONFIG_NFS_COMMON=y +++# CONFIG_ENFS is not set ++ CONFIG_SUNRPC=m ++ CONFIG_SUNRPC_GSS=m ++ CONFIG_SUNRPC_BACKCHANNEL=y ++diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig ++index e55f86713948..872c9b7671b1 100644 ++--- a/fs/nfs/Kconfig +++++ b/fs/nfs/Kconfig ++@@ -196,3 +196,14 @@ config NFS_DEBUG ++ depends on NFS_FS && SUNRPC_DEBUG ++ select CRC32 
++ default y +++ +++config ENFS +++ tristate "NFS client support for ENFS" +++ depends on NFS_FS +++ default n +++ help +++ This option enables support multipath of the NFS protocol +++ in the kernel's NFS client. +++ This feature will improve performance and reliability. +++ +++ If sure, say Y. ++diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile ++index c587e3c4c6a6..19d0ac2ba3b8 100644 ++--- a/fs/nfs/Makefile +++++ b/fs/nfs/Makefile ++@@ -12,6 +12,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \ ++ nfs-$(CONFIG_ROOT_NFS) += nfsroot.o ++ nfs-$(CONFIG_SYSCTL) += sysctl.o ++ nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o +++nfs-$(CONFIG_ENFS) += enfs_adapter.o ++ ++ obj-$(CONFIG_NFS_V2) += nfsv2.o ++ nfsv2-y := nfs2super.o proc.o nfs2xdr.o ++@@ -34,3 +35,5 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o ++ obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/ ++ obj-$(CONFIG_PNFS_BLOCK) += blocklayout/ ++ obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/ +++ +++obj-$(CONFIG_ENFS) += enfs/ ++diff --git a/net/sunrpc/Makefile b/net/sunrpc/Makefile ++index 090658c3da12..fe4e3b28c5d1 100644 ++--- a/net/sunrpc/Makefile +++++ b/net/sunrpc/Makefile ++@@ -19,3 +19,4 @@ sunrpc-$(CONFIG_SUNRPC_DEBUG) += debugfs.o ++ sunrpc-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel_rqst.o ++ sunrpc-$(CONFIG_PROC_FS) += stats.o ++ sunrpc-$(CONFIG_SYSCTL) += sysctl.o +++sunrpc-$(CONFIG_ENFS) += sunrpc_enfs_adapter.o +diff --git a/kernel.spec b/kernel.spec +index 3215446..e242c00 100644 +--- a/kernel.spec ++++ b/kernel.spec +@@ -60,6 +60,13 @@ Source9002: series.conf + Source9998: patches.tar.bz2 + %endif + ++Patch0001: 0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch ++Patch0002: 0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch ++Patch0003: 0003-add_enfs_module.patch ++Patch0004: 0004-add_enfs_module_for_sunrpc_multipatch.patch ++Patch0005: 0005-add_enfs_module_for_sunrpc_failover_and_configure.patch ++Patch0006: 
0006-add_enfs_compile_option.patch ++ + #BuildRequires: + BuildRequires: module-init-tools, patch >= 2.5.4, bash >= 2.03, tar + BuildRequires: bzip2, xz, findutils, gzip, m4, perl, make >= 3.78, diffutils, gawk +@@ -256,6 +263,12 @@ Applypatches() + Applypatches series.conf %{_builddir}/kernel-%{version}/linux-%{KernelVer} + %endif + ++%patch0001 -p1 ++%patch0002 -p1 ++%patch0003 -p1 ++%patch0004 -p1 ++%patch0005 -p1 ++%patch0006 -p1 + touch .scmversion + + find . $ -name "*.orig" -o -name "*~" $ -exec rm -f {} \; >/dev/null +-- +2.25.0.windows.1 + diff --git a/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch new file mode 100644 index 0000000..38e57a9 --- /dev/null +++ b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch @@ -0,0 +1,757 @@ +diff --git a/fs/nfs/client.c b/fs/nfs/client.c +index 7d02dc52209d..50820a8a684a 100644 +--- a/fs/nfs/client.c ++++ b/fs/nfs/client.c +@@ -48,7 +48,7 @@ + #include "callback.h" + #include "delegation.h" + #include "iostat.h" +-#include "internal.h" ++#include "enfs_adapter.h" + #include "fscache.h" + #include "pnfs.h" + #include "nfs.h" +@@ -255,6 +255,7 @@ void nfs_free_client(struct nfs_client *clp) + put_nfs_version(clp->cl_nfs_mod); + kfree(clp->cl_hostname); + kfree(clp->cl_acceptor); ++ nfs_free_multi_path_client(clp); + kfree(clp); + } + EXPORT_SYMBOL_GPL(nfs_free_client); +@@ -330,6 +331,9 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat + sap)) + continue; + ++ if (!nfs_multipath_client_match(clp, data)) ++ continue; ++ + refcount_inc(&clp->cl_count); + return clp; + } +@@ -512,6 +516,9 @@ int nfs_create_rpc_client(struct nfs_client *clp, + .program = &nfs_program, + .version = clp->rpc_ops->version, + .authflavor = flavor, ++#if IS_ENABLED(CONFIG_ENFS) ++ .multipath_option = cl_init->enfs_option, ++#endif + }; + + if (test_bit(NFS_CS_DISCRTRY, &clp->cl_flags)) +@@ -634,6 
+641,13 @@ struct nfs_client *nfs_init_client(struct nfs_client *clp, + /* the client is already initialised */ + if (clp->cl_cons_state == NFS_CS_READY) + return clp; ++ error = nfs_create_multi_path_client(clp, cl_init); ++ if (error < 0) { ++ dprintk("%s: create failed.%d!\n", __func__, error); ++ nfs_put_client(clp); ++ clp = ERR_PTR(error); ++ return clp; ++ } + + /* + * Create a client RPC handle for doing FSSTAT with UNIX auth only +@@ -666,6 +680,9 @@ static int nfs_init_server(struct nfs_server *server, + .net = data->net, + .timeparms = &timeparms, + .init_flags = (1UL << NFS_CS_REUSEPORT), ++#if IS_ENABLED(CONFIG_ENFS) ++ .enfs_option = data->enfs_option, ++#endif + }; + struct nfs_client *clp; + int error; +diff --git a/fs/nfs/enfs_adapter.c b/fs/nfs/enfs_adapter.c +new file mode 100644 +index 000000000000..7f471f2072c4 +--- /dev/null ++++ b/fs/nfs/enfs_adapter.c +@@ -0,0 +1,230 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/types.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs3.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/sunrpc/sched.h> ++#include <linux/nfs_iostat.h> ++#include "enfs_adapter.h" ++#include "iostat.h" ++ ++struct enfs_adapter_ops __rcu *enfs_adapter; ++ ++int enfs_adapter_register(struct enfs_adapter_ops *ops) ++{ ++ struct enfs_adapter_ops *old; ++ ++ old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, NULL, ops); ++ if (old == NULL || old == ops) ++ return 0; ++ pr_err("regist %s ops %p failed. 
old %p\n", __func__, ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(enfs_adapter_register); ++ ++int enfs_adapter_unregister(struct enfs_adapter_ops *ops) ++{ ++ struct enfs_adapter_ops *old; ++ ++ old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, ops, NULL); ++ if (old == ops || old == NULL) ++ return 0; ++ pr_err("unregist %s ops %p failed. old %p\n", __func__, ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(enfs_adapter_unregister); ++ ++struct enfs_adapter_ops *nfs_multipath_router_get(void) ++{ ++ struct enfs_adapter_ops *ops; ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(enfs_adapter); ++ if (ops == NULL) { ++ rcu_read_unlock(); ++ return NULL; ++ } ++ if (!try_module_get(ops->owner)) ++ ops = NULL; ++ rcu_read_unlock(); ++ return ops; ++} ++ ++void nfs_multipath_router_put(struct enfs_adapter_ops *ops) ++{ ++ if (ops) ++ module_put(ops->owner); ++} ++ ++bool is_valid_option(enum nfsmultipathoptions option) ++{ ++ if (option < REMOTEADDR || option >= INVALID_OPTION) { ++ pr_warn("%s: ENFS invalid option %d\n", __func__, option); ++ return false; ++ } ++ ++ return true; ++} ++ ++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str, ++ struct nfs_parsed_mount_data *mnt) ++{ ++ ++ //parseMultiPathOptions(getNfsMultiPathOpt(token), string, mnt); ++ ++ int rc; ++ struct enfs_adapter_ops *ops; ++ ++ ops = nfs_multipath_router_get(); ++ if ((ops == NULL) || (ops->parse_mount_options == NULL) || ++ !is_valid_option(option)) { ++ nfs_multipath_router_put(ops); ++ dfprintk(MOUNT, ++ "NFS: parsing nfs mount option enfs not load[%s]\n" ++ , __func__); ++ return -EOPNOTSUPP; ++ } ++ // nfs_multipath_parse_options ++ dfprintk(MOUNT, "NFS: parsing nfs mount option '%s' type: %d[%s]\n" ++ , str, option, __func__); ++ rc = ops->parse_mount_options(option, str, &mnt->enfs_option, mnt->net); ++ nfs_multipath_router_put(ops); ++ return rc; ++} ++ ++void enfs_free_mount_options(struct nfs_parsed_mount_data *data) ++{ ++ struct 
enfs_adapter_ops *ops; ++ ++ if (data->enfs_option == NULL) ++ return; ++ ++ ops = nfs_multipath_router_get(); ++ if ((ops == NULL) || (ops->free_mount_options == NULL)) { ++ nfs_multipath_router_put(ops); ++ return; ++ } ++ ops->free_mount_options((void *)&data->enfs_option); ++ nfs_multipath_router_put(ops); ++} ++ ++int nfs_create_multi_path_client(struct nfs_client *client, ++ const struct nfs_client_initdata *cl_init) ++{ ++ int ret = 0; ++ struct enfs_adapter_ops *ops; ++ ++ if (cl_init->enfs_option == NULL) ++ return 0; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_init != NULL) ++ ret = ops->client_info_init( ++ (void *)&client->cl_multipath_data, cl_init); ++ nfs_multipath_router_put(ops); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(nfs_create_multi_path_client); ++ ++void nfs_free_multi_path_client(struct nfs_client *clp) ++{ ++ struct enfs_adapter_ops *ops; ++ ++ if (clp->cl_multipath_data == NULL) ++ return; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_free != NULL) ++ ops->client_info_free(clp->cl_multipath_data); ++ nfs_multipath_router_put(ops); ++} ++ ++int nfs_multipath_client_match(struct nfs_client *clp, ++ const struct nfs_client_initdata *sap) ++{ ++ int ret = true; ++ struct enfs_adapter_ops *ops; ++ ++ pr_info("%s src %p dst %p\n.", __func__, ++ clp->cl_multipath_data, sap->enfs_option); ++ ++ if (clp->cl_multipath_data == NULL && sap->enfs_option == NULL) ++ return true; ++ ++ if ((clp->cl_multipath_data == NULL && sap->enfs_option) || ++ (clp->cl_multipath_data && sap->enfs_option == NULL)) { ++ pr_err("not match client src %p dst %p\n.", ++ clp->cl_multipath_data, sap->enfs_option); ++ return false; ++ } ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_match != NULL) ++ ret = ops->client_info_match(clp->cl_multipath_data, ++ sap->enfs_option); ++ nfs_multipath_router_put(ops); ++ ++ return ret; ++} ++ ++int nfs4_multipath_client_match(struct 
nfs_client *src, struct nfs_client *dst) ++{ ++ int ret = true; ++ struct enfs_adapter_ops *ops; ++ ++ if (src->cl_multipath_data == NULL && dst->cl_multipath_data == NULL) ++ return true; ++ ++ if (src->cl_multipath_data == NULL || dst->cl_multipath_data == NULL) ++ return false; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->nfs4_client_info_match != NULL) ++ ret = ops->nfs4_client_info_match(src->cl_multipath_data, ++ dst->cl_multipath_data); ++ nfs_multipath_router_put(ops); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(nfs4_multipath_client_match); ++ ++void nfs_multipath_show_client_info(struct seq_file *mount_option, ++ struct nfs_server *server) ++{ ++ struct enfs_adapter_ops *ops; ++ ++ if (mount_option == NULL || server == NULL || ++ server->client == NULL || ++ server->nfs_client->cl_multipath_data == NULL) ++ return; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_show != NULL) ++ ops->client_info_show(mount_option, server); ++ nfs_multipath_router_put(ops); ++} ++ ++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option) ++{ ++ int ret = 0; ++ struct enfs_adapter_ops *ops; ++ ++ if (nfs_client == NULL || nfs_client->cl_rpcclient == NULL) ++ return 0; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->remount_ip_list != NULL) ++ ret = ops->remount_ip_list(nfs_client, enfs_option); ++ nfs_multipath_router_put(ops); ++ return ret; ++} ++EXPORT_SYMBOL_GPL(nfs_remount_iplist); +diff --git a/fs/nfs/enfs_adapter.h b/fs/nfs/enfs_adapter.h +new file mode 100644 +index 000000000000..752544e18056 +--- /dev/null ++++ b/fs/nfs/enfs_adapter.h +@@ -0,0 +1,101 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter header. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _NFS_MULTIPATH_H_ ++#define _NFS_MULTIPATH_H_ ++ ++#include "internal.h" ++ ++#if IS_ENABLED(CONFIG_ENFS) ++enum nfsmultipathoptions { ++ REMOTEADDR, ++ LOCALADDR, ++ REMOTEDNSNAME, ++ REMOUNTREMOTEADDR, ++ REMOUNTLOCALADDR, ++ INVALID_OPTION ++}; ++ ++ ++struct enfs_adapter_ops { ++ const char *name; ++ struct module *owner; ++ int (*parse_mount_options)(enum nfsmultipathoptions option, ++ char *str, void **enfs_option, struct net *net_ns); ++ ++ void (*free_mount_options)(void **data); ++ ++ int (*client_info_init)(void **data, ++ const struct nfs_client_initdata *cl_init); ++ void (*client_info_free)(void *data); ++ int (*client_info_match)(void *src, void *dst); ++ int (*nfs4_client_info_match)(void *src, void *dst); ++ void (*client_info_show)(struct seq_file *mount_option, void *data); ++ int (*remount_ip_list)(struct nfs_client *nfs_client, ++ void *enfs_option); ++}; ++ ++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str, ++ struct nfs_parsed_mount_data *mnt); ++void enfs_free_mount_options(struct nfs_parsed_mount_data *data); ++int nfs_create_multi_path_client(struct nfs_client *client, ++ const struct nfs_client_initdata *cl_init); ++void nfs_free_multi_path_client(struct nfs_client *clp); ++int nfs_multipath_client_match(struct nfs_client *clp, ++ const struct nfs_client_initdata *sap); ++int nfs4_multipath_client_match(struct nfs_client *src, struct nfs_client *dst); ++void nfs_multipath_show_client_info(struct seq_file *mount_option, ++ struct nfs_server *server); ++int enfs_adapter_register(struct enfs_adapter_ops *ops); ++int enfs_adapter_unregister(struct enfs_adapter_ops *ops); ++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option); ++int nfs4_create_multi_path(struct nfs_server *server, ++ struct nfs_parsed_mount_data *data, ++ const struct rpc_timeout *timeparms); ++ ++#else ++static inline ++void nfs_free_multi_path_client(struct nfs_client *clp) ++{ ++ ++} ++ ++static inline ++int 
nfs_multipath_client_match(struct nfs_client *clp, ++ const struct nfs_client_initdata *sap) ++{ ++ return 1; ++} ++ ++static inline ++int nfs_create_multi_path_client(struct nfs_client *client, ++ const struct nfs_client_initdata *cl_init) ++{ ++ return 0; ++} ++ ++static inline ++void nfs_multipath_show_client_info(struct seq_file *mount_option, ++ struct nfs_server *server) ++{ ++ ++} ++ ++static inline ++int nfs4_multipath_client_match(struct nfs_client *src, ++ struct nfs_client *dst) ++{ ++ return 1; ++} ++ ++static inline ++void enfs_free_mount_options(struct nfs_parsed_mount_data *data) ++{ ++ ++} ++ ++#endif // CONFIG_ENFS ++#endif // _NFS_MULTIPATH_H_ +diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h +index 0ce5a90640c4..c696693edc7b 100644 +--- a/fs/nfs/internal.h ++++ b/fs/nfs/internal.h +@@ -93,6 +93,9 @@ struct nfs_client_initdata { + u32 minorversion; + struct net *net; + const struct rpc_timeout *timeparms; ++#if IS_ENABLED(CONFIG_ENFS) ++ void *enfs_option; /* struct multipath_mount_options * */ ++#endif + }; + + /* +@@ -135,6 +138,9 @@ struct nfs_parsed_mount_data { + + struct security_mnt_opts lsm_opts; + struct net *net; ++#if IS_ENABLED(CONFIG_ENFS) ++ void *enfs_option; /* struct multipath_mount_options * */ ++#endif + }; + + /* mount_clnt.c */ +diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c +index 1350ea673672..4aa6e1f961f7 100644 +--- a/fs/nfs/nfs4client.c ++++ b/fs/nfs/nfs4client.c +@@ -10,7 +10,7 @@ + #include <linux/sunrpc/xprt.h> + #include <linux/sunrpc/bc_xprt.h> + #include <linux/sunrpc/rpc_pipe_fs.h> +-#include "internal.h" ++#include "enfs_adapter.h" + #include "callback.h" + #include "delegation.h" + #include "nfs4session.h" +@@ -225,6 +225,16 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init) + __set_bit(NFS_CS_DISCRTRY, &clp->cl_flags); + __set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags); + ++#if IS_ENABLED(CONFIG_ENFS) ++ err = nfs_create_multi_path_client(clp, cl_init); ++ if (err < 
0) { ++ dprintk("%s: create failed.%d\n", __func__, err); ++ nfs_put_client(clp); ++ clp = ERR_PTR(err); ++ return clp; ++ } ++#endif ++ + /* + * Set up the connection to the server before we add add to the + * global list. +@@ -529,6 +539,9 @@ static int nfs4_match_client(struct nfs_client *pos, struct nfs_client *new, + if (!nfs4_match_client_owner_id(pos, new)) + return 1; + ++ if (!nfs4_multipath_client_match(pos, new)) ++ return 1; ++ + return 0; + } + +@@ -860,7 +873,7 @@ static int nfs4_set_client(struct nfs_server *server, + const size_t addrlen, + const char *ip_addr, + int proto, const struct rpc_timeout *timeparms, +- u32 minorversion, struct net *net) ++ u32 minorversion, struct net *net, void *enfs_option) + { + struct nfs_client_initdata cl_init = { + .hostname = hostname, +@@ -872,6 +885,9 @@ static int nfs4_set_client(struct nfs_server *server, + .minorversion = minorversion, + .net = net, + .timeparms = timeparms, ++#if IS_ENABLED(CONFIG_ENFS) ++ .enfs_option = enfs_option, ++#endif + }; + struct nfs_client *clp; + +@@ -1042,6 +1058,30 @@ static int nfs4_server_common_setup(struct nfs_server *server, + return error; + } + ++int nfs4_create_multi_path(struct nfs_server *server, ++ struct nfs_parsed_mount_data *data, ++ const struct rpc_timeout *timeparms) ++{ ++ struct nfs_client_initdata cl_init = { ++ .hostname = data->nfs_server.hostname, ++ .addr = (const struct sockaddr *)&data->nfs_server.address, ++ .addrlen = data->nfs_server.addrlen, ++ .ip_addr = data->client_address, ++ .nfs_mod = &nfs_v4, ++ .proto = data->nfs_server.protocol, ++ .minorversion = data->minorversion, ++ .net = data->net, ++ .timeparms = timeparms, ++#if IS_ENABLED(CONFIG_ENFS) ++ .enfs_option = data->enfs_option, ++#endif // CONFIG_ENFS ++ }; ++ ++ return nfs_create_multi_path_client(server->nfs_client, &cl_init); ++ ++} ++EXPORT_SYMBOL_GPL(nfs4_create_multi_path); ++ + /* + * Create a version 4 volume record + */ +@@ -1050,6 +1090,7 @@ static int nfs4_init_server(struct 
nfs_server *server, + { + struct rpc_timeout timeparms; + int error; ++ void *enfs_option = NULL; + + nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, + data->timeo, data->retrans); +@@ -1067,6 +1108,10 @@ static int nfs4_init_server(struct nfs_server *server, + else + data->selected_flavor = RPC_AUTH_UNIX; + ++#if IS_ENABLED(CONFIG_ENFS) ++ enfs_option = data->enfs_option; ++#endif ++ + /* Get a client record */ + error = nfs4_set_client(server, + data->nfs_server.hostname, +@@ -1076,7 +1121,7 @@ static int nfs4_init_server(struct nfs_server *server, + data->nfs_server.protocol, + &timeparms, + data->minorversion, +- data->net); ++ data->net, enfs_option); + if (error < 0) + return error; + +@@ -1161,7 +1206,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, + XPRT_TRANSPORT_RDMA, + parent_server->client->cl_timeout, + parent_client->cl_mvops->minor_version, +- parent_client->cl_net); ++ parent_client->cl_net, NULL); + if (!error) + goto init_server; + #endif /* IS_ENABLED(CONFIG_SUNRPC_XPRT_RDMA) */ +@@ -1174,7 +1219,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, + XPRT_TRANSPORT_TCP, + parent_server->client->cl_timeout, + parent_client->cl_mvops->minor_version, +- parent_client->cl_net); ++ parent_client->cl_net, NULL); + if (error < 0) + goto error; + +@@ -1269,7 +1314,7 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname, + set_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status); + error = nfs4_set_client(server, hostname, sap, salen, buf, + clp->cl_proto, clnt->cl_timeout, +- clp->cl_minorversion, net); ++ clp->cl_minorversion, net, NULL); + clear_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status); + if (error != 0) { + nfs_server_insert_lists(server); +diff --git a/fs/nfs/super.c b/fs/nfs/super.c +index a05e1eb2c3fd..83cd294aca15 100644 +--- a/fs/nfs/super.c ++++ b/fs/nfs/super.c +@@ -61,7 +61,7 @@ + #include "callback.h" + #include "delegation.h" + #include 
"iostat.h" +-#include "internal.h" ++#include "enfs_adapter.h" + #include "fscache.h" + #include "nfs4session.h" + #include "pnfs.h" +@@ -113,6 +113,12 @@ enum { + + /* Special mount options */ + Opt_userspace, Opt_deprecated, Opt_sloppy, ++#if IS_ENABLED(CONFIG_ENFS) ++ Opt_remote_iplist, ++ Opt_local_iplist, ++ Opt_remote_dnslist, ++ Opt_enfs_info, ++#endif + + Opt_err + }; +@@ -183,6 +189,13 @@ static const match_table_t nfs_mount_option_tokens = { + { Opt_fscache_uniq, "fsc=%s" }, + { Opt_local_lock, "local_lock=%s" }, + ++#if IS_ENABLED(CONFIG_ENFS) ++ { Opt_remote_iplist, "remoteaddrs=%s" }, ++ { Opt_local_iplist, "localaddrs=%s" }, ++ { Opt_remote_dnslist, "remotednsname=%s" }, ++ { Opt_enfs_info, "enfs_info=%s" }, ++#endif ++ + /* The following needs to be listed after all other options */ + { Opt_nfsvers, "v%s" }, + +@@ -365,6 +378,21 @@ static struct shrinker acl_shrinker = { + .seeks = DEFAULT_SEEKS, + }; + ++#if IS_ENABLED(CONFIG_ENFS) ++enum nfsmultipathoptions getNfsMultiPathOpt(int token) ++{ ++ switch (token) { ++ case Opt_remote_iplist: ++ return REMOUNTREMOTEADDR; ++ case Opt_local_iplist: ++ return REMOUNTLOCALADDR; ++ case Opt_remote_dnslist: ++ return REMOTEDNSNAME; ++ } ++ return INVALID_OPTION; ++} ++#endif ++ + /* + * Register the NFS filesystems + */ +@@ -758,6 +786,9 @@ int nfs_show_options(struct seq_file *m, struct dentry *root) + seq_printf(m, ",addr=%s", + rpc_peeraddr2str(nfss->nfs_client->cl_rpcclient, + RPC_DISPLAY_ADDR)); ++ ++ nfs_multipath_show_client_info(m, nfss); ++ + rcu_read_unlock(); + + return 0; +@@ -853,6 +884,8 @@ int nfs_show_stats(struct seq_file *m, struct dentry *root) + seq_puts(m, root->d_sb->s_flags & SB_NODIRATIME ? 
",nodiratime" : ""); + nfs_show_mount_options(m, nfss, 1); + ++ nfs_multipath_show_client_info(m, nfss); ++ + seq_printf(m, "\n\tage:\t%lu", (jiffies - nfss->mount_time) / HZ); + + show_implementation_id(m, nfss); +@@ -977,6 +1010,7 @@ static void nfs_free_parsed_mount_data(struct nfs_parsed_mount_data *data) + kfree(data->nfs_server.export_path); + kfree(data->nfs_server.hostname); + kfree(data->fscache_uniq); ++ enfs_free_mount_options(data); + security_free_mnt_opts(&data->lsm_opts); + kfree(data); + } +@@ -1641,7 +1675,34 @@ static int nfs_parse_mount_options(char *raw, + return 0; + }; + break; +- ++#if IS_ENABLED(CONFIG_ENFS) ++ case Opt_remote_iplist: ++ case Opt_local_iplist: ++ case Opt_remote_dnslist: ++ string = match_strdup(args); ++ if (string == NULL) ++ goto out_nomem; ++ rc = enfs_parse_mount_options(getNfsMultiPathOpt(token), ++ string, mnt); ++ kfree(string); ++ switch (rc) { ++ case 0: ++ break; ++ case -ENOMEM: ++ goto out_nomem; ++ case -ENOSPC: ++ goto out_limit; ++ case -EINVAL: ++ goto out_invalid_address; ++ case -ENOTSUPP: ++ goto out_invalid_address; ++ case -EOPNOTSUPP: ++ goto out_invalid_address; ++ } ++ break; ++ case Opt_enfs_info: ++ break; ++#endif + /* + * Special options + */ +@@ -1720,6 +1781,11 @@ static int nfs_parse_mount_options(char *raw, + free_secdata(secdata); + printk(KERN_INFO "NFS: security options invalid: %d\n", rc); + return 0; ++#if IS_ENABLED(CONFIG_ENFS) ++out_limit: ++ dprintk("NFS: param is more than supported limit: %d\n", rc); ++ return 0; ++#endif + } + + /* +@@ -2335,6 +2401,14 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) + if (!nfs_parse_mount_options((char *)options, data)) + goto out; + ++#if IS_ENABLED(CONFIG_ENFS) ++ if (data->enfs_option) { ++ error = nfs_remount_iplist(nfss->nfs_client, data->enfs_option); ++ if (error) ++ goto out; ++ } ++#endif ++ + /* + * noac is a special case. It implies -o sync, but that's not + * necessarily reflected in the mtab options. 
do_remount_sb +@@ -2347,6 +2421,11 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) + /* compare new mount options with old ones */ + error = nfs_compare_remount_data(nfss, data); + out: ++#if IS_ENABLED(CONFIG_ENFS) ++ /* release remount option member */ ++ if (data->enfs_option) ++ enfs_free_mount_options(data); ++#endif + nfs_free_parsed_mount_data(data); + return error; + } +diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h +index 7023ae64e3d7..2c19678afe8d 100644 +--- a/include/linux/nfs_fs_sb.h ++++ b/include/linux/nfs_fs_sb.h +@@ -123,6 +123,11 @@ struct nfs_client { + + struct net *cl_net; + struct list_head pending_cb_stateids; ++ ++#if IS_ENABLED(CONFIG_ENFS) ++ /* multi path private structure (struct multipath_client_info *) */ ++ void *cl_multipath_data; ++#endif + }; + + /* diff --git a/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch new file mode 100644 index 0000000..540a2ce --- /dev/null +++ b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch @@ -0,0 +1,805 @@ +diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h +index 8aa865bce4f6..89178f78de8c 100644 +--- a/include/linux/sunrpc/clnt.h ++++ b/include/linux/sunrpc/clnt.h +@@ -70,6 +70,10 @@ struct rpc_clnt { + struct dentry *cl_debugfs; /* debugfs directory */ + #endif + struct rpc_xprt_iter cl_xpi; ++ ++#if IS_ENABLED(CONFIG_ENFS) ++ bool cl_enfs; ++#endif + }; + + /* +@@ -124,6 +128,9 @@ struct rpc_create_args { + unsigned long flags; + char *client_name; + struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */ ++#if IS_ENABLED(CONFIG_ENFS) ++ void *multipath_option; ++#endif + }; + + struct rpc_add_xprt_test { +@@ -221,6 +228,12 @@ bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt, + const struct sockaddr *sap); + void rpc_cleanup_clids(void); + ++#if IS_ENABLED(CONFIG_ENFS) 
++int ++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, ++ const struct rpc_call_ops *ops, void *data, int flags); ++#endif /* CONFIG_ENFS */ ++ + static inline int rpc_reply_expected(struct rpc_task *task) + { + return (task->tk_msg.rpc_proc != NULL) && +diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h +index ad2e243f3f03..124f5a0faf3e 100644 +--- a/include/linux/sunrpc/sched.h ++++ b/include/linux/sunrpc/sched.h +@@ -90,6 +90,9 @@ struct rpc_task { + tk_garb_retry : 2, + tk_cred_retry : 2, + tk_rebind_retry : 2; ++#if IS_ENABLED(CONFIG_ENFS) ++ unsigned long tk_major_timeo; /* major timeout ticks */ ++#endif + }; + + typedef void (*rpc_action)(struct rpc_task *); +@@ -118,6 +121,9 @@ struct rpc_task_setup { + */ + #define RPC_TASK_ASYNC 0x0001 /* is an async task */ + #define RPC_TASK_SWAPPER 0x0002 /* is swapping in/out */ ++#if IS_ENABLED(CONFIG_ENFS) ++#define RPC_TASK_FIXED 0x0004 /* detect xprt status task */ ++#endif + #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ + #define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ + #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ +@@ -257,6 +263,9 @@ void rpc_destroy_mempool(void); + extern struct workqueue_struct *rpciod_workqueue; + extern struct workqueue_struct *xprtiod_workqueue; + void rpc_prepare_task(struct rpc_task *task); ++#if IS_ENABLED(CONFIG_ENFS) ++void rpc_init_task_retry_counters(struct rpc_task *task); ++#endif + + static inline int rpc_wait_for_completion_task(struct rpc_task *task) + { +diff --git a/include/linux/sunrpc/sunrpc_enfs_adapter.h b/include/linux/sunrpc/sunrpc_enfs_adapter.h +new file mode 100644 +index 000000000000..28abedcf5cf6 +--- /dev/null ++++ b/include/linux/sunrpc/sunrpc_enfs_adapter.h +@@ -0,0 +1,128 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* Client-side SUNRPC ENFS adapter header. ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _SUNRPC_ENFS_ADAPTER_H_ ++#define _SUNRPC_ENFS_ADAPTER_H_ ++#include <linux/sunrpc/clnt.h> ++ ++#if IS_ENABLED(CONFIG_ENFS) ++ ++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) ++{ ++ xps->xps_nactive++; ++} ++ ++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) ++{ ++ xps->xps_nactive--; ++} ++ ++struct rpc_xprt *rpc_task_get_xprt ++(struct rpc_clnt *clnt, struct rpc_xprt *xprt); ++ ++struct rpc_multipath_ops { ++ struct module *owner; ++ void (*create_clnt)(struct rpc_create_args *args, ++ struct rpc_clnt *clnt); ++ void (*releas_clnt)(struct rpc_clnt *clnt); ++ void (*create_xprt)(struct rpc_xprt *xprt); ++ void (*destroy_xprt)(struct rpc_xprt *xprt); ++ void (*xprt_iostat)(struct rpc_task *task); ++ void (*failover_handle)(struct rpc_task *task); ++ bool (*task_need_call_start_again)(struct rpc_task *task); ++ void (*adjust_task_timeout)(struct rpc_task *task, void *condition); ++ void (*init_task_req)(struct rpc_task *task, struct rpc_rqst *req); ++ bool (*prepare_transmit)(struct rpc_task *task); ++}; ++ ++extern struct rpc_multipath_ops __rcu *multipath_ops; ++void rpc_init_task_retry_counters(struct rpc_task *task); ++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops); ++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops); ++struct rpc_multipath_ops *rpc_multipath_ops_get(void); ++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops); ++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt); ++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, ++ struct rpc_clnt *clnt); ++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt); ++bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt); ++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt); ++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task); ++void rpc_multipath_ops_failover_handle(struct rpc_task *task); ++bool rpc_multipath_ops_task_need_call_start_again(struct 
rpc_task *task); ++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, ++ void *condition); ++void rpc_multipath_ops_init_task_req(struct rpc_task *task, ++ struct rpc_rqst *req); ++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task); ++ ++#else ++static inline struct rpc_xprt *rpc_task_get_xprt(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt) ++{ ++ return NULL; ++} ++ ++static inline void rpc_task_release_xprt(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt) ++{ ++} ++ ++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) ++{ ++} ++ ++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) ++{ ++} ++ ++static inline void rpc_multipath_ops_create_clnt ++(struct rpc_create_args *args, struct rpc_clnt *clnt) ++{ ++} ++ ++static inline void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) ++{ ++} ++ ++static inline bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) ++{ ++ return false; ++} ++ ++static inline void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt) ++{ ++} ++ ++static inline void rpc_multipath_ops_xprt_iostat(struct rpc_task *task) ++{ ++} ++ ++static inline void rpc_multipath_ops_failover_handle(struct rpc_task *task) ++{ ++} ++ ++static inline ++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task) ++{ ++ return false; ++} ++ ++static inline void ++rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, void *condition) ++{ ++} ++ ++static inline void ++rpc_multipath_ops_init_task_req(struct rpc_task *task, struct rpc_rqst *req) ++{ ++} ++ ++static inline bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task) ++{ ++ return false; ++} ++ ++#endif ++#endif // _SUNRPC_ENFS_ADAPTER_H_ +diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h +index ccfacca1eba9..2e47b3577947 100644 +--- a/include/linux/sunrpc/xprt.h ++++ b/include/linux/sunrpc/xprt.h +@@ -279,6 +279,10 @@ struct rpc_xprt { + atomic_t inject_disconnect; + #endif + 
struct rcu_head rcu; ++#if IS_ENABLED(CONFIG_ENFS) ++ atomic_long_t queuelen; ++ void *multipath_context; ++#endif + }; + + #if defined(CONFIG_SUNRPC_BACKCHANNEL) +diff --git a/include/linux/sunrpc/xprtmultipath.h b/include/linux/sunrpc/xprtmultipath.h +index af1257c030d2..d54e4dbbbf34 100644 +--- a/include/linux/sunrpc/xprtmultipath.h ++++ b/include/linux/sunrpc/xprtmultipath.h +@@ -22,6 +22,10 @@ struct rpc_xprt_switch { + const struct rpc_xprt_iter_ops *xps_iter_ops; + + struct rcu_head xps_rcu; ++#if IS_ENABLED(CONFIG_ENFS) ++ unsigned int xps_nactive; ++ atomic_long_t xps_queuelen; ++#endif + }; + + struct rpc_xprt_iter { +@@ -69,4 +73,8 @@ extern struct rpc_xprt *xprt_iter_get_next(struct rpc_xprt_iter *xpi); + + extern bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps, + const struct sockaddr *sap); ++#if IS_ENABLED(CONFIG_ENFS) ++extern void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, ++ struct rpc_xprt *xprt); ++#endif + #endif +diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c +index 0fc540b0d183..d7ffee637148 100644 +--- a/net/sunrpc/clnt.c ++++ b/net/sunrpc/clnt.c +@@ -37,6 +37,7 @@ + #include <linux/sunrpc/rpc_pipe_fs.h> + #include <linux/sunrpc/metrics.h> + #include <linux/sunrpc/bc_xprt.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + #include <trace/events/sunrpc.h> + + #include "sunrpc.h" +@@ -490,6 +491,8 @@ static struct rpc_clnt *rpc_create_xprt(struct rpc_create_args *args, + } + } + ++ rpc_multipath_ops_create_clnt(args, clnt); ++ + clnt->cl_softrtry = 1; + if (args->flags & RPC_CLNT_CREATE_HARDRTRY) + clnt->cl_softrtry = 0; +@@ -869,6 +872,8 @@ void rpc_shutdown_client(struct rpc_clnt *clnt) + list_empty(&clnt->cl_tasks), 1*HZ); + } + ++ rpc_multipath_ops_releas_clnt(clnt); ++ + rpc_release_client(clnt); + } + EXPORT_SYMBOL_GPL(rpc_shutdown_client); +@@ -981,7 +986,13 @@ void rpc_task_release_transport(struct rpc_task *task) + + if (xprt) { + task->tk_xprt = NULL; +- xprt_put(xprt); ++#if IS_ENABLED(CONFIG_ENFS) ++ 
if (task->tk_client) { ++ rpc_task_release_xprt(task->tk_client, xprt); ++ return; ++ } ++#endif ++ xprt_put(xprt); + } + } + EXPORT_SYMBOL_GPL(rpc_task_release_transport); +@@ -990,6 +1001,10 @@ void rpc_task_release_client(struct rpc_task *task) + { + struct rpc_clnt *clnt = task->tk_client; + ++#if IS_ENABLED(CONFIG_ENFS) ++ rpc_task_release_transport(task); ++#endif ++ + if (clnt != NULL) { + /* Remove from client task list */ + spin_lock(&clnt->cl_lock); +@@ -999,14 +1014,29 @@ void rpc_task_release_client(struct rpc_task *task) + + rpc_release_client(clnt); + } ++#if IS_ENABLED(CONFIG_ENFS) ++#else + rpc_task_release_transport(task); ++#endif + } + ++#if IS_ENABLED(CONFIG_ENFS) ++static struct rpc_xprt * ++rpc_task_get_next_xprt(struct rpc_clnt *clnt) ++{ ++ return rpc_task_get_xprt(clnt, xprt_iter_get_next(&clnt->cl_xpi)); ++} ++#endif ++ + static + void rpc_task_set_transport(struct rpc_task *task, struct rpc_clnt *clnt) + { + if (!task->tk_xprt) ++#if IS_ENABLED(CONFIG_ENFS) ++ task->tk_xprt = rpc_task_get_next_xprt(clnt); ++#else + task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi); ++#endif + } + + static +@@ -1597,6 +1627,14 @@ call_reserveresult(struct rpc_task *task) + return; + case -EIO: /* probably a shutdown */ + break; ++#if IS_ENABLED(CONFIG_ENFS) ++ case -ETIMEDOUT: /* woken up; restart */ ++ if (rpc_multipath_ops_task_need_call_start_again(task)) { ++ rpc_task_release_transport(task); ++ task->tk_action = call_start; ++ return; ++ } ++#endif + default: + printk(KERN_ERR "%s: unrecognized error %d, exiting\n", + __func__, status); +@@ -1962,6 +2000,10 @@ call_transmit(struct rpc_task *task) + return; + if (!xprt_prepare_transmit(task)) + return; ++ ++ if (rpc_multipath_ops_prepare_transmit(task)) ++ return; ++ + task->tk_action = call_transmit_status; + /* Encode here so that rpcsec_gss can use correct sequence number. 
*/ + if (rpc_task_need_encode(task)) { +@@ -2277,6 +2319,9 @@ call_timeout(struct rpc_task *task) + + retry: + task->tk_action = call_bind; ++#if IS_ENABLED(CONFIG_ENFS) ++ rpc_multipath_ops_failover_handle(task); ++#endif + task->tk_status = 0; + } + +@@ -2961,3 +3006,30 @@ rpc_clnt_swap_deactivate(struct rpc_clnt *clnt) + } + EXPORT_SYMBOL_GPL(rpc_clnt_swap_deactivate); + #endif /* CONFIG_SUNRPC_SWAP */ ++ ++#if IS_ENABLED(CONFIG_ENFS) ++/* rpc_clnt_test_xprt - Test and add a new transport to a rpc_clnt ++ * @clnt: pointer to struct rpc_clnt ++ * @xprt: pointer struct rpc_xprt ++ * @ops: async operation ++ */ ++int ++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, ++ const struct rpc_call_ops *ops, void *data, int flags) ++{ ++ struct rpc_cred *cred; ++ struct rpc_task *task; ++ ++ cred = authnull_ops.lookup_cred(NULL, NULL, 0); ++ task = rpc_call_null_helper(clnt, xprt, cred, ++ RPC_TASK_SOFT | RPC_TASK_SOFTCONN | flags, ++ ops, data); ++ put_rpccred(cred); ++ if (IS_ERR(task)) ++ return PTR_ERR(task); ++ ++ rpc_put_task(task); ++ return 1; ++} ++EXPORT_SYMBOL_GPL(rpc_clnt_test_xprt); ++#endif +diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c +index a873c92a4898..2254fea0e863 100644 +--- a/net/sunrpc/sched.c ++++ b/net/sunrpc/sched.c +@@ -20,7 +20,7 @@ + #include <linux/mutex.h> + #include <linux/freezer.h> + +-#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + + #include "sunrpc.h" + +@@ -962,7 +962,12 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta + /* Initialize workqueue for async tasks */ + task->tk_workqueue = task_setup_data->workqueue; + ++#if IS_ENABLED(CONFIG_ENFS) ++ task->tk_xprt = rpc_task_get_xprt(task_setup_data->rpc_client, ++ xprt_get(task_setup_data->rpc_xprt)); ++#else + task->tk_xprt = xprt_get(task_setup_data->rpc_xprt); ++#endif + + if (task->tk_ops->rpc_call_prepare != NULL) + task->tk_action = rpc_prepare_task; +diff --git 
a/net/sunrpc/sunrpc_enfs_adapter.c b/net/sunrpc/sunrpc_enfs_adapter.c +new file mode 100644 +index 000000000000..c1543545c6de +--- /dev/null ++++ b/net/sunrpc/sunrpc_enfs_adapter.c +@@ -0,0 +1,214 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* Client-side SUNRPC ENFS adapter. ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ ++struct rpc_multipath_ops __rcu *multipath_ops; ++ ++void rpc_init_task_retry_counters(struct rpc_task *task) ++{ ++ /* Initialize retry counters */ ++ task->tk_garb_retry = 2; ++ task->tk_cred_retry = 2; ++ task->tk_rebind_retry = 2; ++} ++EXPORT_SYMBOL_GPL(rpc_init_task_retry_counters); ++ ++struct rpc_xprt * ++rpc_task_get_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) ++{ ++ struct rpc_xprt_switch *xps; ++ ++ if (!xprt) ++ return NULL; ++ rcu_read_lock(); ++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); ++ atomic_long_inc(&xps->xps_queuelen); ++ rcu_read_unlock(); ++ atomic_long_inc(&xprt->queuelen); ++ ++ return xprt; ++} ++ ++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops) ++{ ++ struct rpc_multipath_ops *old; ++ ++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, NULL, ops); ++ if (!old || old == ops) ++ return 0; ++ pr_err("regist rpc_multipath ops %p fail. old %p\n", ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_register); ++ ++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops) ++{ ++ struct rpc_multipath_ops *old; ++ ++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, ops, NULL); ++ if (!old || old == ops) ++ return 0; ++ pr_err("unregist rpc_multipath ops %p fail. 
old %p\n", ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_unregister); ++ ++struct rpc_multipath_ops *rpc_multipath_ops_get(void) ++{ ++ struct rpc_multipath_ops *ops; ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(multipath_ops); ++ if (!ops) { ++ rcu_read_unlock(); ++ return NULL; ++ } ++ if (!try_module_get(ops->owner)) ++ ops = NULL; ++ rcu_read_unlock(); ++ return ops; ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_get); ++ ++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops) ++{ ++ if (ops) ++ module_put(ops->owner); ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_put); ++ ++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) ++{ ++ struct rpc_xprt_switch *xps; ++ ++ atomic_long_dec(&xprt->queuelen); ++ rcu_read_lock(); ++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); ++ atomic_long_dec(&xps->xps_queuelen); ++ rcu_read_unlock(); ++ ++ xprt_put(xprt); ++} ++ ++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, ++ struct rpc_clnt *clnt) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ if (args->multipath_option) { ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->create_clnt) ++ mops->create_clnt(args, clnt); ++ rpc_multipath_ops_put(mops); ++ } ++} ++ ++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->releas_clnt) ++ mops->releas_clnt(clnt); ++ ++ rpc_multipath_ops_put(mops); ++} ++ ++bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->create_xprt) { ++ mops->create_xprt(xprt); ++ if (!xprt->multipath_context) { ++ rpc_multipath_ops_put(mops); ++ return true; ++ } ++ } ++ rpc_multipath_ops_put(mops); ++ return false; ++} ++ ++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ if (xprt->multipath_context) { ++ mops = 
rpc_multipath_ops_get(); ++ if (mops && mops->destroy_xprt) ++ mops->destroy_xprt(xprt); ++ rpc_multipath_ops_put(mops); ++ } ++} ++ ++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ mops = rpc_multipath_ops_get(); ++ if (task->tk_client && mops && mops->xprt_iostat) ++ mops->xprt_iostat(task); ++ rpc_multipath_ops_put(mops); ++} ++ ++void rpc_multipath_ops_failover_handle(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mpath_ops = NULL; ++ ++ mpath_ops = rpc_multipath_ops_get(); ++ if (mpath_ops && mpath_ops->failover_handle) ++ mpath_ops->failover_handle(task); ++ rpc_multipath_ops_put(mpath_ops); ++} ++ ++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mpath_ops = NULL; ++ bool ret = false; ++ ++ mpath_ops = rpc_multipath_ops_get(); ++ if (mpath_ops && mpath_ops->task_need_call_start_again) ++ ret = mpath_ops->task_need_call_start_again(task); ++ rpc_multipath_ops_put(mpath_ops); ++ return ret; ++} ++ ++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, ++ void *condition) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->adjust_task_timeout) ++ mops->adjust_task_timeout(task, condition); ++ rpc_multipath_ops_put(mops); ++} ++ ++void rpc_multipath_ops_init_task_req(struct rpc_task *task, ++ struct rpc_rqst *req) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->init_task_req) ++ mops->init_task_req(task, req); ++ rpc_multipath_ops_put(mops); ++} ++ ++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->prepare_transmit) { ++ if (!(mops->prepare_transmit(task))) { ++ rpc_multipath_ops_put(mops); ++ return true; ++ } ++ } ++ rpc_multipath_ops_put(mops); ++ return false; ++} +diff --git a/net/sunrpc/xprt.c 
b/net/sunrpc/xprt.c +index c912bf20faa2..c2b63b3d5217 100644 +--- a/net/sunrpc/xprt.c ++++ b/net/sunrpc/xprt.c +@@ -48,6 +48,7 @@ + #include <linux/sunrpc/clnt.h> + #include <linux/sunrpc/metrics.h> + #include <linux/sunrpc/bc_xprt.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + #include <linux/rcupdate.h> + + #include <trace/events/sunrpc.h> +@@ -259,6 +260,9 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) + dprintk("RPC: %5u failed to lock transport %p\n", + task->tk_pid, xprt); + task->tk_timeout = 0; ++ ++ rpc_multipath_ops_adjust_task_timeout(task, NULL); ++ + task->tk_status = -EAGAIN; + if (req == NULL) + priority = RPC_PRIORITY_LOW; +@@ -560,6 +564,9 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action) + struct rpc_xprt *xprt = req->rq_xprt; + + task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; ++ ++ rpc_multipath_ops_adjust_task_timeout(task, NULL); ++ + rpc_sleep_on(&xprt->pending, task, action); + } + EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); +@@ -1347,6 +1354,9 @@ xprt_request_init(struct rpc_task *task) + req->rq_rcv_buf.buflen = 0; + req->rq_release_snd_buf = NULL; + xprt_reset_majortimeo(req); ++ ++ rpc_multipath_ops_init_task_req(task, req); ++ + dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid, + req, ntohl(req->rq_xid)); + } +@@ -1427,6 +1437,9 @@ void xprt_release(struct rpc_task *task) + task->tk_ops->rpc_count_stats(task, task->tk_calldata); + else if (task->tk_client) + rpc_count_iostats(task, task->tk_client->cl_metrics); ++ ++ rpc_multipath_ops_xprt_iostat(task); ++ + spin_lock(&xprt->recv_lock); + if (!list_empty(&req->rq_list)) { + list_del_init(&req->rq_list); +@@ -1455,6 +1468,7 @@ void xprt_release(struct rpc_task *task) + else + xprt_free_bc_request(req); + } ++EXPORT_SYMBOL_GPL(xprt_release); + + static void xprt_init(struct rpc_xprt *xprt, struct net *net) + { +@@ -1528,6 +1542,10 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args) + return 
ERR_PTR(-ENOMEM); + } + ++if (rpc_multipath_ops_create_xprt(xprt)) { ++ xprt_destroy(xprt); ++ return ERR_PTR(-ENOMEM); ++} + rpc_xprt_debugfs_register(xprt); + + dprintk("RPC: created transport %p with %u slots\n", xprt, +@@ -1547,6 +1565,9 @@ static void xprt_destroy_cb(struct work_struct *work) + rpc_destroy_wait_queue(&xprt->sending); + rpc_destroy_wait_queue(&xprt->backlog); + kfree(xprt->servername); ++ ++ rpc_multipath_ops_destroy_xprt(xprt); ++ + /* + * Tear down transport state and free the rpc_xprt + */ +diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c +index 6ebaa58b4eff..6202a0be1327 100644 +--- a/net/sunrpc/xprtmultipath.c ++++ b/net/sunrpc/xprtmultipath.c +@@ -18,6 +18,7 @@ + #include <linux/sunrpc/xprt.h> + #include <linux/sunrpc/addr.h> + #include <linux/sunrpc/xprtmultipath.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + + typedef struct rpc_xprt *(*xprt_switch_find_xprt_t)(struct list_head *head, + const struct rpc_xprt *cur); +@@ -26,8 +27,8 @@ static const struct rpc_xprt_iter_ops rpc_xprt_iter_singular; + static const struct rpc_xprt_iter_ops rpc_xprt_iter_roundrobin; + static const struct rpc_xprt_iter_ops rpc_xprt_iter_listall; + +-static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, +- struct rpc_xprt *xprt) ++void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, ++ struct rpc_xprt *xprt) + { + if (unlikely(xprt_get(xprt) == NULL)) + return; +@@ -36,7 +37,9 @@ static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, + if (xps->xps_nxprts == 0) + xps->xps_net = xprt->xprt_net; + xps->xps_nxprts++; ++ rpc_xps_nactive_add_one(xps); + } ++EXPORT_SYMBOL(xprt_switch_add_xprt_locked); + + /** + * rpc_xprt_switch_add_xprt - Add a new rpc_xprt to an rpc_xprt_switch +@@ -63,6 +66,7 @@ static void xprt_switch_remove_xprt_locked(struct rpc_xprt_switch *xps, + if (unlikely(xprt == NULL)) + return; + xps->xps_nxprts--; ++ rpc_xps_nactive_sub_one(xps); + if (xps->xps_nxprts == 0) + xps->xps_net = 
NULL; + smp_wmb(); +@@ -84,7 +88,7 @@ void rpc_xprt_switch_remove_xprt(struct rpc_xprt_switch *xps, + spin_unlock(&xps->xps_lock); + xprt_put(xprt); + } +- ++EXPORT_SYMBOL(rpc_xprt_switch_remove_xprt); + /** + * xprt_switch_alloc - Allocate a new struct rpc_xprt_switch + * @xprt: pointer to struct rpc_xprt +@@ -102,7 +106,13 @@ struct rpc_xprt_switch *xprt_switch_alloc(struct rpc_xprt *xprt, + if (xps != NULL) { + spin_lock_init(&xps->xps_lock); + kref_init(&xps->xps_kref); ++#if IS_ENABLED(CONFIG_ENFS) ++ xps->xps_nxprts = 0; ++ xps->xps_nactive = 0; ++ atomic_long_set(&xps->xps_queuelen, 0); ++#else + xps->xps_nxprts = 0; ++#endif + INIT_LIST_HEAD(&xps->xps_xprt_list); + xps->xps_iter_ops = &rpc_xprt_iter_singular; + xprt_switch_add_xprt_locked(xps, xprt); +@@ -148,6 +158,7 @@ struct rpc_xprt_switch *xprt_switch_get(struct rpc_xprt_switch *xps) + return xps; + return NULL; + } ++EXPORT_SYMBOL(xprt_switch_get); + + /** + * xprt_switch_put - Release a reference to a rpc_xprt_switch +@@ -160,6 +171,7 @@ void xprt_switch_put(struct rpc_xprt_switch *xps) + if (xps != NULL) + kref_put(&xps->xps_kref, xprt_switch_free); + } ++EXPORT_SYMBOL(xprt_switch_put); + + /** + * rpc_xprt_switch_set_roundrobin - Set a round-robin policy on rpc_xprt_switch diff --git a/0003-add_enfs_module_for_nfs_mount_option.patch b/0003-add_enfs_module_for_nfs_mount_option.patch new file mode 100644 index 0000000..70753b5 --- /dev/null +++ b/0003-add_enfs_module_for_nfs_mount_option.patch @@ -0,0 +1,1209 @@ +diff --git a/fs/nfs/enfs/Makefile b/fs/nfs/enfs/Makefile +new file mode 100644 +index 000000000000..6e83eb23c668 +--- /dev/null ++++ b/fs/nfs/enfs/Makefile +@@ -0,0 +1,18 @@ ++obj-m += enfs.o ++ ++#EXTRA_CFLAGS += -I$(PWD)/.. 
++ ++enfs-y := enfs_init.o ++enfs-y += enfs_config.o ++enfs-y += mgmt_init.o ++enfs-y += enfs_multipath_client.o ++enfs-y += enfs_multipath_parse.o ++enfs-y += failover_path.o ++enfs-y += failover_time.o ++enfs-y += enfs_roundrobin.o ++enfs-y += enfs_multipath.o ++enfs-y += enfs_path.o ++enfs-y += enfs_proc.o ++enfs-y += enfs_remount.o ++enfs-y += pm_ping.o ++enfs-y += pm_state.o +diff --git a/fs/nfs/enfs/enfs.h b/fs/nfs/enfs/enfs.h +new file mode 100644 +index 000000000000..be3d95220088 +--- /dev/null ++++ b/fs/nfs/enfs/enfs.h +@@ -0,0 +1,62 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS multipath adapt header. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++ ++#ifndef _ENFS_H_ ++#define _ENFS_H_ ++#include <linux/atomic.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs3.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include "../enfs_adapter.h" ++ ++#define IP_ADDRESS_LEN_MAX 64 ++#define MAX_IP_PAIR_PER_MOUNT 8 ++#define MAX_IP_INDEX (MAX_IP_PAIR_PER_MOUNT) ++#define MAX_SUPPORTED_LOCAL_IP_COUNT 8 ++#define MAX_SUPPORTED_REMOTE_IP_COUNT 32 ++#define MAX_DNS_NAME_LEN 512 ++#define MAX_DNS_SUPPORTED 2 ++#define EXTEND_CMD_MAX_BUF_LEN 65356 ++ ++ ++struct nfs_ip_list { ++ int count; ++ struct sockaddr_storage address[MAX_SUPPORTED_REMOTE_IP_COUNT]; ++ size_t addrlen[MAX_SUPPORTED_REMOTE_IP_COUNT]; ++}; ++ ++struct NFS_ROUTE_DNS_S { ++ char dnsname[MAX_DNS_NAME_LEN]; // valid only if dnsExist is true ++}; ++ ++struct NFS_ROUTE_DNS_INFO_S { ++ int dnsNameCount; // Count of DNS name in the list ++ // valid only if dnsExist is true ++ struct NFS_ROUTE_DNS_S routeRemoteDnsList[MAX_DNS_SUPPORTED]; ++}; ++ ++struct rpc_iostats; ++struct enfs_xprt_context { ++ struct sockaddr_storage srcaddr; ++ struct rpc_iostats *stats; ++ bool main; ++ atomic_t path_state; ++ atomic_t path_check_state; ++}; ++ ++static inline bool enfs_is_main_xprt(struct rpc_xprt *xprt) ++{ ++ 
struct enfs_xprt_context *ctx = xprt->multipath_context; ++ ++ if (!ctx) ++ return false; ++ return ctx->main; ++} ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_init.c b/fs/nfs/enfs/enfs_init.c +new file mode 100644 +index 000000000000..4b55608191a7 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_init.c +@@ -0,0 +1,98 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/module.h> ++#include <linux/sunrpc/sched.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs3.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include "enfs.h" ++#include "enfs_multipath_parse.h" ++#include "enfs_multipath_client.h" ++#include "enfs_remount.h" ++#include "init.h" ++#include "enfs_log.h" ++#include "enfs_multipath.h" ++#include "mgmt_init.h" ++ ++struct enfs_adapter_ops enfs_adapter = { ++ .name = "enfs", ++ .owner = THIS_MODULE, ++ .parse_mount_options = nfs_multipath_parse_options, ++ .free_mount_options = nfs_multipath_free_options, ++ .client_info_init = nfs_multipath_client_info_init, ++ .client_info_free = nfs_multipath_client_info_free, ++ .client_info_match = nfs_multipath_client_info_match, ++ .client_info_show = nfs_multipath_client_info_show, ++ .remount_ip_list = enfs_remount_iplist, ++}; ++ ++int32_t enfs_init(void) ++{ ++ int err; ++ ++ err = enfs_multipath_init(); ++ if (err) { ++ enfs_log_error("init multipath failed.\n"); ++ goto out; ++ } ++ ++ err = mgmt_init(); ++ if (err != 0) { ++ enfs_log_error("init mgmt failed.\n"); ++ goto out_tp_exit; ++ } ++ ++ return 0; ++ ++out_tp_exit: ++ enfs_multipath_exit(); ++out: ++ return err; ++} ++ ++void enfs_fini(void) ++{ ++ mgmt_fini(); ++ ++ enfs_multipath_exit(); ++} ++ ++static int __init init_enfs(void) ++{ ++ int ret; ++ ++ ret = enfs_adapter_register(&enfs_adapter); ++ if (ret) { ++ pr_err("regist enfs_adapter fail. 
ret %d\n", ret); ++ return -1; ++ } ++ ++ ret = enfs_init(); ++ if (ret) { ++ enfs_adapter_unregister(&enfs_adapter); ++ return -1; ++ } ++ ++ return 0; ++} ++ ++static void __exit exit_enfs(void) ++{ ++ enfs_fini(); ++ enfs_adapter_unregister(&enfs_adapter); ++} ++ ++MODULE_LICENSE("GPL"); ++MODULE_AUTHOR("Huawei Tech. Co., Ltd."); ++MODULE_DESCRIPTION("Nfs client router"); ++MODULE_VERSION("1.0"); ++ ++module_init(init_enfs); ++module_exit(exit_enfs); +diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c +new file mode 100644 +index 000000000000..63c02898a42c +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.c +@@ -0,0 +1,340 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/types.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/proc_fs.h> ++#include <linux/seq_file.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/addr.h> ++#include "enfs_multipath_client.h" ++#include "enfs_multipath_parse.h" ++ ++int ++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info, ++ const struct nfs_client_initdata *client_init_data) ++{ ++ struct multipath_mount_options *mount_options = ++ (struct multipath_mount_options *)client_init_data->enfs_option; ++ ++ if (mount_options->local_ip_list) { ++ client_info->local_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->local_ip_list) ++ return -ENOMEM; ++ ++ memcpy(client_info->local_ip_list, mount_options->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->remote_ip_list) { ++ ++ client_info->remote_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->remote_ip_list) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ return 
-ENOMEM; ++ } ++ memcpy(client_info->remote_ip_list, ++ mount_options->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->pRemoteDnsInfo) { ++ client_info->pRemoteDnsInfo = ++ kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL); ++ ++ if (!client_info->pRemoteDnsInfo) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ kfree(client_info->remote_ip_list); ++ client_info->remote_ip_list = NULL; ++ return -ENOMEM; ++ } ++ memcpy(client_info->pRemoteDnsInfo, ++ mount_options->pRemoteDnsInfo, ++ sizeof(struct NFS_ROUTE_DNS_INFO_S)); ++ } ++ return 0; ++} ++ ++void nfs_multipath_client_info_free_work(struct work_struct *work) ++{ ++ ++ struct multipath_client_info *clp_info; ++ ++ if (work == NULL) ++ return; ++ ++ clp_info = container_of(work, struct multipath_client_info, work); ++ ++ if (clp_info->local_ip_list != NULL) { ++ kfree(clp_info->local_ip_list); ++ clp_info->local_ip_list = NULL; ++ } ++ if (clp_info->remote_ip_list != NULL) { ++ kfree(clp_info->remote_ip_list); ++ clp_info->remote_ip_list = NULL; ++ } ++ kfree(clp_info); ++} ++ ++void nfs_multipath_client_info_free(void *data) ++{ ++ struct multipath_client_info *clp_info = ++ (struct multipath_client_info *)data; ++ ++ if (clp_info == NULL) ++ return; ++ pr_info("free client info %p.\n", clp_info); ++ INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work); ++ schedule_work(&clp_info->work); ++} ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init) ++{ ++ int rc; ++ struct multipath_client_info *info; ++ struct multipath_client_info **enfs_info; ++ /* no multi path info, no need do multipath init */ ++ if (cl_init->enfs_option == NULL) ++ return 0; ++ enfs_info = (struct multipath_client_info **)data; ++ if (enfs_info == NULL) ++ return -EINVAL; ++ ++ if (*enfs_info == NULL) ++ *enfs_info = kzalloc(sizeof(struct multipath_client_info), ++ GFP_KERNEL); ++ ++ if (*enfs_info == NULL) ++ return 
-ENOMEM; ++ ++ info = (struct multipath_client_info *)*enfs_info; ++ pr_info("init client info %p.\n", info); ++ rc = nfs_multipath_client_mount_info_init(info, cl_init); ++ if (rc) { ++ nfs_multipath_client_info_free((void *)info); ++ return rc; ++ } ++ return rc; ++} ++ ++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src, ++ const struct nfs_ip_list *ip_list_dst) ++{ ++ int i; ++ int j; ++ bool is_find; ++ /* if both are equal or NULL, then return true. */ ++ if (ip_list_src == ip_list_dst) ++ return true; ++ ++ if ((ip_list_src == NULL || ip_list_dst == NULL)) ++ return false; ++ ++ if (ip_list_src->count != ip_list_dst->count) ++ return false; ++ ++ for (i = 0; i < ip_list_src->count; i++) { ++ is_find = false; ++ for (j = 0; j < ip_list_src->count; j++) { ++ if (rpc_cmp_addr_port( ++ (const struct sockaddr *) ++ &ip_list_src->address[i], ++ (const struct sockaddr *) ++ &ip_list_dst->address[j]) ++ ) { ++ is_find = true; ++ break; ++ } ++ } ++ if (is_find == false) ++ return false; ++ } ++ return true; ++} ++ ++int ++nfs_multipath_dns_list_info_match( ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc, ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst) ++{ ++ int i; ++ ++ /* if both are equal or NULL, then return true. 
*/ ++ if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst) ++ return true; ++ ++ if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL)) ++ return false; ++ ++ if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount) ++ return false; ++ ++ for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) { ++ if (strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname, ++ pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname)) ++ return false; ++ } ++ return true; ++} ++ ++int nfs_multipath_client_info_match(void *src, void *dst) ++{ ++ int ret = true; ++ ++ struct multipath_client_info *src_info; ++ struct multipath_mount_options *dst_info; ++ ++ src_info = (struct multipath_client_info *)src; ++ dst_info = (struct multipath_mount_options *)dst; ++ pr_info("try match client.\n"); ++ ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list, ++ dst_info->local_ip_list); ++ if (ret == false) { ++ pr_err("local_ip not match.\n"); ++ return ret; ++ } ++ ++ ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list, ++ dst_info->remote_ip_list); ++ if (ret == false) { ++ pr_err("remote_ip not match.\n"); ++ return ret; ++ } ++ ++ ret = nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo, ++ dst_info->pRemoteDnsInfo); ++ if (ret == false) { ++ pr_err("dns not match.\n"); ++ return ret; ++ } ++ pr_info("try match client ret %d.\n", ret); ++ return ret; ++} ++ ++void nfs_multipath_print_ip_info(struct seq_file *mount_option, ++ struct nfs_ip_list *ip_list, ++ const char *type) ++{ ++ char buf[IP_ADDRESS_LEN_MAX + 1]; ++ int len = 0; ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < ip_list->count; i++) { ++ len = rpc_ntop((struct sockaddr *)&ip_list->address[i], ++ buf, IP_ADDRESS_LEN_MAX); ++ if (len > 0 && len < IP_ADDRESS_LEN_MAX) ++ buf[len] = '\0'; ++ ++ if (i == 0) ++ seq_printf(mount_option, "%s", buf); ++ else ++ seq_printf(mount_option, "~%s", buf); ++ dfprintk(MOUNT, ++ "NFS: show nfs mount option type:%s %s [%s]\n", ++
type, buf, __func__); ++ } ++} ++ ++void nfs_multipath_print_dns_info(struct seq_file *mount_option, ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo, ++ const char *type) ++{ ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) { ++ if (i == 0) ++ seq_printf(mount_option, ++ "[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else if (i == pRemoteDnsInfo->dnsNameCount - 1) ++ seq_printf(mount_option, ",%s]", ++ pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else ++ seq_printf(mount_option, ++ ",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ } ++} ++ ++ ++static void multipath_print_sockaddr(struct seq_file *seq, ++ struct sockaddr *addr) ++{ ++ switch (addr->sa_family) { ++ case AF_INET: { ++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; ++ ++ seq_printf(seq, "%pI4", &sin->sin_addr); ++ return; ++ } ++ case AF_INET6: { ++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; ++ ++ seq_printf(seq, "%pI6", &sin6->sin6_addr); ++ return; ++ } ++ default: ++ break; ++ } ++ pr_err("unsupport family:%d\n", addr->sa_family); ++} ++ ++static void multipath_print_enfs_info(struct seq_file *seq, ++ struct nfs_server *server) ++{ ++ struct sockaddr_storage peeraddr; ++ struct rpc_clnt *next = server->client; ++ ++ rpc_peeraddr(server->client, ++ (struct sockaddr *)&peeraddr, sizeof(peeraddr)); ++ seq_puts(seq, ",enfs_info="); ++ multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr); ++ ++ while (next->cl_parent) { ++ if (next == next->cl_parent) ++ break; ++ next = next->cl_parent; ++ } ++ seq_printf(seq, "_%u", next->cl_clid); ++} ++ ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data) ++{ ++ struct nfs_server *server = data; ++ struct multipath_client_info *client_info = ++ server->nfs_client->cl_multipath_data; ++ ++ dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__); ++ if ((client_info->remote_ip_list) && ++ 
(client_info->remote_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->remote_ip_list, ++ "remoteaddrs"); ++ ++ if ((client_info->local_ip_list) && ++ (client_info->local_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->local_ip_list, ++ "localaddrs"); ++ ++ if ((client_info->pRemoteDnsInfo) && ++ (client_info->pRemoteDnsInfo->dnsNameCount > 0)) ++ nfs_multipath_print_dns_info(mount_option, ++ client_info->pRemoteDnsInfo, ++ "remotednsname"); ++ ++ multipath_print_enfs_info(mount_option, server); ++} +diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h +new file mode 100644 +index 000000000000..208f7260690d +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.h +@@ -0,0 +1,26 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#ifndef _ENFS_MULTIPATH_CLIENT_H_ ++#define _ENFS_MULTIPATH_CLIENT_H_ ++ ++#include "enfs.h" ++ ++struct multipath_client_info { ++ struct work_struct work; ++ struct nfs_ip_list *remote_ip_list; ++ struct nfs_ip_list *local_ip_list; ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; ++ s64 client_id; ++}; ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init); ++void nfs_multipath_client_info_free(void *data); ++int nfs_multipath_client_info_match(void *src, void *dst); ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_multipath_parse.c b/fs/nfs/enfs/enfs_multipath_parse.c +new file mode 100644 +index 000000000000..9c4c6c1880b6 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_parse.c +@@ -0,0 +1,601 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#include <linux/types.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/parser.h> ++#include <linux/kern_levels.h> ++#include <linux/sunrpc/addr.h> ++#include "enfs_multipath_parse.h" ++#include "enfs_log.h" ++ ++#define NFSDBG_FACILITY NFSDBG_CLIENT ++ ++void nfs_multipath_parse_ip_ipv6_add(struct sockaddr_in6 *sin6, int add_num) ++{ ++ int i; ++ ++ pr_info("NFS: before %08x%08x%08x%08x add_num: %d[%s]\n", ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]), ++ add_num, __func__); ++ for (i = 0; i < add_num; i++) { ++ sin6->sin6_addr.in6_u.u6_addr32[3] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[3] != 0) ++ continue; ++ ++ sin6->sin6_addr.in6_u.u6_addr32[2] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[2] != 0) ++ continue; ++ ++ sin6->sin6_addr.in6_u.u6_addr32[1] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[1] != 0) ++ continue; ++ ++ sin6->sin6_addr.in6_u.u6_addr32[0] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[0] != 0) ++ continue; ++ } ++ ++ return; ++ ++} ++ ++static int nfs_multipath_parse_ip_range(struct net *net_ns, const char *cursor, ++ struct nfs_ip_list *ip_list, enum nfsmultipathoptions type) ++{ ++ struct sockaddr_storage addr; ++ struct sockaddr_storage tmp_addr; ++ int i; ++ size_t len; ++ int add_num = 1; ++ bool duplicate_flag = false; ++ bool is_complete = false; ++ struct sockaddr_in *sin4; ++ struct sockaddr_in6 *sin6; ++ ++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", ++ cursor, type, __func__); ++ len = rpc_pton(net_ns, cursor, strlen(cursor), ++ (struct sockaddr *)&addr, sizeof(addr)); ++ 
if (!len) ++ return -EINVAL; ++ ++ if (addr.ss_family != ip_list->address[ip_list->count - 1].ss_family) { ++ pr_info("NFS: %s parsing nfs mount option type: %d fail.\n", ++ __func__, type); ++ return -EINVAL; ++ } ++ ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &ip_list->address[ip_list->count - 1], ++ (const struct sockaddr *)&addr)) { ++ ++ pr_info("range ip is same ip.\n"); ++ return 0; ++ ++ } ++ ++ while (true) { ++ ++ tmp_addr = ip_list->address[ip_list->count - 1]; ++ ++ switch (addr.ss_family) { ++ case AF_INET: ++ sin4 = (struct sockaddr_in *)&tmp_addr; ++ ++ sin4->sin_addr.s_addr = ++ htonl(ntohl(sin4->sin_addr.s_addr) + add_num); ++ ++ pr_info("NFS: mount option ip %08x type: %d ipcount %d [%s]\n", ++ ntohl(sin4->sin_addr.s_addr), ++ type, ip_list->count, __func__); ++ break; ++ case AF_INET6: ++ sin6 = (struct sockaddr_in6 *)&tmp_addr; ++ nfs_multipath_parse_ip_ipv6_add(sin6, add_num); ++ pr_info("NFS: mount option ip %08x%08x%08x%08x type: %d ipcount %d [%s]\n", ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]), ++ type, ip_list->count, __func__); ++ break; ++ // return -EOPNOTSUPP; ++ default: ++ return -EOPNOTSUPP; ++ } ++ ++ if (rpc_cmp_addr((const struct sockaddr *)&tmp_addr, ++ (const struct sockaddr *)&addr)) { ++ is_complete = true; ++ } ++ // skip an IP already in the list: bump the offset and try the next one ++ for (i = 0; i < ip_list->count; i++) { ++ duplicate_flag = false; ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &ip_list->address[i], ++ (const struct sockaddr *)&tmp_addr)) { ++ add_num++; ++ duplicate_flag = true; ++ break; ++ } ++ } ++ ++ if (duplicate_flag == false) { ++ pr_info("this ip is not a duplicate\n"); ++ add_num = 1; ++ // not a duplicate: fail if the list is already at its limit ++ if ((type == LOCALADDR && ++ ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) || ++ (type == REMOTEADDR && ++ ip_list->count >=
MAX_SUPPORTED_REMOTE_IP_COUNT)) { ++ ++ pr_info("[MULTIPATH:%s] iplist for type %d reached %d, more than supported limit %d\n", ++ __func__, type, ip_list->count, ++ type == LOCALADDR ? ++ MAX_SUPPORTED_LOCAL_IP_COUNT : ++ MAX_SUPPORTED_REMOTE_IP_COUNT); ++ ip_list->count = 0; ++ return -ENOSPC; ++ } ++ ip_list->address[ip_list->count] = tmp_addr; ++ ++ ip_list->addrlen[ip_list->count] = ++ ip_list->addrlen[ip_list->count - 1]; ++ ++ ip_list->count += 1; ++ } ++ if (is_complete == true) ++ break; ++ } ++ return 0; ++} ++ ++int nfs_multipath_parse_ip_list_inter(struct nfs_ip_list *ip_list, ++ struct net *net_ns, ++ char *cursor, enum nfsmultipathoptions type) ++{ ++ int i = 0; ++ struct sockaddr_storage addr; ++ struct sockaddr_storage swap; ++ int len; ++ ++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", ++ cursor, type, __func__); ++ ++ len = rpc_pton(net_ns, cursor, ++ strlen(cursor), ++ (struct sockaddr *)&addr, sizeof(addr)); ++ if (!len) ++ return -EINVAL; ++ ++ // check repeated ip ++ for (i = 0; i < ip_list->count; i++) { ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &ip_list->address[i], ++ (const struct sockaddr *)&addr)) { ++ ++ pr_info("NFS: mount option '%s' type:%d index %d same as before index %d [%s]\n", ++ cursor, type, ip_list->count, i, __func__); ++ // prevent this ip is beginning ++ // if repeated take it to the end of list ++ swap = ip_list->address[i]; ++ ++ ip_list->address[i] = ++ ip_list->address[ip_list->count-1]; ++ ++ ip_list->address[ip_list->count-1] = swap; ++ return 0; ++ } ++ } ++ // if not repeated, check exceed limit ++ if ((type == LOCALADDR && ++ ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) || ++ (type == REMOTEADDR && ++ ip_list->count >= MAX_SUPPORTED_REMOTE_IP_COUNT)) { ++ ++ pr_info("[MULTIPATH:%s] iplist for type %d reached %d, more than supported limit %d\n", ++ __func__, type, ip_list->count, ++ type == LOCALADDR ? 
++ MAX_SUPPORTED_LOCAL_IP_COUNT : ++ MAX_SUPPORTED_REMOTE_IP_COUNT); ++ ++ ip_list->count = 0; ++ return -ENOSPC; ++ } ++ ip_list->address[ip_list->count] = addr; ++ ip_list->addrlen[ip_list->count] = len; ++ ip_list->count++; ++ ++ return 0; ++} ++ ++char *nfs_multipath_parse_ip_list_get_cursor(char **buf_to_parse, bool *single) ++{ ++ char *cursor = NULL; ++ const char *single_sep = strchr(*buf_to_parse, '~'); ++ const char *range_sep = strchr(*buf_to_parse, '-'); ++ ++ *single = true; ++ if (range_sep) { ++ if (range_sep > single_sep) { // A-B or A~B-C ++ if (single_sep == NULL) { // A-B ++ cursor = strsep(buf_to_parse, "-"); ++ if (cursor) ++ *single = false; ++ } else// A~B-C ++ cursor = strsep(buf_to_parse, "~"); ++ } else { // A-B~C ++ cursor = strsep(buf_to_parse, "-"); ++ if (cursor) ++ *single = false; ++ } ++ } else { // A~B~C ++ cursor = strsep(buf_to_parse, "~"); ++ } ++ return cursor; ++} ++ ++bool nfs_multipath_parse_param_check(enum nfsmultipathoptions type, ++ struct multipath_mount_options *options) ++{ ++ if (type == REMOUNTREMOTEADDR && options->remote_ip_list->count != 0) { ++ memset(options->remote_ip_list, 0, sizeof(struct nfs_ip_list)); ++ return true; ++ } ++ if (type == REMOUNTLOCALADDR && options->local_ip_list->count != 0) { ++ memset(options->local_ip_list, 0, sizeof(struct nfs_ip_list)); ++ return true; ++ } ++ if ((type == REMOTEADDR || type == REMOTEDNSNAME) && ++ options->pRemoteDnsInfo->dnsNameCount != 0) { ++ ++ pr_info("[MULTIPATH:%s] parse for %d ,already have dns\n", ++ __func__, type); ++ return false; ++ } else if ((type == REMOTEADDR || type == REMOTEDNSNAME) && ++ options->remote_ip_list->count != 0) { ++ ++ pr_info("[MULTIPATH:%s] parse for %d ,already have iplist\n", ++ __func__, type); ++ return false; ++ } ++ return true; ++} ++ ++int nfs_multipath_parse_ip_list(char *buffer, struct net *net_ns, ++ struct multipath_mount_options *options, ++ enum nfsmultipathoptions type) ++{ ++ char *buf_to_parse = NULL; ++ bool 
prev_range = false; ++ int ret = 0; ++ char *cursor = NULL; ++ bool single = true; ++ struct nfs_ip_list *ip_list_tmp = NULL; ++ ++ if (!nfs_multipath_parse_param_check(type, options)) ++ return -ENOTSUPP; ++ ++ if (type == REMOUNTREMOTEADDR) ++ type = REMOTEADDR; ++ ++ if (type == REMOUNTLOCALADDR) ++ type = LOCALADDR; ++ ++ if (type == LOCALADDR) ++ ip_list_tmp = options->local_ip_list; ++ else ++ ip_list_tmp = options->remote_ip_list; ++ ++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", ++ buffer, type, __func__); ++ ++ buf_to_parse = buffer; ++ while (buf_to_parse != NULL) { ++ cursor = ++ nfs_multipath_parse_ip_list_get_cursor(&buf_to_parse, &single); ++ if (!cursor) ++ break; ++ ++ if (single == false && prev_range == true) { ++ pr_info("NFS: mount option type: %d fail. Multiple Range.[%s]\n", ++ type, __func__); ++ ++ ret = -EINVAL; ++ goto out; ++ } ++ ++ if (prev_range == false) { ++ ret = nfs_multipath_parse_ip_list_inter(ip_list_tmp, ++ net_ns, cursor, type); ++ if (ret) ++ goto out; ++ if (single == false) ++ prev_range = true; ++ } else { ++ ret = nfs_multipath_parse_ip_range(net_ns, cursor, ++ ip_list_tmp, type); ++ if (ret != 0) ++ goto out; ++ prev_range = false; ++ } ++ } ++ ++out: ++ if (ret) ++ memset(ip_list_tmp, 0, sizeof(struct nfs_ip_list)); ++ ++ return ret; ++} ++ ++int nfs_multipath_parse_dns_list(char *buffer, struct net *net_ns, ++ struct multipath_mount_options *options) ++{ ++ struct NFS_ROUTE_DNS_INFO_S *dns_name_list_tmp = NULL; ++ char *cursor = NULL; ++ char *bufToParse; ++ ++ if (!nfs_multipath_parse_param_check(REMOTEDNSNAME, options)) ++ return -ENOTSUPP; ++ ++ pr_info("[MULTIPATH:%s] buffer %s\n", __func__, buffer); ++ // freed in nfs_free_parsed_mount_data ++ dns_name_list_tmp = kmalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), ++ GFP_KERNEL); ++ if (!dns_name_list_tmp) ++ return -ENOMEM; ++ ++ dns_name_list_tmp->dnsNameCount = 0; ++ bufToParse = buffer; ++ while (bufToParse) { ++ if (dns_name_list_tmp->dnsNameCount
>= MAX_DNS_SUPPORTED) { ++ pr_err("%s: dnsname for %s reached %d, more than supported limit %d\n", ++ __func__, cursor, ++ dns_name_list_tmp->dnsNameCount, ++ MAX_DNS_SUPPORTED); ++ kfree(dns_name_list_tmp); ++ return -ENOSPC; ++ } ++ cursor = strsep(&bufToParse, "~"); ++ if (!cursor) ++ break; ++ ++ strscpy(dns_name_list_tmp->routeRemoteDnsList ++ [dns_name_list_tmp->dnsNameCount].dnsname, ++ cursor, MAX_DNS_NAME_LEN); ++ dns_name_list_tmp->dnsNameCount++; ++ } ++ if (dns_name_list_tmp->dnsNameCount == 0) ++ return -EINVAL; ++ options->pRemoteDnsInfo = dns_name_list_tmp; ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check_ipv4_valid(struct sockaddr_in *addr) ++{ ++ if (addr->sin_addr.s_addr == 0 || addr->sin_addr.s_addr == 0xffffffff) ++ return -EINVAL; ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check_ipv6_valid(struct sockaddr_in6 *addr) ++{ ++ if (addr->sin6_addr.in6_u.u6_addr32[0] == 0 && ++ addr->sin6_addr.in6_u.u6_addr32[1] == 0 && ++ addr->sin6_addr.in6_u.u6_addr32[2] == 0 && ++ addr->sin6_addr.in6_u.u6_addr32[3] == 0) ++ return -EINVAL; ++ ++ if (addr->sin6_addr.in6_u.u6_addr32[0] == 0xffffffff && ++ addr->sin6_addr.in6_u.u6_addr32[1] == 0xffffffff && ++ addr->sin6_addr.in6_u.u6_addr32[2] == 0xffffffff && ++ addr->sin6_addr.in6_u.u6_addr32[3] == 0xffffffff) ++ return -EINVAL; ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check_ip_valid(struct sockaddr_storage *address) ++{ ++ int rc = 0; ++ ++ if (address->ss_family == AF_INET) ++ rc = nfs_multipath_parse_options_check_ipv4_valid( ++ (struct sockaddr_in *)address); ++ else if (address->ss_family == AF_INET6) ++ rc = nfs_multipath_parse_options_check_ipv6_valid( ++ (struct sockaddr_in6 *)address); ++ else ++ rc = -EINVAL; ++ ++ return rc; ++} ++ ++int nfs_multipath_parse_options_check_valid( ++ struct multipath_mount_options *options) ++{ ++ int rc; ++ int i; ++ ++ if (options == NULL) ++ return 0; ++ ++ for (i = 0; i < options->local_ip_list->count; i++) { ++ rc =
nfs_multipath_parse_options_check_ip_valid( ++ &options->local_ip_list->address[i]); ++ if (rc != 0) ++ return rc; ++ } ++ ++ for (i = 0; i < options->remote_ip_list->count; i++) { ++ rc = nfs_multipath_parse_options_check_ip_valid( ++ &options->remote_ip_list->address[i]); ++ if (rc != 0) ++ return rc; ++ } ++ ++ return 0; ++} ++int nfs_multipath_parse_options_check_duplicate( ++ struct multipath_mount_options *options) ++{ ++ int i; ++ int j; ++ ++ if (options == NULL || ++ options->local_ip_list->count == 0 || ++ options->remote_ip_list->count == 0) ++ ++ return 0; ++ ++ for (i = 0; i < options->local_ip_list->count; i++) { ++ for (j = 0; j < options->remote_ip_list->count; j++) { ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &options->local_ip_list->address[i], ++ (const struct sockaddr *) ++ &options->remote_ip_list->address[j])) ++ return -ENOTSUPP; ++ } ++ } ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check(struct multipath_mount_options *options) ++{ ++ int rc = 0; ++ ++ rc = nfs_multipath_parse_options_check_valid(options); ++ ++ if (rc != 0) { ++ pr_err("has invaild ip.\n"); ++ return rc; ++ } ++ ++ rc = nfs_multipath_parse_options_check_duplicate(options); ++ if (rc != 0) ++ return rc; ++ return rc; ++} ++ ++int nfs_multipath_alloc_options(void **enfs_option) ++{ ++ struct multipath_mount_options *options = NULL; ++ ++ options = kzalloc(sizeof(struct multipath_mount_options), GFP_KERNEL); ++ ++ if (options == NULL) ++ return -ENOMEM; ++ ++ options->local_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ if (options->local_ip_list == NULL) { ++ kfree(options); ++ return -ENOMEM; ++ } ++ ++ options->remote_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ if (options->remote_ip_list == NULL) { ++ kfree(options->local_ip_list); ++ kfree(options); ++ return -ENOMEM; ++ } ++ ++ options->pRemoteDnsInfo = kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), ++ GFP_KERNEL); ++ if (options->pRemoteDnsInfo == NULL) { ++ 
kfree(options->remote_ip_list); ++ kfree(options->local_ip_list); ++ kfree(options); ++ return -ENOMEM; ++ } ++ ++ *enfs_option = options; ++ return 0; ++} ++ ++int nfs_multipath_parse_options(enum nfsmultipathoptions type, ++ char *str, void **enfs_option, struct net *net_ns) ++{ ++ int rc; ++ struct multipath_mount_options *options = NULL; ++ ++ if ((str == NULL) || (enfs_option == NULL) || (net_ns == NULL)) ++ return -EINVAL; ++ ++ if (*enfs_option == NULL) { ++ rc = nfs_multipath_alloc_options(enfs_option); ++ if (rc != 0) { ++ enfs_log_error( ++ "alloc enfs_options failed! errno:%d\n", rc); ++ return rc; ++ } ++ } ++ ++ options = (struct multipath_mount_options *)*enfs_option; ++ ++ if (type == LOCALADDR || type == REMOUNTLOCALADDR || ++ type == REMOTEADDR || type == REMOUNTREMOTEADDR) { ++ rc = nfs_multipath_parse_ip_list(str, net_ns, options, type); ++ } else if (type == REMOTEDNSNAME) { ++ /* alloc and release need to modify */ ++ rc = nfs_multipath_parse_dns_list(str, net_ns, options); ++ } else { ++ rc = -EOPNOTSUPP; ++ } ++ ++ // after parsing cmd, need checking local and remote ++ // IP is same. 
if not means illegal cmd ++ if (rc == 0) ++ rc = nfs_multipath_parse_options_check_duplicate(options); ++ ++ if (rc == 0) ++ rc = nfs_multipath_parse_options_check(options); ++ ++ return rc; ++} ++ ++void nfs_multipath_free_options(void **enfs_option) ++{ ++ struct multipath_mount_options *options; ++ ++ if (enfs_option == NULL || *enfs_option == NULL) ++ return; ++ ++ options = (struct multipath_mount_options *)*enfs_option; ++ ++ if (options->remote_ip_list != NULL) { ++ kfree(options->remote_ip_list); ++ options->remote_ip_list = NULL; ++ } ++ ++ if (options->local_ip_list != NULL) { ++ kfree(options->local_ip_list); ++ options->local_ip_list = NULL; ++ } ++ ++ if (options->pRemoteDnsInfo != NULL) { ++ kfree(options->pRemoteDnsInfo); ++ options->pRemoteDnsInfo = NULL; ++ } ++ ++ kfree(options); ++ *enfs_option = NULL; ++} +diff --git a/fs/nfs/enfs/enfs_multipath_parse.h b/fs/nfs/enfs/enfs_multipath_parse.h +new file mode 100644 +index 000000000000..6f3e8703e3e2 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_parse.h +@@ -0,0 +1,22 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _ENFS_MULTIPATH_PARSE_H_ ++#define _ENFS_MULTIPATH_PARSE_H_ ++ ++#include "enfs.h" ++ ++struct multipath_mount_options { ++ struct nfs_ip_list *remote_ip_list; ++ struct nfs_ip_list *local_ip_list; ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; ++}; ++ ++int nfs_multipath_parse_options(enum nfsmultipathoptions type, ++ char *str, void **enfs_option, struct net *net_ns); ++void nfs_multipath_free_options(void **enfs_option); ++ ++#endif diff --git a/0004-add_enfs_module_for_sunrpc_multipatch.patch b/0004-add_enfs_module_for_sunrpc_multipatch.patch new file mode 100644 index 0000000..2c0fcc7 --- /dev/null +++ b/0004-add_enfs_module_for_sunrpc_multipatch.patch @@ -0,0 +1,1581 @@ +diff --git a/fs/nfs/enfs/enfs_multipath.h b/fs/nfs/enfs/enfs_multipath.h +new file mode 100644 +index 000000000000..e064c2929ced +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath.h +@@ -0,0 +1,24 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: enfs multipath ++ * Author: ++ * Create: 2023-07-31 ++ */ ++ ++#ifndef ENFS_MULTIPATH_H ++#define ENFS_MULTIPATH_H ++#include <linux/sunrpc/clnt.h> ++ ++#define MAX_XPRT_NUM_PER_CLIENT 32 ++ ++int enfs_multipath_init(void); ++void enfs_multipath_exit(void); ++void enfs_xprt_ippair_create(struct xprt_create *xprtargs, ++ struct rpc_clnt *clnt, void *data); ++int enfs_config_xprt_create_args(struct xprt_create *xprtargs, ++ struct rpc_create_args *args, ++ char *servername, size_t length); ++void print_enfs_multipath_addr(struct sockaddr *local, struct sockaddr *remote); ++ ++#endif // ENFS_MULTIPATH_H +diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c +new file mode 100644 +index 000000000000..63c02898a42c +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.c +@@ -0,0 +1,340 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. 
Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/types.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/proc_fs.h> ++#include <linux/seq_file.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/addr.h> ++#include "enfs_multipath_client.h" ++#include "enfs_multipath_parse.h" ++ ++int ++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info, ++ const struct nfs_client_initdata *client_init_data) ++{ ++ struct multipath_mount_options *mount_options = ++ (struct multipath_mount_options *)client_init_data->enfs_option; ++ ++ if (mount_options->local_ip_list) { ++ client_info->local_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->local_ip_list) ++ return -ENOMEM; ++ ++ memcpy(client_info->local_ip_list, mount_options->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->remote_ip_list) { ++ ++ client_info->remote_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->remote_ip_list) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ return -ENOMEM; ++ } ++ memcpy(client_info->remote_ip_list, ++ mount_options->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->pRemoteDnsInfo) { ++ client_info->pRemoteDnsInfo = ++ kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL); ++ ++ if (!client_info->pRemoteDnsInfo) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ kfree(client_info->remote_ip_list); ++ client_info->remote_ip_list = NULL; ++ return -ENOMEM; ++ } ++ memcpy(client_info->pRemoteDnsInfo, ++ mount_options->pRemoteDnsInfo, ++ sizeof(struct NFS_ROUTE_DNS_INFO_S)); ++ } ++ return 0; ++} ++ ++void nfs_multipath_client_info_free_work(struct work_struct *work) ++{ ++ ++ struct multipath_client_info *clp_info; ++ ++ if (work == NULL) ++ return; ++ ++ 
clp_info = container_of(work, struct multipath_client_info, work); ++ ++ if (clp_info->local_ip_list != NULL) { ++ kfree(clp_info->local_ip_list); ++ clp_info->local_ip_list = NULL; ++ } ++ if (clp_info->remote_ip_list != NULL) { ++ kfree(clp_info->remote_ip_list); ++ clp_info->remote_ip_list = NULL; ++ } ++ kfree(clp_info); ++} ++ ++void nfs_multipath_client_info_free(void *data) ++{ ++ struct multipath_client_info *clp_info = ++ (struct multipath_client_info *)data; ++ ++ if (clp_info == NULL) ++ return; ++ pr_info("free client info %p.\n", clp_info); ++ INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work); ++ schedule_work(&clp_info->work); ++} ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init) ++{ ++ int rc; ++ struct multipath_client_info *info; ++ struct multipath_client_info **enfs_info; ++ /* no multi path info, no need do multipath init */ ++ if (cl_init->enfs_option == NULL) ++ return 0; ++ enfs_info = (struct multipath_client_info **)data; ++ if (enfs_info == NULL) ++ return -EINVAL; ++ ++ if (*enfs_info == NULL) ++ *enfs_info = kzalloc(sizeof(struct multipath_client_info), ++ GFP_KERNEL); ++ ++ if (*enfs_info == NULL) ++ return -ENOMEM; ++ ++ info = (struct multipath_client_info *)*enfs_info; ++ pr_info("init client info %p.\n", info); ++ rc = nfs_multipath_client_mount_info_init(info, cl_init); ++ if (rc) { ++ nfs_multipath_client_info_free((void *)info); ++ return rc; ++ } ++ return rc; ++} ++ ++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src, ++ const struct nfs_ip_list *ip_list_dst) ++{ ++ int i; ++ int j; ++ bool is_find; ++ /* if both are equal or NULL, then return true. 
*/ ++ if (ip_list_src == ip_list_dst) ++ return true; ++ ++ if ((ip_list_src == NULL || ip_list_dst == NULL)) ++ return false; ++ ++ if (ip_list_src->count != ip_list_dst->count) ++ return false; ++ ++ for (i = 0; i < ip_list_src->count; i++) { ++ is_find = false; ++ for (j = 0; j < ip_list_src->count; j++) { ++ if (rpc_cmp_addr_port( ++ (const struct sockaddr *) ++ &ip_list_src->address[i], ++ (const struct sockaddr *) ++ &ip_list_dst->address[j]) ++ ) { ++ is_find = true; ++ break; ++ } ++ } ++ if (is_find == false) ++ return false; ++ } ++ return true; ++} ++ ++int ++nfs_multipath_dns_list_info_match( ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc, ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst) ++{ ++ int i; ++ ++ /* if both are equal or NULL, then return true. */ ++ if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst) ++ return true; ++ ++ if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL)) ++ return false; ++ ++ if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount) ++ return false; ++ ++ for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) { ++ if (!strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname, ++ pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname)) ++ return false; ++ } ++ return true; ++} ++ ++int nfs_multipath_client_info_match(void *src, void *dst) ++{ ++ int ret = true; ++ ++ struct multipath_client_info *src_info; ++ struct multipath_mount_options *dst_info; ++ ++ src_info = (struct multipath_client_info *)src; ++ dst_info = (struct multipath_mount_options *)dst; ++ pr_info("try match client .\n"); ++ ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list, ++ dst_info->local_ip_list); ++ if (ret == false) { ++ pr_err("local_ip not match.\n"); ++ return ret; ++ } ++ ++ ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list, ++ dst_info->remote_ip_list); ++ if (ret == false) { ++ pr_err("remote_ip not match.\n"); ++ return ret; ++ } ++ ++ ret = 
nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo, ++ dst_info->pRemoteDnsInfo); ++ if (ret == false) { ++ pr_err("dns not match.\n"); ++ return ret; ++ } ++ pr_info("try match client ret %d.\n", ret); ++ return ret; ++} ++ ++void nfs_multipath_print_ip_info(struct seq_file *mount_option, ++ struct nfs_ip_list *ip_list, ++ const char *type) ++{ ++ char buf[IP_ADDRESS_LEN_MAX + 1]; ++ int len = 0; ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < ip_list->count; i++) { ++ len = rpc_ntop((struct sockaddr *)&ip_list->address[i], ++ buf, IP_ADDRESS_LEN_MAX); ++ if (len > 0 && len < IP_ADDRESS_LEN_MAX) ++ buf[len] = '\0'; ++ ++ if (i == 0) ++ seq_printf(mount_option, "%s", buf); ++ else ++ seq_printf(mount_option, "~%s", buf); ++ dfprintk(MOUNT, ++ "NFS: show nfs mount option type:%s %s [%s]\n", ++ type, buf, __func__); ++ } ++} ++ ++void nfs_multipath_print_dns_info(struct seq_file *mount_option, ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo, ++ const char *type) ++{ ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) { ++ if (i == 0) ++ seq_printf(mount_option, ++ "[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else if (i == pRemoteDnsInfo->dnsNameCount - 1) ++ seq_printf(mount_option, ",%s]", ++ pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else ++ seq_printf(mount_option, ++ ",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ } ++} ++ ++ ++static void multipath_print_sockaddr(struct seq_file *seq, ++ struct sockaddr *addr) ++{ ++ switch (addr->sa_family) { ++ case AF_INET: { ++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; ++ ++ seq_printf(seq, "%pI4", &sin->sin_addr); ++ return; ++ } ++ case AF_INET6: { ++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; ++ ++ seq_printf(seq, "%pI6", &sin6->sin6_addr); ++ return; ++ } ++ default: ++ break; ++ } ++ pr_err("unsupport family:%d\n", addr->sa_family); ++} ++ ++static void 
multipath_print_enfs_info(struct seq_file *seq, ++ struct nfs_server *server) ++{ ++ struct sockaddr_storage peeraddr; ++ struct rpc_clnt *next = server->client; ++ ++ rpc_peeraddr(server->client, ++ (struct sockaddr *)&peeraddr, sizeof(peeraddr)); ++ seq_puts(seq, ",enfs_info="); ++ multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr); ++ ++ while (next->cl_parent) { ++ if (next == next->cl_parent) ++ break; ++ next = next->cl_parent; ++ } ++ seq_printf(seq, "_%u", next->cl_clid); ++} ++ ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data) ++{ ++ struct nfs_server *server = data; ++ struct multipath_client_info *client_info = ++ server->nfs_client->cl_multipath_data; ++ ++ dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__); ++ if ((client_info->remote_ip_list) && ++ (client_info->remote_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->remote_ip_list, ++ "remoteaddrs"); ++ ++ if ((client_info->local_ip_list) && ++ (client_info->local_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->local_ip_list, ++ "localaddrs"); ++ ++ if ((client_info->pRemoteDnsInfo) && ++ (client_info->pRemoteDnsInfo->dnsNameCount > 0)) ++ nfs_multipath_print_dns_info(mount_option, ++ client_info->pRemoteDnsInfo, ++ "remotednsname"); ++ ++ multipath_print_enfs_info(mount_option, server); ++} +diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h +new file mode 100644 +index 000000000000..208f7260690d +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.h +@@ -0,0 +1,26 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _ENFS_MULTIPATH_CLIENT_H_ ++#define _ENFS_MULTIPATH_CLIENT_H_ ++ ++#include "enfs.h" ++ ++struct multipath_client_info { ++ struct work_struct work; ++ struct nfs_ip_list *remote_ip_list; ++ struct nfs_ip_list *local_ip_list; ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; ++ s64 client_id; ++}; ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init); ++void nfs_multipath_client_info_free(void *data); ++int nfs_multipath_client_info_match(void *src, void *dst); ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_path.c b/fs/nfs/enfs/enfs_path.c +new file mode 100644 +index 000000000000..7355f8c2f672 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_path.c +@@ -0,0 +1,47 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++ ++#include <linux/sunrpc/metrics.h> ++#include <linux/sunrpc/xprt.h> ++ ++#include "enfs.h" ++#include "enfs_log.h" ++#include "enfs_path.h" ++ ++// only create ctx in this function ++// alloc iostat memory in create_clnt ++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt) ++{ ++ struct enfs_xprt_context *ctx; ++ ++ if (!xprt) { ++ enfs_log_error("invalid xprt pointer.\n"); ++ return -EINVAL; ++ } ++ ++ ctx = kzalloc(sizeof(struct enfs_xprt_context), GFP_KERNEL); ++ if (!ctx) { ++ enfs_log_error("add xprt test failed.\n"); ++ return -ENOMEM; ++ } ++ ++ xprt->multipath_context = (void *)ctx; ++ return 0; ++} ++ ++// free multi_context and iostat memory ++void enfs_free_xprt_ctx(struct rpc_xprt *xprt) ++{ ++ struct enfs_xprt_context *ctx = xprt->multipath_context; ++ ++ if (ctx) { ++ if (ctx->stats) { ++ rpc_free_iostats(ctx->stats); ++ ctx->stats = NULL; ++ } ++ kfree(xprt->multipath_context); ++ xprt->multipath_context = NULL; ++ } ++} +diff --git a/fs/nfs/enfs/enfs_path.h b/fs/nfs/enfs/enfs_path.h +new file mode 100644 +index 
000000000000..97b1ef3730b8 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_path.h +@@ -0,0 +1,12 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++ ++#ifndef ENFS_PATH_H ++#define ENFS_PATH_H ++ ++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt); ++void enfs_free_xprt_ctx(struct rpc_xprt *xprt); ++ ++#endif // ENFS_PATH_H +diff --git a/fs/nfs/enfs/enfs_proc.c b/fs/nfs/enfs/enfs_proc.c +new file mode 100644 +index 000000000000..53fa1a07642f +--- /dev/null ++++ b/fs/nfs/enfs/enfs_proc.c +@@ -0,0 +1,545 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++#include <linux/module.h> ++#include <linux/proc_fs.h> ++#include <linux/seq_file.h> ++#include <linux/spinlock.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/metrics.h> ++#include <linux/sunrpc/xprtsock.h> ++#include <net/netns/generic.h> ++ ++#include "../../../net/sunrpc/netns.h" ++ ++#include "enfs.h" ++#include "enfs_log.h" ++#include "enfs_proc.h" ++#include "enfs_multipath.h" ++#include "pm_state.h" ++ ++#define ENFS_PROC_DIR "enfs" ++#define ENFS_PROC_PATH_STATUS_LEN 256 ++ ++static struct proc_dir_entry *enfs_proc_parent; ++ ++void ++enfs_iterate_each_rpc_clnt(int (*fn)(struct rpc_clnt *clnt, void *data), ++ void *data) ++{ ++ struct net *net; ++ struct sunrpc_net *sn; ++ struct rpc_clnt *clnt; ++ ++ rcu_read_lock(); ++ for_each_net_rcu(net) { ++ sn = net_generic(net, sunrpc_net_id); ++ if (sn == NULL) ++ continue; ++ spin_lock(&sn->rpc_client_lock); ++ list_for_each_entry(clnt, &sn->all_clients, cl_clients) { ++ fn(clnt, data); ++ } ++ spin_unlock(&sn->rpc_client_lock); ++ } ++ rcu_read_unlock(); ++} ++ ++struct proc_dir_entry *enfs_get_proc_parent(void) ++{ ++ return enfs_proc_parent; ++} ++ ++static int sockaddr_ip_to_str(struct sockaddr *addr, char *buf, int len) ++{ ++ switch (addr->sa_family) { ++ case AF_INET: { ++ struct 
sockaddr_in *sin = (struct sockaddr_in *)addr; ++ ++ snprintf(buf, len, "%pI4", &sin->sin_addr); ++ return 0; ++ } ++ case AF_INET6: { ++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; ++ ++ snprintf(buf, len, "%pI6", &sin6->sin6_addr); ++ return 0; ++ } ++ default: ++ break; ++ } ++ return 1; ++} ++ ++static bool should_print(const char *name) ++{ ++ int i; ++ static const char * const proc_names[] = { ++ "READ", ++ "WRITE", ++ }; ++ ++ if (name == NULL) ++ return false; ++ ++ for (i = 0; i < ARRAY_SIZE(proc_names); i++) { ++ if (strcmp(name, proc_names[i]) == 0) ++ return true; ++ } ++ return false; ++} ++ ++struct enfs_xprt_iter { ++ unsigned int id; ++ struct seq_file *seq; ++ unsigned int max_addrs_length; ++}; ++ ++static int debug_show_xprt(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ ++ if (xprt->multipath_context) ++ ctx = xprt->multipath_context; ++ ++ pr_info(" xprt:%p ctx:%p main:%d queue_len:%lu.\n", xprt, ++ xprt->multipath_context, ++ ctx ? 
ctx->main : false, ++ atomic_long_read(&xprt->queuelen)); ++ return 0; ++} ++ ++static int debug_show_clnt(struct rpc_clnt *clnt, void *data) ++{ ++ pr_info(" clnt %d addr:%p enfs:%d\n", ++ clnt->cl_clid, clnt, ++ clnt->cl_enfs); ++ rpc_clnt_iterate_for_each_xprt(clnt, debug_show_xprt, NULL); ++ return 0; ++} ++ ++static void debug_print_all_xprt(void) ++{ ++ enfs_iterate_each_rpc_clnt(debug_show_clnt, NULL); ++} ++ ++static ++void enfs_proc_format_xprt_addr_display(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ char *local_name_buf, ++ int local_name_buf_len, ++ char *remote_name_buf, ++ int remote_name_buf_len) ++{ ++ int err; ++ struct sockaddr_storage srcaddr; ++ struct enfs_xprt_context *ctx; ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ sockaddr_ip_to_str((struct sockaddr *)&xprt->addr, ++ remote_name_buf, remote_name_buf_len); ++ ++ // get local address depend one main or not ++ if (enfs_is_main_xprt(xprt)) { ++ err = rpc_localaddr(clnt, (struct sockaddr *)&srcaddr, ++ sizeof(srcaddr)); ++ if (err != 0) ++ (void)snprintf(local_name_buf, ++ local_name_buf_len, "Unknown"); ++ else ++ sockaddr_ip_to_str((struct sockaddr *)&srcaddr, ++ local_name_buf, ++ local_name_buf_len); ++ } else { ++ sockaddr_ip_to_str((struct sockaddr *)&ctx->srcaddr, ++ local_name_buf, ++ local_name_buf_len); ++ } ++} ++ ++static int enfs_show_xprt_stats(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ unsigned int op; ++ unsigned int maxproc = clnt->cl_maxproc; ++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; ++ struct enfs_xprt_context *ctx; ++ char local_name[INET6_ADDRSTRLEN]; ++ char remote_name[INET6_ADDRSTRLEN]; ++ ++ if (!xprt->multipath_context) ++ return 0; ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ enfs_proc_format_xprt_addr_display(clnt, xprt, local_name, ++ sizeof(local_name), ++ remote_name, sizeof(remote_name)); ++ ++ seq_printf(iter->seq, "%-6u%-*s%-*s", iter->id, ++ 
iter->max_addrs_length + 4, ++ local_name, ++ iter->max_addrs_length + 4, ++ remote_name); ++ ++ iter->id++; ++ ++ for (op = 0; op < maxproc; op++) { ++ if (!should_print(clnt->cl_procinfo[op].p_name)) ++ continue; ++ ++ seq_printf(iter->seq, "%-22lu%-22Lu%-22Lu", ++ ctx->stats[op].om_ops, ++ ctx->stats[op].om_ops == 0 ? 0 : ++ ktime_to_ms(ctx->stats[op].om_rtt) / ++ ctx->stats[op].om_ops, ++ ctx->stats[op].om_ops == 0 ? 0 : ++ ktime_to_ms(ctx->stats[op].om_execute) / ++ ctx->stats[op].om_ops); ++ } ++ seq_puts(iter->seq, "\n"); ++ return 0; ++} ++ ++static int rpc_proc_show_path_status(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; ++ struct enfs_xprt_context *ctx = NULL; ++ char local_name[INET6_ADDRSTRLEN] = {0}; ++ char remote_name[INET6_ADDRSTRLEN] = {0}; ++ char multiapth_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; ++ char xprt_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; ++ ++ if (!xprt->multipath_context) { ++ enfs_log_debug("multipath_context is null.\n"); ++ return 0; ++ } ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ enfs_proc_format_xprt_addr_display(clnt, xprt, ++ local_name, ++ sizeof(local_name), ++ remote_name, sizeof(remote_name)); ++ ++ pm_get_path_state_desc(xprt, ++ multiapth_status, ++ ENFS_PROC_PATH_STATUS_LEN); ++ ++ pm_get_xprt_state_desc(xprt, ++ xprt_status, ++ ENFS_PROC_PATH_STATUS_LEN); ++ ++ seq_printf(iter->seq, "%-6u%-*s%-*s%-12s%-12s\n", ++ iter->id, iter->max_addrs_length + 4, ++ local_name, iter->max_addrs_length + 4, ++ remote_name, multiapth_status, ++ xprt_status); ++ iter->id++; ++ return 0; ++} ++ ++static int enfs_get_max_addrs_length(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; ++ char local_name[INET6_ADDRSTRLEN]; ++ char remote_name[INET6_ADDRSTRLEN]; ++ ++ enfs_proc_format_xprt_addr_display(clnt, xprt, ++ local_name, sizeof(local_name), 
++ remote_name, sizeof(remote_name)); ++ ++ if (iter->max_addrs_length < strlen(local_name)) ++ iter->max_addrs_length = strlen(local_name); ++ ++ if (iter->max_addrs_length < strlen(remote_name)) ++ iter->max_addrs_length = strlen(remote_name); ++ ++ return 0; ++} ++ ++static int rpc_proc_clnt_showpath(struct seq_file *seq, void *v) ++{ ++ struct rpc_clnt *clnt = seq->private; ++ struct enfs_xprt_iter iter; ++ ++ iter.seq = seq; ++ iter.id = 0; ++ iter.max_addrs_length = 0; ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ enfs_get_max_addrs_length, ++ (void *)&iter); ++ ++ seq_printf(seq, "%-6s%-*s%-*s%-12s%-12s\n", "id", ++ iter.max_addrs_length + 4, ++ "local_addr", ++ iter.max_addrs_length + 4, ++ "remote_addr", ++ "path_state", ++ "xprt_state"); ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ rpc_proc_show_path_status, ++ (void *)&iter); ++ return 0; ++} ++ ++static int enfs_rpc_proc_show(struct seq_file *seq, void *v) ++{ ++ struct rpc_clnt *clnt = seq->private; ++ struct enfs_xprt_iter iter; ++ ++ iter.seq = seq; ++ iter.id = 0; ++ iter.max_addrs_length = 0; ++ ++ debug_print_all_xprt(); ++ pr_info("enfs proc clnt:%p\n", clnt); ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ enfs_get_max_addrs_length, ++ (void *)&iter); ++ ++ seq_printf(seq, "%-6s%-*s%-*s%-22s%-22s%-22s%-22s%-22s%-22s\n", "id", ++ iter.max_addrs_length + 4, "local_addr", ++ iter.max_addrs_length + 4, ++ "remote_addr", "r_count", ++ "r_rtt", "r_exec", "w_count", "w_rtt", "w_exec"); ++ ++ // rpc_clnt_show_stats(seq, clnt); ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ enfs_show_xprt_stats, ++ (void *)&iter); ++ return 0; ++} ++ ++static int rpc_proc_open(struct inode *inode, struct file *file) ++{ ++ struct rpc_clnt *clnt = PDE_DATA(inode); ++ ++ pr_info("%s %p\n", __func__, clnt); ++ return single_open(file, enfs_rpc_proc_show, clnt); ++} ++ ++static int enfs_reset_xprt_stats(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ unsigned int op; ++ struct enfs_xprt_context *ctx; ++ 
unsigned int maxproc = clnt->cl_maxproc; ++ struct rpc_iostats stats = {0}; ++ ++ if (!xprt->multipath_context) ++ return 0; ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ for (op = 0; op < maxproc; op++) { ++ spin_lock(&ctx->stats[op].om_lock); ++ ctx->stats[op] = stats; ++ spin_unlock(&ctx->stats[op].om_lock); ++ } ++ return 0; ++} ++ ++static void trim_newline_ch(char *str, int len) ++{ ++ int i; ++ ++ for (i = 0; str[i] != '\0' && i < len; i++) { ++ if (str[i] == '\n') ++ str[i] = '\0'; ++ } ++} ++ ++static ssize_t enfs_proc_write(struct file *file, ++ const char __user *user_buf, ++ size_t len, ++ loff_t *offset) ++{ ++ char buffer[128]; ++ struct rpc_clnt *clnt = ++ ((struct seq_file *)file->private_data)->private; ++ ++ if (len >= sizeof(buffer)) ++ return -E2BIG; ++ ++ if (copy_from_user(buffer, user_buf, len) != 0) ++ return -EFAULT; ++ ++ buffer[len] = '\0'; ++ trim_newline_ch(buffer, len); ++ if (strcmp(buffer, "reset") != 0) ++ return -EINVAL; ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, enfs_reset_xprt_stats, NULL); ++ return len; ++} ++ ++static int rpc_proc_show_path(struct inode *inode, struct file *file) ++{ ++ struct rpc_clnt *clnt = PDE_DATA(inode); ++ ++ return single_open(file, rpc_proc_clnt_showpath, clnt); ++} ++ ++static const struct file_operations rpc_proc_fops = { ++ .owner = THIS_MODULE, ++ .open = rpc_proc_open, ++ .read = seq_read, ++ .llseek = seq_lseek, ++ .release = single_release, ++ .write = enfs_proc_write, ++}; ++ ++static const struct file_operations rpc_show_path_fops = { ++ .owner = THIS_MODULE, ++ .open = rpc_proc_show_path, ++ .read = seq_read, ++ .llseek = seq_lseek, ++ .release = single_release, ++}; ++ ++static int clnt_proc_name(struct rpc_clnt *clnt, char *buf, int len) ++{ ++ int ret; ++ ++ ret = snprintf(buf, len, "%s_%u", ++ rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR), ++ clnt->cl_clid); ++ if (ret > len) ++ return -E2BIG; ++ return 0; ++} ++ ++static int enfs_proc_create_file(struct rpc_clnt *clnt) 
++{ ++ int err; ++ char buf[128]; ++ ++ struct proc_dir_entry *clnt_entry; ++ struct proc_dir_entry *stat_entry; ++ ++ err = clnt_proc_name(clnt, buf, sizeof(buf)); ++ if (err) ++ return err; ++ ++ clnt_entry = proc_mkdir(buf, enfs_proc_parent); ++ if (clnt_entry == NULL) ++ return -EINVAL; ++ ++ stat_entry = proc_create_data("stat", ++ 0, clnt_entry, ++ &rpc_proc_fops, clnt); ++ ++ if (stat_entry == NULL) ++ return -EINVAL; ++ ++ stat_entry = proc_create_data("path", ++ 0, clnt_entry, ++ &rpc_show_path_fops, clnt); ++ ++ if (stat_entry == NULL) ++ return -EINVAL; ++ ++ return 0; ++} ++ ++void enfs_count_iostat(struct rpc_task *task) ++{ ++ struct enfs_xprt_context *ctx = task->tk_xprt->multipath_context; ++ ++ if (!ctx || !ctx->stats) ++ return; ++ rpc_count_iostats(task, ctx->stats); ++} ++ ++static void enfs_proc_delete_file(struct rpc_clnt *clnt) ++{ ++ int err; ++ char buf[128]; ++ ++ err = clnt_proc_name(clnt, buf, sizeof(buf)); ++ if (err) { ++ pr_err("gen clnt name failed.\n"); ++ return; ++ } ++ remove_proc_subtree(buf, enfs_proc_parent); ++} ++ ++// create proc file "/porc/enfs/[mount_ip]_[id]/stat" ++int enfs_proc_create_clnt(struct rpc_clnt *clnt) ++{ ++ int err; ++ ++ err = enfs_proc_create_file(clnt); ++ if (err) { ++ pr_err("create client %d\n", err); ++ return err; ++ } ++ ++ return 0; ++} ++ ++void enfs_proc_delete_clnt(struct rpc_clnt *clnt) ++{ ++ if (clnt->cl_enfs) ++ enfs_proc_delete_file(clnt); ++} ++ ++static int enfs_proc_create_parent(void) ++{ ++ enfs_proc_parent = proc_mkdir(ENFS_PROC_DIR, NULL); ++ ++ if (enfs_proc_parent == NULL) { ++ pr_err("Enfs create proc dir err\n"); ++ return -ENOMEM; ++ } ++ return 0; ++} ++ ++static void enfs_proc_delete_parent(void) ++{ ++ remove_proc_entry(ENFS_PROC_DIR, NULL); ++} ++ ++static int enfs_proc_init_create_clnt(struct rpc_clnt *clnt, void *data) ++{ ++ if (clnt->cl_enfs) ++ enfs_proc_create_file(clnt); ++ return 0; ++} ++ ++static int enfs_proc_destroy_clnt(struct rpc_clnt *clnt, void *data) ++{ 
++ if (clnt->cl_enfs) ++ enfs_proc_delete_file(clnt); ++ return 0; ++} ++ ++int enfs_proc_init(void) ++{ ++ int err; ++ ++ err = enfs_proc_create_parent(); ++ if (err) ++ return err; ++ ++ enfs_iterate_each_rpc_clnt(enfs_proc_init_create_clnt, NULL); ++ return 0; ++} ++ ++void enfs_proc_exit(void) ++{ ++ enfs_iterate_each_rpc_clnt(enfs_proc_destroy_clnt, NULL); ++ enfs_proc_delete_parent(); ++} +diff --git a/fs/nfs/enfs/enfs_proc.h b/fs/nfs/enfs/enfs_proc.h +new file mode 100644 +index 000000000000..321951031c2e +--- /dev/null ++++ b/fs/nfs/enfs/enfs_proc.h +@@ -0,0 +1,21 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS PROC. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#ifndef ENFS_PROC_H ++#define ENFS_PROC_H ++ ++struct rpc_clnt; ++struct rpc_task; ++struct proc_dir_entry; ++ ++int enfs_proc_init(void); ++void enfs_proc_exit(void); ++struct proc_dir_entry *enfs_get_proc_parent(void); ++int enfs_proc_create_clnt(struct rpc_clnt *clnt); ++void enfs_proc_delete_clnt(struct rpc_clnt *clnt); ++void enfs_count_iostat(struct rpc_task *task); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_remount.c b/fs/nfs/enfs/enfs_remount.c +new file mode 100644 +index 000000000000..2c3fe125c735 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_remount.c +@@ -0,0 +1,221 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: remount ip source file ++ * Author: y00583252 ++ * Create: 2023-08-12 ++ */ ++#include "enfs_remount.h" ++ ++#include <linux/string.h> ++#include <linux/in.h> ++#include <linux/in6.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/spinlock.h> ++#include <linux/sunrpc/addr.h> ++#include <linux/sunrpc/metrics.h> ++#include <linux/sunrpc/xprtmultipath.h> ++#include <linux/sunrpc/xprtsock.h> ++#include <linux/sunrpc/xprt.h> ++#include <linux/smp.h> ++#include <linux/delay.h> ++ ++#include "enfs.h" ++#include "enfs_log.h" ++#include "enfs_multipath.h" ++#include "enfs_multipath_parse.h" ++#include "enfs_path.h" ++#include "enfs_proc.h" ++#include "enfs_multipath_client.h" ++ ++static bool enfs_rpc_xprt_switch_need_delete_addr( ++ struct multipath_mount_options *enfs_option, ++ struct sockaddr *dstaddr, struct sockaddr *srcaddr) ++{ ++ int i; ++ bool find_same_ip = false; ++ int32_t local_total; ++ int32_t remote_total; ++ ++ local_total = enfs_option->local_ip_list->count; ++ remote_total = enfs_option->remote_ip_list->count; ++ if (local_total == 0 || remote_total == 0) { ++ pr_err("no ip list is present.\n"); ++ return false; ++ } ++ ++ for (i = 0; i < local_total; i++) { ++ find_same_ip = ++ rpc_cmp_addr((struct sockaddr *) ++ &enfs_option->local_ip_list->address[i], ++ srcaddr); ++ if (find_same_ip) ++ break; ++ } ++ ++ if (find_same_ip == false) ++ return true; ++ ++ find_same_ip = false; ++ for (i = 0; i < remote_total; i++) { ++ find_same_ip = ++ rpc_cmp_addr((struct sockaddr *) ++ &enfs_option->remote_ip_list->address[i], ++ dstaddr); ++ if (find_same_ip) ++ break; ++ } ++ ++ if (find_same_ip == false) ++ return true; ++ ++ return false; ++} ++ ++// Used in rcu_lock ++static bool enfs_delete_xprt_from_switch(struct rpc_xprt *xprt, ++ void *enfs_option, ++ struct rpc_xprt_switch *xps) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ struct multipath_mount_options *mopt = ++ (struct multipath_mount_options *)enfs_option; ++ ++ if 
(enfs_is_main_xprt(xprt)) ++ return true; ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ if (enfs_rpc_xprt_switch_need_delete_addr(mopt, ++ (struct sockaddr *)&xprt->addr, ++ (struct sockaddr *)&ctx->srcaddr)) { ++ ++ print_enfs_multipath_addr((struct sockaddr *)&ctx->srcaddr, ++ (struct sockaddr *)&xprt->addr); ++ rpc_xprt_switch_remove_xprt(xps, xprt); ++ return true; ++ } ++ ++ return false; ++} ++ ++void enfs_clnt_delete_obsolete_xprts(struct nfs_client *nfs_client, ++ void *enfs_option) ++{ ++ int xprt_count = 0; ++ struct rpc_xprt *pos = NULL; ++ struct rpc_xprt_switch *xps = NULL; ++ ++ rcu_read_lock(); ++ xps = xprt_switch_get( ++ rcu_dereference( ++ nfs_client->cl_rpcclient->cl_xpi.xpi_xpswitch)); ++ if (xps == NULL) { ++ rcu_read_unlock(); ++ xprt_switch_put(xps); ++ return; ++ } ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (xprt_count < MAX_XPRT_NUM_PER_CLIENT) { ++ if (enfs_delete_xprt_from_switch( ++ pos, enfs_option, xps) == false) ++ xprt_count++; ++ } else ++ rpc_xprt_switch_remove_xprt(xps, pos); ++ } ++ rcu_read_unlock(); ++ xprt_switch_put(xps); ++} ++ ++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option) ++{ ++ int errno = 0; ++ char servername[48]; ++ struct multipath_mount_options *remount_lists = ++ (struct multipath_mount_options *)enfs_option; ++ struct multipath_client_info *client_info = ++ (struct multipath_client_info *)nfs_client->cl_multipath_data; ++ struct xprt_create xprtargs; ++ struct rpc_create_args args = { ++ .protocol = nfs_client->cl_proto, ++ .net = nfs_client->cl_net, ++ .addrsize = nfs_client->cl_addrlen, ++ .servername = nfs_client->cl_hostname, ++ }; ++ ++ memset(&xprtargs, 0, sizeof(struct xprt_create)); ++ ++ //mount is not use multipath ++ if (client_info == NULL || enfs_option == NULL) { ++ enfs_log_error( ++ "mount information or remount information is empty.\n"); ++ return -EINVAL; ++ } ++ ++ //remount : localaddrs and remoteaddrs are empty ++ 
if (remount_lists->local_ip_list->count == 0 && ++ remount_lists->remote_ip_list->count == 0) { ++ enfs_log_info("remount local_ip_list and remote_ip_list are NULL\n"); ++ return 0; ++ } ++ ++ errno = enfs_config_xprt_create_args(&xprtargs, ++ &args, servername, sizeof(servername)); ++ ++ if (errno) { ++ enfs_log_error("config_xprt_create failed! errno:%d\n", errno); ++ return errno; ++ } ++ ++ if (remount_lists->local_ip_list->count == 0) { ++ if (client_info->local_ip_list->count == 0) { ++ errno = rpc_localaddr(nfs_client->cl_rpcclient, ++ (struct sockaddr *) ++ &remount_lists->local_ip_list->address[0], ++ sizeof(struct sockaddr_storage)); ++ if (errno) { ++ enfs_log_error("get clnt srcaddr errno:%d\n", ++ errno); ++ return errno; ++ } ++ remount_lists->local_ip_list->count = 1; ++ } else ++ memcpy(remount_lists->local_ip_list, ++ client_info->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (remount_lists->remote_ip_list->count == 0) { ++ if (client_info->remote_ip_list->count == 0) { ++ errno = rpc_peeraddr(nfs_client->cl_rpcclient, ++ (struct sockaddr *) ++ &remount_lists->remote_ip_list->address[0], ++ sizeof(struct sockaddr_storage)); ++ if (errno == 0) { ++ enfs_log_error("get clnt dstaddr errno:%d\n", ++ errno); ++ return errno; ++ } ++ remount_lists->remote_ip_list->count = 1; ++ } else ++ memcpy(remount_lists->remote_ip_list, ++ client_info->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ enfs_log_info("Remount creating new links...\n"); ++ enfs_xprt_ippair_create(&xprtargs, ++ nfs_client->cl_rpcclient, ++ remount_lists); ++ ++ enfs_log_info("Remount deleting obsolete links...\n"); ++ enfs_clnt_delete_obsolete_xprts(nfs_client, remount_lists); ++ ++ memcpy(client_info->local_ip_list, ++ remount_lists->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ memcpy(client_info->remote_ip_list, ++ remount_lists->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ ++ return 0; ++} +diff --git a/fs/nfs/enfs/enfs_remount.h 
b/fs/nfs/enfs/enfs_remount.h +new file mode 100644 +index 000000000000..a663ed257004 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_remount.h +@@ -0,0 +1,15 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: remount ip header file ++ * Author: y00583252 ++ * Create: 2023-08-12 ++ */ ++#ifndef _ENFS_REMOUNT_ ++#define _ENFS_REMOUNT_ ++#include <linux/string.h> ++#include "enfs.h" ++ ++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_roundrobin.c b/fs/nfs/enfs/enfs_roundrobin.c +new file mode 100644 +index 000000000000..4e4eda784a3e +--- /dev/null ++++ b/fs/nfs/enfs/enfs_roundrobin.c +@@ -0,0 +1,255 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++#include <linux/spinlock.h> ++#include <linux/module.h> ++#include <linux/printk.h> ++#include <linux/kref.h> ++#include <linux/rculist.h> ++#include <linux/types.h> ++#include <linux/sunrpc/xprt.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/xprtmultipath.h> ++#include "enfs_roundrobin.h" ++ ++#include "enfs.h" ++#include "enfs_config.h" ++#include "pm_state.h" ++ ++typedef struct rpc_xprt *(*enfs_xprt_switch_find_xprt_t)( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur); ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin; ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular; ++ ++static bool enfs_xprt_is_active(struct rpc_xprt *xprt) ++{ ++ enum pm_path_state state; ++ ++ if (kref_read(&xprt->kref) <= 0) ++ return false; ++ ++ state = pm_get_path_state(xprt); ++ if (state == PM_STATE_NORMAL) ++ return true; ++ ++ return false; ++} ++ ++static struct rpc_xprt *enfs_lb_set_cursor_xprt( ++ struct rpc_xprt_switch *xps, struct rpc_xprt **cursor, ++ enfs_xprt_switch_find_xprt_t find_next) ++{ ++ struct rpc_xprt *pos; ++ struct rpc_xprt 
*old; ++ ++ old = smp_load_acquire(cursor); /* read latest cursor */ ++ pos = find_next(xps, old); ++ smp_store_release(cursor, pos); /* let cursor point to pos */ ++ return pos; ++} ++ ++static ++struct rpc_xprt *enfs_lb_find_next_entry_roundrobin( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *pos; ++ struct rpc_xprt *prev = NULL; ++ bool found = false; ++ struct rpc_xprt *min_queuelen_xprt = NULL; ++ unsigned long pos_xprt_queuelen; ++ unsigned long min_xprt_queuelen = 0; ++ ++ unsigned long xps_queuelen = atomic_long_read(&xps->xps_queuelen); ++ // delete origin xprt ++ unsigned int multipath_nactive = READ_ONCE(xps->xps_nactive) - 1; ++ ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (enfs_is_main_xprt(pos) || !enfs_xprt_is_active(pos)) { ++ prev = pos; ++ continue; ++ } ++ ++ pos_xprt_queuelen = atomic_long_read(&pos->queuelen); ++ if (min_queuelen_xprt == NULL || ++ pos_xprt_queuelen < min_xprt_queuelen) { ++ ++ min_queuelen_xprt = pos; ++ min_xprt_queuelen = pos_xprt_queuelen; ++ } ++ ++ if (cur == prev) ++ found = true; ++ ++ if (found && pos_xprt_queuelen * ++ multipath_nactive <= xps_queuelen) ++ return pos; ++ prev = pos; ++ }; ++ ++ return min_queuelen_xprt; ++} ++ ++struct rpc_xprt *enfs_lb_switch_find_first_active_xprt( ++ struct rpc_xprt_switch *xps) ++{ ++ struct rpc_xprt *pos; ++ ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (enfs_xprt_is_active(pos)) ++ return pos; ++ }; ++ return NULL; ++} ++ ++struct rpc_xprt *enfs_lb_switch_get_main_xprt(struct rpc_xprt_switch *xps) ++{ ++ return list_first_or_null_rcu(&xps->xps_xprt_list, ++ struct rpc_xprt, xprt_switch); ++} ++ ++static struct rpc_xprt *enfs_lb_switch_get_next_xprt_roundrobin( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *xprt; ++ ++ // disable multipath ++ if (enfs_get_config_multipath_state()) ++ return enfs_lb_switch_get_main_xprt(xps); ++ ++ xprt = 
enfs_lb_find_next_entry_roundrobin(xps, cur); ++ if (xprt != NULL) ++ return xprt; ++ ++ return enfs_lb_switch_get_main_xprt(xps); ++} ++ ++static ++struct rpc_xprt *enfs_lb_iter_next_entry_roundrobin(struct rpc_xprt_iter *xpi) ++{ ++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); ++ ++ if (xps == NULL) ++ return NULL; ++ ++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, ++ enfs_lb_switch_get_next_xprt_roundrobin); ++} ++ ++static ++struct rpc_xprt *enfs_lb_switch_find_singular_entry( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *pos; ++ bool found = false; ++ ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (cur == pos) ++ found = true; ++ ++ if (found && enfs_xprt_is_active(pos)) ++ return pos; ++ } ++ return NULL; ++} ++ ++struct rpc_xprt *enfs_lb_get_singular_xprt( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *xprt; ++ ++ if (xps == NULL) ++ return NULL; ++ ++ // disable multipath ++ if (enfs_get_config_multipath_state()) ++ return enfs_lb_switch_get_main_xprt(xps); ++ ++ if (cur == NULL || xps->xps_nxprts < 2) ++ return enfs_lb_switch_find_first_active_xprt(xps); ++ ++ xprt = enfs_lb_switch_find_singular_entry(xps, cur); ++ if (!xprt) ++ return enfs_lb_switch_get_main_xprt(xps); ++ ++ return xprt; ++} ++ ++static ++struct rpc_xprt *enfs_lb_iter_next_entry_sigular(struct rpc_xprt_iter *xpi) ++{ ++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); ++ ++ if (xps == NULL) ++ return NULL; ++ ++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, ++ enfs_lb_get_singular_xprt); ++} ++ ++static void enfs_lb_iter_default_rewind(struct rpc_xprt_iter *xpi) ++{ ++ WRITE_ONCE(xpi->xpi_cursor, NULL); ++} ++ ++static void enfs_lb_switch_set_roundrobin(struct rpc_clnt *clnt) ++{ ++ struct rpc_xprt_switch *xps; ++ ++ rcu_read_lock(); ++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); ++ rcu_read_unlock(); ++ if (clnt->cl_vers == 3) 
{ ++ ++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_roundrobin) ++ WRITE_ONCE(xps->xps_iter_ops, ++ &enfs_xprt_iter_roundrobin); ++ ++ return; ++ } ++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_singular) ++ WRITE_ONCE(xps->xps_iter_ops, &enfs_xprt_iter_singular); ++} ++ ++static ++struct rpc_xprt *enfs_lb_switch_find_current(struct list_head *head, ++ const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *pos; ++ ++ list_for_each_entry_rcu(pos, head, xprt_switch) { ++ if (cur == pos) ++ return pos; ++ } ++ return NULL; ++} ++ ++static struct rpc_xprt *enfs_lb_iter_current_entry(struct rpc_xprt_iter *xpi) ++{ ++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); ++ struct list_head *head; ++ ++ if (xps == NULL) ++ return NULL; ++ head = &xps->xps_xprt_list; ++ if (xpi->xpi_cursor == NULL || xps->xps_nxprts < 2) ++ return enfs_lb_switch_get_main_xprt(xps); ++ return enfs_lb_switch_find_current(head, xpi->xpi_cursor); ++} ++ ++void enfs_lb_set_policy(struct rpc_clnt *clnt) ++{ ++ enfs_lb_switch_set_roundrobin(clnt); ++} ++ ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin = { ++ .xpi_rewind = enfs_lb_iter_default_rewind, ++ .xpi_xprt = enfs_lb_iter_current_entry, ++ .xpi_next = enfs_lb_iter_next_entry_roundrobin, ++}; ++ ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular = { ++ .xpi_rewind = enfs_lb_iter_default_rewind, ++ .xpi_xprt = enfs_lb_iter_current_entry, ++ .xpi_next = enfs_lb_iter_next_entry_sigular, ++}; +diff --git a/fs/nfs/enfs/enfs_roundrobin.h b/fs/nfs/enfs/enfs_roundrobin.h +new file mode 100644 +index 000000000000..b72b088a6258 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_roundrobin.h +@@ -0,0 +1,9 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ */ ++#ifndef ENFS_ROUNDROBIN_H ++#define ENFS_ROUNDROBIN_H ++ ++void enfs_lb_set_policy(struct rpc_clnt *clnt); ++#endif diff --git a/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch new file mode 100644 index 0000000..cc6b677 --- /dev/null +++ b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch @@ -0,0 +1,1607 @@ +diff --git a/fs/nfs/enfs/enfs_config.c b/fs/nfs/enfs/enfs_config.c +new file mode 100644 +index 000000000000..11aa7a00385b +--- /dev/null ++++ b/fs/nfs/enfs/enfs_config.c +@@ -0,0 +1,378 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++#include <linux/cdev.h> ++#include <linux/errno.h> ++#include <linux/fcntl.h> ++#include <linux/fs.h> ++#include <linux/kernel.h> ++#include <linux/kthread.h> ++#include <linux/slab.h> ++#include <linux/string.h> ++#include <linux/uaccess.h> ++#include <linux/delay.h> ++ ++#include "enfs_errcode.h" ++#include "enfs_log.h" ++#include "enfs_config.h" ++ ++#define MAX_FILE_SIZE 8192 ++#define STRING_BUF_SIZE 128 ++#define CONFIG_FILE_PATH "/etc/enfs/config.ini" ++#define ENFS_NOTIFY_FILE_PERIOD 1000UL ++ ++#define MAX_PATH_DETECT_INTERVAL 300 ++#define MIN_PATH_DETECT_INTERVAL 5 ++#define MAX_PATH_DETECT_TIMEOUT 60 ++#define MIN_PATH_DETECT_TIMEOUT 1 ++#define MAX_MULTIPATH_TIMEOUT 60 ++#define MIN_MULTIPATH_TIMEOUT 0 ++#define MAX_MULTIPATH_STATE ENFS_MULTIPATH_DISABLE ++#define MIN_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE ++ ++#define DEFAULT_PATH_DETECT_INTERVAL 10 ++#define DEFAULT_PATH_DETECT_TIMEOUT 5 ++#define DEFAULT_MULTIPATH_TIMEOUT 0 ++#define DEFAULT_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE ++#define DEFAULT_LOADBALANCE_MODE ENFS_LOADBALANCE_RR ++ ++typedef int (*check_and_assign_func)(char *, char *, int, int); ++ ++struct enfs_config_info { ++ int32_t path_detect_interval; ++ int32_t path_detect_timeout; ++ int32_t multipath_timeout; ++ int32_t 
loadbalance_mode; ++ int32_t multipath_state; ++}; ++ ++struct check_and_assign_value { ++ char *field_name; ++ check_and_assign_func func; ++ int min_value; ++ int max_value; ++}; ++ ++static struct enfs_config_info g_enfs_config_info; ++static struct timespec64 modify_time; ++static struct task_struct *thread; ++ ++static int enfs_check_config_value(char *value, int min_value, int max_value) ++{ ++ long num_value; ++ int ret; ++ ++ ret = kstrtol(value, 10, &num_value); ++ if (ret != 0) { ++ enfs_log_error("Failed to convert string to int\n"); ++ return -EINVAL; ++ } ++ ++ if (num_value < min_value || num_value > max_value) ++ return -EINVAL; ++ ++ return num_value; ++} ++ ++static int32_t enfs_check_and_assign_int_value(char *field_name, char *value, ++ int min_value, int max_value) ++{ ++ int int_value = enfs_check_config_value(value, min_value, max_value); ++ ++ if (int_value < 0) ++ return -EINVAL; ++ ++ if (strcmp(field_name, "path_detect_interval") == 0) { ++ g_enfs_config_info.path_detect_interval = int_value; ++ return ENFS_RET_OK; ++ } ++ if (strcmp(field_name, "path_detect_timeout") == 0) { ++ g_enfs_config_info.path_detect_timeout = int_value; ++ return ENFS_RET_OK; ++ } ++ if (strcmp(field_name, "multipath_timeout") == 0) { ++ g_enfs_config_info.multipath_timeout = int_value; ++ return ENFS_RET_OK; ++ } ++ if (strcmp(field_name, "multipath_disable") == 0) { ++ g_enfs_config_info.multipath_state = int_value; ++ return ENFS_RET_OK; ++ } ++ return -EINVAL; ++} ++ ++static int32_t enfs_check_and_assign_loadbalance_mode(char *field_name, ++ char *value, ++ int min_value, ++ int max_value) ++{ ++ if (value == NULL) ++ return -EINVAL; ++ ++ if (strcmp(field_name, "multipath_select_policy") == 0) { ++ if (strcmp(value, "roundrobin") == 0) { ++ g_enfs_config_info.loadbalance_mode ++ = ENFS_LOADBALANCE_RR; ++ return ENFS_RET_OK; ++ } ++ } ++ return -EINVAL; ++} ++ ++static const struct check_and_assign_value g_check_and_assign_value[] = { ++ 
{"path_detect_interval", enfs_check_and_assign_int_value, ++ MIN_PATH_DETECT_INTERVAL, MAX_PATH_DETECT_INTERVAL}, ++ {"path_detect_timeout", enfs_check_and_assign_int_value, ++ MIN_PATH_DETECT_TIMEOUT, MAX_PATH_DETECT_TIMEOUT}, ++ {"multipath_timeout", enfs_check_and_assign_int_value, ++ MIN_MULTIPATH_TIMEOUT, MAX_MULTIPATH_TIMEOUT}, ++ {"multipath_disable", enfs_check_and_assign_int_value, ++ MIN_MULTIPATH_STATE, MAX_MULTIPATH_STATE}, ++ {"multipath_select_policy", enfs_check_and_assign_loadbalance_mode, ++ 0, 0}, ++}; ++ ++static int32_t enfs_read_config_file(char *buffer, char *file_path) ++{ ++ int ret; ++ struct file *filp = NULL; ++ loff_t f_pos = 0; ++ mm_segment_t fs; ++ ++ ++ filp = filp_open(file_path, O_RDONLY, 0); ++ ++ if (IS_ERR(filp)) { ++ enfs_log_error("Failed to open file %s\n", CONFIG_FILE_PATH); ++ ret = -ENOENT; ++ return ret; ++ } ++ ++ fs = get_fs(); ++ set_fs(get_ds()); ++ kernel_read(filp, buffer, MAX_FILE_SIZE, &f_pos); ++ set_fs(fs); ++ ++ ret = filp_close(filp, NULL); ++ if (ret) { ++ enfs_log_error("Close File:%s failed:%d.\n", ++ CONFIG_FILE_PATH, ret); ++ return -EINVAL; ++ } ++ return ENFS_RET_OK; ++} ++ ++static int32_t enfs_deal_with_comment_line(char *buffer) ++{ ++ int ret; ++ char *pos = strchr(buffer, '\n'); ++ ++ if (pos != NULL) ++ ret = strlen(buffer) - strlen(pos); ++ else ++ ret = strlen(buffer); ++ ++ return ret; ++} ++ ++static int32_t enfs_parse_key_value_from_config(char *buffer, char *key, ++ char *value, int keyLen, ++ int valueLen) ++{ ++ char *line; ++ char *tokenPtr; ++ int len; ++ char *tem; ++ char *pos = strchr(buffer, '\n'); ++ ++ if (pos != NULL) ++ len = strlen(buffer) - strlen(pos); ++ else ++ len = strlen(buffer); ++ ++ line = kmalloc(len + 1, GFP_KERNEL); ++ if (!line) { ++ enfs_log_error("Failed to allocate memory.\n"); ++ return -ENOMEM; ++ } ++ line[len] = '\0'; ++ strncpy(line, buffer, len); ++ ++ tem = line; ++ tokenPtr = strsep(&tem, "="); ++ if (tokenPtr == NULL || tem == NULL) { ++ kfree(line); ++ 
return len; ++ } ++ strncpy(key, strim(tokenPtr), keyLen); ++ strncpy(value, strim(tem), valueLen); ++ ++ kfree(line); ++ return len; ++} ++ ++static int32_t enfs_get_value_from_config_file(char *buffer, char *field_name, ++ char *value, int valueLen) ++{ ++ int ret; ++ char key[STRING_BUF_SIZE + 1] = {0}; ++ char val[STRING_BUF_SIZE + 1] = {0}; ++ ++ while (buffer[0] != '\0') { ++ if (buffer[0] == '\n') { ++ buffer++; ++ } else if (buffer[0] == '#') { ++ ret = enfs_deal_with_comment_line(buffer); ++ if (ret > 0) ++ buffer += ret; ++ } else { ++ ret = enfs_parse_key_value_from_config(buffer, key, val, ++ STRING_BUF_SIZE, ++ STRING_BUF_SIZE); ++ if (ret < 0) { ++ enfs_log_error("failed parse key value, %d\n" ++ , ret); ++ return ret; ++ } ++ key[STRING_BUF_SIZE] = '\0'; ++ val[STRING_BUF_SIZE] = '\0'; ++ ++ buffer += ret; ++ ++ if (strcmp(field_name, key) == 0) { ++ strncpy(value, val, valueLen); ++ return ENFS_RET_OK; ++ } ++ } ++ } ++ enfs_log_error("can not find value which matched field_name: %s.\n", ++ field_name); ++ return -EINVAL; ++} ++ ++int32_t enfs_config_load(void) ++{ ++ char value[STRING_BUF_SIZE + 1]; ++ int ret; ++ int table_len; ++ int min; ++ int max; ++ int i; ++ char *buffer; ++ ++ buffer = kmalloc(MAX_FILE_SIZE, GFP_KERNEL); ++ if (!buffer) { ++ enfs_log_error("Failed to allocate memory.\n"); ++ return -ENOMEM; ++ } ++ memset(buffer, 0, MAX_FILE_SIZE); ++ ++ g_enfs_config_info.path_detect_interval = DEFAULT_PATH_DETECT_INTERVAL; ++ g_enfs_config_info.path_detect_timeout = DEFAULT_PATH_DETECT_TIMEOUT; ++ g_enfs_config_info.multipath_timeout = DEFAULT_MULTIPATH_TIMEOUT; ++ g_enfs_config_info.multipath_state = DEFAULT_MULTIPATH_STATE; ++ g_enfs_config_info.loadbalance_mode = DEFAULT_LOADBALANCE_MODE; ++ ++ table_len = sizeof(g_check_and_assign_value) / ++ sizeof(g_check_and_assign_value[0]); ++ ++ ret = enfs_read_config_file(buffer, CONFIG_FILE_PATH); ++ if (ret != 0) { ++ kfree(buffer); ++ return ret; ++ } ++ ++ for (i = 0; i < table_len; i++) { 
++ ret = enfs_get_value_from_config_file(buffer, ++ g_check_and_assign_value[i].field_name, ++ value, STRING_BUF_SIZE); ++ if (ret < 0) ++ continue; ++ ++ value[STRING_BUF_SIZE] = '\0'; ++ min = g_check_and_assign_value[i].min_value; ++ max = g_check_and_assign_value[i].max_value; ++ if (g_check_and_assign_value[i].func != NULL) ++ (*g_check_and_assign_value[i].func)( ++ g_check_and_assign_value[i].field_name, ++ value, min, max); ++ } ++ ++ kfree(buffer); ++ return ENFS_RET_OK; ++} ++ ++int32_t enfs_get_config_path_detect_interval(void) ++{ ++ return g_enfs_config_info.path_detect_interval; ++} ++ ++int32_t enfs_get_config_path_detect_timeout(void) ++{ ++ return g_enfs_config_info.path_detect_timeout; ++} ++ ++int32_t enfs_get_config_multipath_timeout(void) ++{ ++ return g_enfs_config_info.multipath_timeout; ++} ++ ++int32_t enfs_get_config_multipath_state(void) ++{ ++ return g_enfs_config_info.multipath_state; ++} ++ ++int32_t enfs_get_config_loadbalance_mode(void) ++{ ++ return g_enfs_config_info.loadbalance_mode; ++} ++ ++static bool enfs_file_changed(const char *filename) ++{ ++ int err; ++ struct kstat file_stat; ++ ++ err = vfs_stat(filename, &file_stat); ++ if (err) { ++ pr_err("failed to open file:%s err:%d\n", filename, err); ++ return false; ++ } ++ ++ if (timespec64_compare(&modify_time, &file_stat.mtime) == -1) { ++ modify_time = file_stat.mtime; ++ pr_info("file change: %lld %lld\n", modify_time.tv_sec, ++ file_stat.mtime.tv_sec); ++ return true; ++ } ++ ++ return false; ++} ++ ++static int enfs_thread_func(void *data) ++{ ++ while (!kthread_should_stop()) { ++ if (enfs_file_changed(CONFIG_FILE_PATH)) ++ enfs_config_load(); ++ ++ msleep(ENFS_NOTIFY_FILE_PERIOD); ++ } ++ return 0; ++} ++ ++int enfs_config_timer_init(void) ++{ ++ thread = kthread_run(enfs_thread_func, NULL, "enfs_notiy_file_thread"); ++ if (IS_ERR(thread)) { ++ pr_err("Failed to create kernel thread\n"); ++ return PTR_ERR(thread); ++ } ++ return 0; ++} ++ ++void 
enfs_config_timer_exit(void) ++{ ++ pr_info("enfs_notify_file_exit\n"); ++ if (thread) ++ kthread_stop(thread); ++} +diff --git a/fs/nfs/enfs/enfs_config.h b/fs/nfs/enfs/enfs_config.h +new file mode 100644 +index 000000000000..752710129170 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_config.h +@@ -0,0 +1,32 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: nfs configuration ++ * Author: y00583252 ++ * Create: 2023-07-27 ++ */ ++ ++#ifndef ENFS_CONFIG_H ++#define ENFS_CONFIG_H ++ ++#include <linux/types.h> ++ ++enum enfs_multipath_state { ++ ENFS_MULTIPATH_ENABLE = 0, ++ ENFS_MULTIPATH_DISABLE = 1, ++}; ++ ++enum enfs_loadbalance_mode { ++ ENFS_LOADBALANCE_RR, ++}; ++ ++ ++int32_t enfs_get_config_path_detect_interval(void); ++int32_t enfs_get_config_path_detect_timeout(void); ++int32_t enfs_get_config_multipath_timeout(void); ++int32_t enfs_get_config_multipath_state(void); ++int32_t enfs_get_config_loadbalance_mode(void); ++int32_t enfs_config_load(void); ++int32_t enfs_config_timer_init(void); ++void enfs_config_timer_exit(void); ++#endif // ENFS_CONFIG_H +diff --git a/fs/nfs/enfs/enfs_errcode.h b/fs/nfs/enfs/enfs_errcode.h +new file mode 100644 +index 000000000000..cca47ab9a191 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_errcode.h +@@ -0,0 +1,17 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: nfs error codes ++ * Author: y00583252 ++ * Create: 2023-07-31 ++ */ ++ ++#ifndef ENFS_ERRCODE_H ++#define ENFS_ERRCODE_H ++ ++enum { ++ ENFS_RET_OK = 0, ++ ENFS_RET_FAIL ++}; ++ ++#endif // ENFS_ERRCODE_H +diff --git a/fs/nfs/enfs/enfs_log.h b/fs/nfs/enfs/enfs_log.h +new file mode 100644 +index 000000000000..177b404f05df +--- /dev/null ++++ b/fs/nfs/enfs/enfs_log.h +@@ -0,0 +1,25 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. 
All rights reserved. ++ * Description: enfs log ++ * Author: y00583252 ++ * Create: 2023-07-31 ++ */ ++#ifndef ENFS_LOG_H ++#define ENFS_LOG_H ++ ++#include <linux/printk.h> ++ ++#define enfs_log_info(fmt, ...) \ ++ pr_info("enfs:[%s]" pr_fmt(fmt), \ ++ __func__, ##__VA_ARGS__) ++ ++#define enfs_log_error(fmt, ...) \ ++ pr_err("enfs:[%s]" pr_fmt(fmt), \ ++ __func__, ##__VA_ARGS__) ++ ++#define enfs_log_debug(fmt, ...) \ ++ pr_debug("enfs:[%s]" pr_fmt(fmt), \ ++ __func__, ##__VA_ARGS__) ++ ++#endif // ENFS_LOG_H +diff --git a/fs/nfs/enfs/failover_com.h b/fs/nfs/enfs/failover_com.h +new file mode 100644 +index 000000000000..c52940da232e +--- /dev/null ++++ b/fs/nfs/enfs/failover_com.h +@@ -0,0 +1,23 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: failover common header file ++ * Create: 2023-08-02 ++ */ ++#ifndef FAILOVER_COMMON_H ++#define FAILOVER_COMMON_H ++ ++static inline bool failover_is_enfs_clnt(struct rpc_clnt *clnt) ++{ ++ struct rpc_clnt *next = clnt->cl_parent; ++ ++ while (next) { ++ if (next == next->cl_parent) ++ break; ++ next = next->cl_parent; ++ } ++ ++ return next != NULL ? next->cl_enfs : clnt->cl_enfs; ++} ++ ++#endif // FAILOVER_COMMON_H +diff --git a/fs/nfs/enfs/failover_path.c b/fs/nfs/enfs/failover_path.c +new file mode 100644 +index 000000000000..93b454de29d1 +--- /dev/null ++++ b/fs/nfs/enfs/failover_path.c +@@ -0,0 +1,207 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: nfs path failover file ++ * Author: y00583252 ++ * Create: 2023-08-02 ++ */ ++ ++#include "failover_path.h" ++#include <linux/nfs.h> ++#include <linux/nfs3.h> ++#include <linux/nfs4.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/sched.h> ++#include <linux/sunrpc/xprt.h> ++#include "enfs_config.h" ++#include "enfs_log.h" ++#include "failover_com.h" ++#include "pm_state.h" ++#include "pm_ping.h" ++ ++enum failover_policy_t { ++ FAILOVER_NOACTION = 1, ++ FAILOVER_RETRY, ++ FAILOVER_RETRY_DELAY, ++}; ++ ++static void failover_retry_path(struct rpc_task *task) ++{ ++ xprt_release(task); ++ rpc_init_task_retry_counters(task); ++ rpc_task_release_transport(task); ++ rpc_restart_call(task); ++} ++ ++static void failover_retry_path_delay(struct rpc_task *task, int32_t delay) ++{ ++ failover_retry_path(task); ++ rpc_delay(task, delay); ++} ++ ++static void failover_retry_path_by_policy(struct rpc_task *task, ++ enum failover_policy_t policy) ++{ ++ if (policy == FAILOVER_RETRY) ++ failover_retry_path(task); ++ else if (policy == FAILOVER_RETRY_DELAY) ++ failover_retry_path_delay(task, 3 * HZ); // delay 3s ++} ++ ++static ++enum failover_policy_t failover_get_nfs3_retry_policy(struct rpc_task *task) ++{ ++ enum failover_policy_t policy = FAILOVER_NOACTION; ++ const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc; ++ u32 proc; ++ ++ if (unlikely(procinfo == NULL)) { ++ enfs_log_error("the task contains no valid proc.\n"); ++ return FAILOVER_NOACTION; ++ } ++ ++ proc = procinfo->p_proc; ++ ++ switch (proc) { ++ case NFS3PROC_CREATE: ++ case NFS3PROC_MKDIR: ++ case NFS3PROC_REMOVE: ++ case NFS3PROC_RMDIR: ++ case NFS3PROC_SYMLINK: ++ case NFS3PROC_LINK: ++ case NFS3PROC_SETATTR: ++ case NFS3PROC_WRITE: ++ return FAILOVER_RETRY_DELAY; ++ default: ++ policy = FAILOVER_RETRY; ++ } ++ return policy; ++} ++ ++static ++enum failover_policy_t failover_get_nfs4_retry_policy(struct rpc_task *task) ++{ ++ enum failover_policy_t policy = 
FAILOVER_NOACTION; ++ const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc; ++ u32 proc_idx; ++ ++ if (unlikely(procinfo == NULL)) { ++ enfs_log_error("the task contains no valid proc.\n"); ++ return FAILOVER_NOACTION; ++ } ++ ++ proc_idx = procinfo->p_statidx; ++ ++ switch (proc_idx) { ++ case NFSPROC4_CLNT_CREATE: ++ case NFSPROC4_CLNT_REMOVE: ++ case NFSPROC4_CLNT_LINK: ++ case NFSPROC4_CLNT_SYMLINK: ++ case NFSPROC4_CLNT_SETATTR: ++ case NFSPROC4_CLNT_WRITE: ++ case NFSPROC4_CLNT_RENAME: ++ case NFSPROC4_CLNT_SETACL: ++ return FAILOVER_RETRY_DELAY; ++ default: ++ policy = FAILOVER_RETRY; ++ } ++ return policy; ++} ++ ++static enum failover_policy_t failover_get_retry_policy(struct rpc_task *task) ++{ ++ struct rpc_clnt *clnt = task->tk_client; ++ u32 version = clnt->cl_vers; ++ enum failover_policy_t policy = FAILOVER_NOACTION; ++ ++ // 1. if the task is bound to a specific xprt, take no action ++ if (task->tk_flags & RPC_TASK_FIXED) ++ return FAILOVER_NOACTION; ++ ++ // 2. get the policy for the nfs protocol version in use ++ if (version == 3) // nfs v3 ++ policy = failover_get_nfs3_retry_policy(task); ++ else if (version == 4) // nfs v4 ++ policy = failover_get_nfs4_retry_policy(task); ++ else ++ return FAILOVER_NOACTION; ++ ++ // 3. 
if the task was not sent to the target, retry immediately ++ if (!RPC_WAS_SENT(task)) ++ policy = FAILOVER_RETRY; ++ ++ return policy; ++} ++ ++static int failover_check_task(struct rpc_task *task) ++{ ++ struct rpc_clnt *clnt = NULL; ++ int disable_mpath = enfs_get_config_multipath_state(); ++ ++ if (disable_mpath != ENFS_MULTIPATH_ENABLE) { ++ enfs_log_debug("Multipath is not enabled.\n"); ++ return -EINVAL; ++ } ++ ++ if (unlikely((task == NULL) || (task->tk_client == NULL))) { ++ enfs_log_error("The task is not valid.\n"); ++ return -EINVAL; ++ } ++ ++ clnt = task->tk_client; ++ ++ if (clnt->cl_prog != NFS_PROGRAM) { ++ enfs_log_debug("The clnt is not prog{%u} type.\n", ++ clnt->cl_prog); ++ return -EINVAL; ++ } ++ ++ if (!failover_is_enfs_clnt(clnt)) { ++ enfs_log_debug("The clnt is not an enfs-managed type.\n"); ++ return -EINVAL; ++ } ++ return 0; ++} ++ ++void failover_handle(struct rpc_task *task) ++{ ++ enum failover_policy_t policy; ++ int ret; ++ ++ ret = failover_check_task(task); ++ if (ret != 0) ++ return; ++ ++ pm_set_path_state(task->tk_xprt, PM_STATE_FAULT); ++ ++ policy = failover_get_retry_policy(task); ++ ++ failover_retry_path_by_policy(task, policy); ++} ++ ++bool failover_task_need_call_start_again(struct rpc_task *task) ++{ ++ int ret; ++ ++ ret = failover_check_task(task); ++ if (ret != 0) ++ return false; ++ ++ return true; ++} ++ ++bool failover_prepare_transmit(struct rpc_task *task) ++{ ++ if (task->tk_flags & RPC_TASK_FIXED) ++ return true; ++ ++ if (pm_ping_is_test_xprt_task(task)) ++ return true; ++ ++ if (pm_get_path_state(task->tk_xprt) == PM_STATE_FAULT) { ++ task->tk_status = -ETIMEDOUT; ++ return false; ++ } ++ ++ return true; ++} +diff --git a/fs/nfs/enfs/failover_path.h b/fs/nfs/enfs/failover_path.h +new file mode 100644 +index 000000000000..6f1294829a6e +--- /dev/null ++++ b/fs/nfs/enfs/failover_path.h +@@ -0,0 +1,17 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. 
All rights reserved. ++ * Description: nfs path failover header file ++ * Author: y00583252 ++ * Create: 2023-08-02 ++ */ ++ ++#ifndef FAILOVER_PATH_H ++#define FAILOVER_PATH_H ++ ++#include <linux/sunrpc/sched.h> ++ ++void failover_handle(struct rpc_task *task); ++bool failover_prepare_transmit(struct rpc_task *task); ++ ++#endif // FAILOVER_PATH_H +diff --git a/fs/nfs/enfs/failover_time.c b/fs/nfs/enfs/failover_time.c +new file mode 100644 +index 000000000000..866ea82d13fc +--- /dev/null ++++ b/fs/nfs/enfs/failover_time.c +@@ -0,0 +1,99 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: failover time file ++ * Create: 2023-08-02 ++ */ ++ ++#include "failover_time.h" ++#include <linux/jiffies.h> ++#include <linux/sunrpc/clnt.h> ++#include "enfs_config.h" ++#include "enfs_log.h" ++#include "failover_com.h" ++#include "pm_ping.h" ++ ++static unsigned long failover_get_mulitipath_timeout(struct rpc_clnt *clnt) ++{ ++ unsigned long config_tmo = enfs_get_config_multipath_timeout() * HZ; ++ unsigned long clnt_tmo = clnt->cl_timeout->to_initval; ++ ++ if (config_tmo == 0) ++ return clnt_tmo; ++ ++ return config_tmo > clnt_tmo ? 
clnt_tmo : config_tmo; ++} ++ ++void failover_adjust_task_timeout(struct rpc_task *task, void *condition) ++{ ++ struct rpc_clnt *clnt = NULL; ++ unsigned long tmo; ++ int disable_mpath = enfs_get_config_multipath_state(); ++ ++ if (disable_mpath != ENFS_MULTIPATH_ENABLE) { ++ enfs_log_debug("Multipath is not enabled.\n"); ++ return; ++ } ++ ++ clnt = task->tk_client; ++ if (unlikely(clnt == NULL)) { ++ enfs_log_error("task associate client is NULL.\n"); ++ return; ++ } ++ ++ if (!failover_is_enfs_clnt(clnt)) { ++ enfs_log_debug("The clnt is not a enfs-managed type.\n"); ++ return; ++ } ++ ++ tmo = failover_get_mulitipath_timeout(clnt); ++ if (tmo == 0) { ++ enfs_log_debug("Multipath is not enabled.\n"); ++ return; ++ } ++ ++ if (task->tk_timeout != 0) ++ task->tk_timeout = ++ task->tk_timeout < tmo ? task->tk_timeout : tmo; ++ else ++ task->tk_timeout = tmo; ++} ++ ++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req) ++{ ++ struct rpc_clnt *clnt = NULL; ++ int disable_mpath = enfs_get_config_multipath_state(); ++ ++ if (disable_mpath != ENFS_MULTIPATH_ENABLE) { ++ enfs_log_debug("Multipath is not enabled.\n"); ++ return; ++ } ++ ++ clnt = task->tk_client; ++ if (unlikely(clnt == NULL)) { ++ enfs_log_error("task associate client is NULL.\n"); ++ return; ++ } ++ ++ if (!failover_is_enfs_clnt(clnt)) { ++ enfs_log_debug("The clnt is not a enfs-managed type.\n"); ++ return; ++ } ++ ++ if (!pm_ping_is_test_xprt_task(task)) ++ req->rq_timeout = failover_get_mulitipath_timeout(clnt); ++ else { ++ req->rq_timeout = enfs_get_config_path_detect_timeout() * HZ; ++ req->rq_majortimeo = req->rq_timeout + jiffies; ++ } ++ ++ /* ++ * when task is retried, the req is new, we lost major-timeout times, ++ * so we have to restore req major ++ * timeouts from the task, if it is stored. 
++ */ ++ if (task->tk_major_timeo != 0) ++ req->rq_majortimeo = task->tk_major_timeo; ++ else ++ task->tk_major_timeo = req->rq_majortimeo; ++} +diff --git a/fs/nfs/enfs/failover_time.h b/fs/nfs/enfs/failover_time.h +new file mode 100644 +index 000000000000..ede25b577a2a +--- /dev/null ++++ b/fs/nfs/enfs/failover_time.h +@@ -0,0 +1,16 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: failover time header file ++ * Create: 2023-08-02 ++ */ ++ ++#ifndef FAILOVER_TIME_H ++#define FAILOVER_TIME_H ++ ++#include <linux/sunrpc/sched.h> ++ ++void failover_adjust_task_timeout(struct rpc_task *task, void *condition); ++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req); ++ ++#endif // FAILOVER_TIME_H +diff --git a/fs/nfs/enfs/init.h b/fs/nfs/enfs/init.h +new file mode 100644 +index 000000000000..fdabb9084e19 +--- /dev/null ++++ b/fs/nfs/enfs/init.h +@@ -0,0 +1,17 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: nfs client init ++ * Author: y00583252 ++ * Create: 2023-07-31 ++ */ ++ ++#ifndef ENFS_INIT_H ++#define ENFS_INIT_H ++ ++#include <linux/types.h> ++ ++int32_t enfs_init(void); ++void enfs_fini(void); ++ ++#endif +diff --git a/fs/nfs/enfs/mgmt_init.c b/fs/nfs/enfs/mgmt_init.c +new file mode 100644 +index 000000000000..75a40c5e0f6c +--- /dev/null ++++ b/fs/nfs/enfs/mgmt_init.c +@@ -0,0 +1,22 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: mgmt component init ++ * Author: y00583252 ++ * Create: 2023-07-31 ++ */ ++ ++#include "mgmt_init.h" ++#include <linux/printk.h> ++#include "enfs_errcode.h" ++#include "enfs_config.h" ++ ++int32_t mgmt_init(void) ++{ ++ return enfs_config_timer_init(); ++} ++ ++void mgmt_fini(void) ++{ ++ enfs_config_timer_exit(); ++} +diff --git a/fs/nfs/enfs/mgmt_init.h b/fs/nfs/enfs/mgmt_init.h +new file mode 100644 +index 000000000000..aa78303b9f01 +--- /dev/null ++++ b/fs/nfs/enfs/mgmt_init.h +@@ -0,0 +1,18 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: mgmt component init ++ * Author: y00583252 ++ * Create: 2023-07-31 ++ */ ++ ++#ifndef MGMT_INIT_H ++#define MGMT_INIT_H ++ ++#include <linux/types.h> ++ ++int32_t mgmt_init(void); ++void mgmt_fini(void); ++ ++ ++#endif // MGMT_INIT_H +diff --git a/fs/nfs/enfs/pm_ping.c b/fs/nfs/enfs/pm_ping.c +new file mode 100644 +index 000000000000..24153cd4c7f3 +--- /dev/null ++++ b/fs/nfs/enfs/pm_ping.c +@@ -0,0 +1,421 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: path state header file ++ * Author: x00833432 ++ * Create: 2023-08-21 ++ */ ++ ++#include "pm_ping.h" ++#include <linux/err.h> ++#include <linux/spinlock.h> ++#include <linux/slab.h> ++#include <linux/module.h> ++#include <linux/printk.h> ++#include <linux/kthread.h> ++#include <linux/nfs.h> ++#include <linux/errno.h> ++#include <linux/rcupdate.h> ++#include <linux/workqueue.h> ++#include <net/netns/generic.h> ++#include <linux/atomic.h> ++#include <linux/sunrpc/clnt.h> ++ ++#include "../../../net/sunrpc/netns.h" ++#include "pm_state.h" ++#include "enfs.h" ++#include "enfs_log.h" ++#include "enfs_config.h" ++ ++#define SLEEP_INTERVAL 2 ++extern unsigned int sunrpc_net_id; ++ ++static struct task_struct *pm_ping_timer_thread; ++//protect pint_execute_workq ++static spinlock_t ping_execute_workq_lock; ++// timer for test xprt workqueue ++static struct workqueue_struct *ping_execute_workq; ++// count the ping xprt work on flight ++static atomic_t check_xprt_count; ++ ++struct ping_xprt_work { ++ struct rpc_xprt *xprt; // use this specific xprt ++ struct rpc_clnt *clnt; // use this specific rpc_client ++ struct work_struct ping_work; ++}; ++ ++struct pm_ping_async_callback { ++ void *data; ++ void (*func)(void *data); ++}; ++ ++// set xprt's enum pm_check_state ++void pm_ping_set_path_check_state(struct rpc_xprt *xprt, ++ enum pm_check_state state) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ ++ if (IS_ERR(xprt)) { ++ enfs_log_error("The xprt ptr is not exist.\n"); ++ return; ++ } ++ ++ if (xprt == NULL) { ++ enfs_log_error("The xprt is not valid.\n"); ++ return; ++ } ++ ++ xprt_get(xprt); ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ if (ctx == NULL) { ++ enfs_log_error("The xprt multipath ctx is not valid.\n"); ++ xprt_put(xprt); ++ return; ++ } ++ ++ atomic_set(&ctx->path_check_state, state); ++ xprt_put(xprt); ++} ++ ++// get xprt's enum pm_check_state ++static enum pm_check_state pm_ping_get_path_check_state(struct 
rpc_xprt *xprt) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ enum pm_check_state state; ++ ++ if (xprt == NULL) { ++ enfs_log_error("The xprt is not valid.\n"); ++ return PM_CHECK_UNDEFINE; ++ } ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ if (ctx == NULL) { ++ enfs_log_error("The xprt multipath ctx is not valid.\n"); ++ return PM_CHECK_UNDEFINE; ++ } ++ ++ state = atomic_read(&ctx->path_check_state); ++ ++ return state; ++} ++ ++static void pm_ping_call_done_callback(void *data) ++{ ++ struct pm_ping_async_callback *callback_data = ++ (struct pm_ping_async_callback *)data; ++ ++ if (callback_data == NULL) ++ return; ++ ++ callback_data->func(callback_data->data); ++ ++ kfree(callback_data); ++} ++ ++// Default callback for async RPC calls ++static void pm_ping_call_done(struct rpc_task *task, void *data) ++{ ++ struct rpc_xprt *xprt = task->tk_xprt; ++ ++ atomic_dec(&check_xprt_count); ++ if (task->tk_status >= 0) ++ pm_set_path_state(xprt, PM_STATE_NORMAL); ++ else ++ pm_set_path_state(xprt, PM_STATE_FAULT); ++ ++ pm_ping_set_path_check_state(xprt, PM_CHECK_FINISH); ++ ++ pm_ping_call_done_callback(data); ++} ++ ++// register func to rpc_call_done ++static const struct rpc_call_ops pm_ping_set_status_ops = { ++ .rpc_call_done = pm_ping_call_done, ++}; ++ ++// execute work which in work_queue ++static void pm_ping_execute_work(struct work_struct *work) ++{ ++ int ret = 0; ++ ++ // get the work information ++ struct ping_xprt_work *work_info = ++ container_of(work, struct ping_xprt_work, ping_work); ++ ++ // if check state is pending ++ if (pm_ping_get_path_check_state(work_info->xprt) == PM_CHECK_WAITING) { ++ ++ pm_ping_set_path_check_state(work_info->xprt, ++ PM_CHECK_CHECKING); ++ ++ ret = rpc_clnt_test_xprt(work_info->clnt, ++ work_info->xprt, ++ &pm_ping_set_status_ops, ++ NULL, ++ RPC_TASK_ASYNC | RPC_TASK_FIXED); ++ ++ if (ret < 0) { ++ enfs_log_debug("ping xprt execute failed ,ret %d", ret); ++ ++ 
pm_ping_set_path_check_state(work_info->xprt, ++ PM_CHECK_FINISH); ++ ++ } else ++ atomic_inc(&check_xprt_count); ++ ++ } ++ ++ atomic_dec(&work_info->clnt->cl_count); ++ xprt_put(work_info->xprt); ++ kfree(work_info); ++ work_info = NULL; ++} ++ ++static bool pm_ping_workqueue_queue_work(struct work_struct *work) ++{ ++ bool ret = false; ++ ++ spin_lock(&ping_execute_workq_lock); ++ ++ if (ping_execute_workq != NULL) ++ ret = queue_work(ping_execute_workq, work); ++ ++ spin_unlock(&ping_execute_workq_lock); ++ return ret; ++} ++ ++// init test work and add this work to workqueue ++static int pm_ping_add_work(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, void *data) ++{ ++ struct ping_xprt_work *work_info; ++ bool ret = false; ++ ++ if (IS_ERR(xprt) || xprt == NULL) { ++ enfs_log_error("The xprt ptr is not exist.\n"); ++ return -EINVAL; ++ } ++ ++ if (IS_ERR(clnt) || clnt == NULL) { ++ enfs_log_error("The clnt ptr is not exist.\n"); ++ return -EINVAL; ++ } ++ ++ if (!xprt->multipath_context) { ++ enfs_log_error("multipath_context is null.\n"); ++ return -EINVAL; ++ } ++ ++ // check xprt pending status, if pending status equals Finish ++ // means this xprt can inster to work queue ++ if (pm_ping_get_path_check_state(xprt) == ++ PM_CHECK_FINISH || ++ pm_ping_get_path_check_state(xprt) == ++ PM_CHECK_INIT) { ++ ++ enfs_log_debug("find xprt pointer. 
%p\n", xprt); ++ work_info = kzalloc(sizeof(struct ping_xprt_work), GFP_ATOMIC); ++ if (work_info == NULL) ++ return -ENOMEM; ++ work_info->clnt = clnt; ++ atomic_inc(&clnt->cl_count); ++ work_info->xprt = xprt; ++ xprt_get(xprt); ++ INIT_WORK(&work_info->ping_work, pm_ping_execute_work); ++ pm_ping_set_path_check_state(xprt, PM_CHECK_WAITING); ++ ++ ret = pm_ping_workqueue_queue_work(&work_info->ping_work); ++ if (!ret) { ++ atomic_dec(&work_info->clnt->cl_count); ++ xprt_put(work_info->xprt); ++ kfree(work_info); ++ return -EINVAL; ++ } ++ } ++ return 0; ++} ++ ++// encapsulate pm_ping_add_work() ++static int pm_ping_execute_xprt_test(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, void *data) ++{ ++ pm_ping_add_work(clnt, xprt, NULL); ++ // return 0 for rpc_clnt_iterate_for_each_xprt(); ++ // because negative value will stop iterate all xprt ++ // and we need return negative value for debug ++ // Therefore, we need this function to iterate all xprt ++ return 0; ++} ++ ++// export to other module add ping work to workqueue ++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) ++{ ++ int ret; ++ ++ ret = pm_ping_add_work(clnt, xprt, NULL); ++ return ret; ++} ++ ++// iterate xprt in the client ++static void pm_ping_loop_rpclnt(struct sunrpc_net *sn) ++{ ++ struct rpc_clnt *clnt; ++ ++ spin_lock(&sn->rpc_client_lock); ++ list_for_each_entry_rcu(clnt, &sn->all_clients, cl_clients) { ++ if (clnt->cl_enfs) { ++ enfs_log_debug("find rpc_clnt. 
%p\n", clnt); ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ pm_ping_execute_xprt_test, NULL); ++ } ++ } ++ spin_unlock(&sn->rpc_client_lock); ++} ++ ++// iterate each clnt in the sunrpc_net ++static void pm_ping_loop_sunrpc_net(void) ++{ ++ struct net *net; ++ struct sunrpc_net *sn; ++ ++ rcu_read_lock(); ++ for_each_net_rcu(net) { ++ sn = net_generic(net, sunrpc_net_id); ++ if (sn == NULL) ++ continue; ++ pm_ping_loop_rpclnt(sn); ++ } ++ rcu_read_unlock(); ++} ++ ++static int pm_ping_routine(void *data) ++{ ++ while (!kthread_should_stop()) { ++ // equale 0 means open multipath ++ if (enfs_get_config_multipath_state() == ++ ENFS_MULTIPATH_ENABLE) ++ pm_ping_loop_sunrpc_net(); ++ ++ msleep((unsigned int) ++ enfs_get_config_path_detect_interval() * 1000); ++ } ++ return 0; ++} ++ ++// start thread to cycly ping ++static int pm_ping_start(void) ++{ ++ pm_ping_timer_thread = ++ kthread_run(pm_ping_routine, NULL, "pm_ping_routine"); ++ if (IS_ERR(pm_ping_timer_thread)) { ++ enfs_log_error("Failed to create kernel thread\n"); ++ return PTR_ERR(pm_ping_timer_thread); ++ } ++ return 0; ++} ++ ++// initialize workqueue ++static int pm_ping_workqueue_init(void) ++{ ++ struct workqueue_struct *queue = NULL; ++ ++ queue = create_workqueue("pm_ping_workqueue"); ++ ++ if (queue == NULL) { ++ enfs_log_error("create workqueue failed.\n"); ++ return -ENOMEM; ++ } ++ ++ spin_lock(&ping_execute_workq_lock); ++ ping_execute_workq = queue; ++ spin_unlock(&ping_execute_workq_lock); ++ enfs_log_info("create workqueue succeeeded.\n"); ++ return 0; ++} ++ ++static void pm_ping_workqueue_fini(void) ++{ ++ struct workqueue_struct *queue = NULL; ++ ++ spin_lock(&ping_execute_workq_lock); ++ queue = ping_execute_workq; ++ ping_execute_workq = NULL; ++ spin_unlock(&ping_execute_workq_lock); ++ ++ enfs_log_info("delete work queue\n"); ++ ++ if (queue != NULL) { ++ flush_workqueue(queue); ++ destroy_workqueue(queue); ++ } ++} ++ ++// module exit func ++void pm_ping_fini(void) ++{ ++ if 
(pm_ping_timer_thread) ++ kthread_stop(pm_ping_timer_thread); ++ ++ pm_ping_workqueue_fini(); ++ ++ while (atomic_read(&check_xprt_count) != 0) ++ msleep(SLEEP_INTERVAL); ++} ++ ++// module init func ++int pm_ping_init(void) ++{ ++ int ret; ++ ++ atomic_set(&check_xprt_count, 0); ++ ret = pm_ping_workqueue_init(); ++ if (ret != 0) { ++ enfs_log_error("PM_PING Module loading failed.\n"); ++ return ret; ++ } ++ ret = pm_ping_start(); ++ if (ret != 0) { ++ enfs_log_error("PM_PING Module loading failed.\n"); ++ pm_ping_workqueue_fini(); ++ return ret; ++ } ++ ++ return ret; ++} ++ ++bool pm_ping_is_test_xprt_task(struct rpc_task *task) ++{ ++ return task->tk_ops == &pm_ping_set_status_ops ? true : false; ++} ++ ++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void (*func)(void *data), ++ void *data) ++{ ++ int ret; ++ ++ struct pm_ping_async_callback *callback_data = ++ kzalloc(sizeof(struct pm_ping_async_callback), GFP_KERNEL); ++ ++ if (callback_data == NULL) { ++ enfs_log_error("failed to mzalloc mem\n"); ++ return -ENOMEM; ++ } ++ ++ callback_data->data = data; ++ callback_data->func = func; ++ atomic_inc(&check_xprt_count); ++ ret = rpc_clnt_test_xprt(clnt, xprt, ++ &pm_ping_set_status_ops, ++ callback_data, ++ RPC_TASK_ASYNC | RPC_TASK_FIXED); ++ ++ if (ret < 0) { ++ enfs_log_debug("ping xprt execute failed ,ret %d", ret); ++ atomic_dec(&check_xprt_count); ++ } ++ ++ return ret; ++} +diff --git a/fs/nfs/enfs/pm_ping.h b/fs/nfs/enfs/pm_ping.h +new file mode 100644 +index 000000000000..6bcb94bfc836 +--- /dev/null ++++ b/fs/nfs/enfs/pm_ping.h +@@ -0,0 +1,33 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: nfs configuration ++ * Author: x00833432 ++ * Create: 2023-07-27 ++ */ ++ ++#ifndef PM_PING_H ++#define PM_PING_H ++ ++#include <linux/sunrpc/clnt.h> ++ ++enum pm_check_state { ++ PM_CHECK_INIT, // this xprt never been queued ++ PM_CHECK_WAITING, // this xprt waiting in the queue ++ PM_CHECK_CHECKING, // this xprt is testing ++ PM_CHECK_FINISH, // this xprt has been finished ++ PM_CHECK_UNDEFINE, // undefine multipath struct ++}; ++ ++int pm_ping_init(void); ++void pm_ping_fini(void); ++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt); ++void pm_ping_set_path_check_state(struct rpc_xprt *xprt, ++ enum pm_check_state state); ++bool pm_ping_is_test_xprt_task(struct rpc_task *task); ++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void (*func)(void *data), ++ void *data); ++ ++#endif // PM_PING_H +diff --git a/fs/nfs/enfs/pm_state.c b/fs/nfs/enfs/pm_state.c +new file mode 100644 +index 000000000000..220621a207a2 +--- /dev/null ++++ b/fs/nfs/enfs/pm_state.c +@@ -0,0 +1,158 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: path state file ++ * Author: y00583252 ++ * Create: 2023-08-12 ++ */ ++#include "pm_state.h" ++#include <linux/sunrpc/xprt.h> ++ ++#include "enfs.h" ++#include "enfs_log.h" ++ ++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ enum pm_path_state state; ++ ++ if (xprt == NULL) { ++ enfs_log_error("The xprt is not valid.\n"); ++ return PM_STATE_UNDEFINED; ++ } ++ ++ xprt_get(xprt); ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ if (ctx == NULL) { ++ enfs_log_error("The xprt multipath ctx is not valid.\n"); ++ xprt_put(xprt); ++ return PM_STATE_UNDEFINED; ++ } ++ ++ state = atomic_read(&ctx->path_state); ++ ++ xprt_put(xprt); ++ ++ return state; ++} ++ ++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ enum pm_path_state cur_state; ++ ++ if (xprt == NULL) { ++ enfs_log_error("The xprt is not valid.\n"); ++ return; ++ } ++ ++ xprt_get(xprt); ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ if (ctx == NULL) { ++ enfs_log_error("The xprt multipath ctx is not valid.\n"); ++ xprt_put(xprt); ++ return; ++ } ++ ++ cur_state = atomic_read(&ctx->path_state); ++ if (cur_state == state) { ++ enfs_log_debug("The xprt is already {%d}.\n", state); ++ xprt_put(xprt); ++ return; ++ } ++ ++ atomic_set(&ctx->path_state, state); ++ enfs_log_info("The xprt {%p} path state change from {%d} to {%d}.\n", ++ xprt, cur_state, state); ++ ++ xprt_put(xprt); ++} ++ ++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len) ++{ ++ enum pm_path_state state; ++ ++ if (xprt == NULL) { ++ enfs_log_error("The xprt is not valid.\n"); ++ return; ++ } ++ ++ if ((buf == NULL) || (len <= 0)) { ++ enfs_log_error("Buffer is not valid, len=%d.\n", len); ++ return; ++ } ++ ++ state = pm_get_path_state(xprt); ++ ++ switch (state) { ++ case PM_STATE_INIT: ++ (void)snprintf(buf, len, "Init"); ++ break; ++ case 
PM_STATE_NORMAL: ++ (void)snprintf(buf, len, "Normal"); ++ break; ++ case PM_STATE_FAULT: ++ (void)snprintf(buf, len, "Fault"); ++ break; ++ default: ++ (void)snprintf(buf, len, "Unknown"); ++ break; ++ } ++} ++ ++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len) ++{ ++ int i; ++ unsigned long state; ++ static unsigned long xprt_mask[] = { ++ XPRT_LOCKED, XPRT_CONNECTED, ++ XPRT_CONNECTING, XPRT_CLOSE_WAIT, ++ XPRT_BOUND, XPRT_BINDING, XPRT_CLOSING, ++ XPRT_CONGESTED}; ++ ++ static const char *const xprt_state_desc[] = { ++ "LOCKED", "CONNECTED", "CONNECTING", ++ "CLOSE_WAIT", "BOUND", "BINDING", ++ "CLOSING", "CONGESTED"}; ++ int pos = 0; ++ int ret = 0; ++ ++ if (xprt == NULL) { ++ enfs_log_error("The xprt is not valid.\n"); ++ return; ++ } ++ ++ if ((buf == NULL) || (len <= 0)) { ++ enfs_log_error( ++ "Xprt state buffer is not valid, len=%d.\n", ++ len); ++ return; ++ } ++ ++ xprt_get(xprt); ++ state = READ_ONCE(xprt->state); ++ xprt_put(xprt); ++ ++ for (i = 0; i < ARRAY_SIZE(xprt_mask); ++i) { ++ if (pos >= len) ++ break; ++ ++ if (!test_bit(xprt_mask[i], &state)) ++ continue; ++ ++ if (pos == 0) ++ ret = snprintf(buf, len, "%s", xprt_state_desc[i]); ++ else ++ ret = snprintf(buf + pos, len - pos, "|%s", ++ xprt_state_desc[i]); ++ ++ if (ret < 0) { ++ enfs_log_error("format state failed, ret %d.\n", ret); ++ break; ++ } ++ ++ pos += ret; ++ } ++} +diff --git a/fs/nfs/enfs/pm_state.h b/fs/nfs/enfs/pm_state.h +new file mode 100644 +index 000000000000..f5f52e5ab91d +--- /dev/null ++++ b/fs/nfs/enfs/pm_state.h +@@ -0,0 +1,28 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: path state header file ++ * Author: y00583252 ++ * Create: 2023-08-12 ++ */ ++ ++#ifndef PM_STATE_H ++#define PM_STATE_H ++ ++#include <linux/types.h> ++#include <linux/sunrpc/xprt.h> ++ ++enum pm_path_state { ++ PM_STATE_INIT, ++ PM_STATE_NORMAL, ++ PM_STATE_FAULT, ++ PM_STATE_UNDEFINED // xprt is not multipath xprt ++}; ++ ++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state); ++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt); ++ ++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len); ++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len); ++ ++#endif // PM_STATE_H diff --git a/0006-add_enfs_compile_option.patch b/0006-add_enfs_compile_option.patch new file mode 100644 index 0000000..ff3bc0e --- /dev/null +++ b/0006-add_enfs_compile_option.patch @@ -0,0 +1,70 @@ +diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig +index b04256636d4b..ae53510c0627 100644 +--- a/arch/arm64/configs/openeuler_defconfig ++++ b/arch/arm64/configs/openeuler_defconfig +@@ -5344,6 +5344,7 @@ CONFIG_LOCKD=m + CONFIG_LOCKD_V4=y + CONFIG_NFS_ACL_SUPPORT=m + CONFIG_NFS_COMMON=y ++# CONFIG_ENFS is not set + CONFIG_SUNRPC=m + CONFIG_SUNRPC_GSS=m + CONFIG_SUNRPC_BACKCHANNEL=y +diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig +index 59baeb2973af..ccc317f7fdb2 100644 +--- a/arch/x86/configs/openeuler_defconfig ++++ b/arch/x86/configs/openeuler_defconfig +@@ -6825,6 +6825,7 @@ CONFIG_LOCKD=m + CONFIG_LOCKD_V4=y + CONFIG_NFS_ACL_SUPPORT=m + CONFIG_NFS_COMMON=y ++CONFIG_ENFS=y + CONFIG_SUNRPC=m + CONFIG_SUNRPC_GSS=m + CONFIG_SUNRPC_BACKCHANNEL=y +diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig +index e55f86713948..872c9b7671b1 100644 +--- a/fs/nfs/Kconfig ++++ b/fs/nfs/Kconfig +@@ -196,3 +196,14 @@ config NFS_DEBUG + depends on NFS_FS && SUNRPC_DEBUG + select CRC32 + default y ++ ++config ENFS ++ tristate "NFS client support for ENFS" ++ 
depends on NFS_FS ++ default n ++ help ++ This option enables support multipath of the NFS protocol ++ in the kernel's NFS client. ++ This feature will improve performance and reliability. ++ ++ If sure, say Y. +diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile +index c587e3c4c6a6..19d0ac2ba3b8 100644 +--- a/fs/nfs/Makefile ++++ b/fs/nfs/Makefile +@@ -12,6 +12,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \ + nfs-$(CONFIG_ROOT_NFS) += nfsroot.o + nfs-$(CONFIG_SYSCTL) += sysctl.o + nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o ++nfs-$(CONFIG_ENFS) += enfs_adapter.o + + obj-$(CONFIG_NFS_V2) += nfsv2.o + nfsv2-y := nfs2super.o proc.o nfs2xdr.o +@@ -34,3 +35,5 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o + obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/ + obj-$(CONFIG_PNFS_BLOCK) += blocklayout/ + obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/ ++ ++obj-$(CONFIG_ENFS) += enfs/ +diff --git a/net/sunrpc/Makefile b/net/sunrpc/Makefile +index 090658c3da12..fe4e3b28c5d1 100644 +--- a/net/sunrpc/Makefile ++++ b/net/sunrpc/Makefile +@@ -19,3 +19,4 @@ sunrpc-$(CONFIG_SUNRPC_DEBUG) += debugfs.o + sunrpc-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel_rqst.o + sunrpc-$(CONFIG_PROC_FS) += stats.o + sunrpc-$(CONFIG_SYSCTL) += sysctl.o ++sunrpc-$(CONFIG_ENFS) += sunrpc_enfs_adapter.o -- 2.25.0.windows.1
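
For readers following the retry-policy selection in fs/nfs/enfs/failover_path.c earlier in this patch: the switch over NFSPROC4_CLNT_* assigns FAILOVER_RETRY_DELAY and then, with no break, reaches the default label, so as posted every procedure ends up with FAILOVER_RETRY. The sketch below shows the same decision order with explicit breaks. It is only an illustration under stated assumptions: the demo_* names and enum values are hypothetical stand-ins rather than the kernel's definitions, the per-version dispatch is left out, and the two boolean arguments stand in for the RPC_TASK_FIXED flag and the RPC_WAS_SENT() check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the patch's enum failover_policy_t. */
enum failover_policy {
	FAILOVER_NOACTION,
	FAILOVER_RETRY,
	FAILOVER_RETRY_DELAY,
};

/* Illustrative procedure indices, not the kernel's NFSPROC4_CLNT_* values. */
enum demo_proc {
	DEMO_PROC_READ,
	DEMO_PROC_WRITE,
	DEMO_PROC_CREATE,
	DEMO_PROC_REMOVE,
	DEMO_PROC_GETATTR,
};

/* Per-procedure policy: state-changing procedures get the delayed retry. */
static enum failover_policy demo_policy_for_proc(enum demo_proc proc)
{
	enum failover_policy policy;

	switch (proc) {
	case DEMO_PROC_WRITE:
	case DEMO_PROC_CREATE:
	case DEMO_PROC_REMOVE:
		policy = FAILOVER_RETRY_DELAY;
		break;	/* without this break, control reaches default */
	default:
		policy = FAILOVER_RETRY;
		break;
	}
	return policy;
}

/*
 * Same order of checks as failover_get_retry_policy() in the patch:
 * a task pinned to one xprt is left alone, a task that was never
 * sent is retried immediately, anything else uses the per-procedure
 * policy.
 */
static enum failover_policy demo_retry_policy(enum demo_proc proc,
					      bool fixed_xprt, bool was_sent)
{
	if (fixed_xprt)
		return FAILOVER_NOACTION;
	if (!was_sent)
		return FAILOVER_RETRY;
	return demo_policy_for_proc(proc);
}

/* Small self-check that can be called from any test harness. */
void demo_self_check(void)
{
	assert(demo_policy_for_proc(DEMO_PROC_WRITE) == FAILOVER_RETRY_DELAY);
	assert(demo_policy_for_proc(DEMO_PROC_GETATTR) == FAILOVER_RETRY);
	assert(demo_retry_policy(DEMO_PROC_WRITE, false, false) == FAILOVER_RETRY);
}
```

Calling demo_self_check() from a test program exercises all three branches. In the kernel tree, building with -Wimplicit-fallthrough would normally flag a missing break of this kind at compile time.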

[PATCH openEuler-1.0-LTS] [just for review!!!!]Add feature: eNFS - nfs multipath to improve performance and reliability
by mingqian218472 25 Sep '23

From: 闫海涛 <yanhaitao2(a)huawei.com>

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7SVH7

---------------------------------

Currently, the NFS client can use only one server IP address at a single mount point. As a result, the hardware capability of multiple storage nodes and NICs cannot be fully utilized. At several financial sites, performance cannot meet service requirements. In addition, when a single link is faulty, services are suspended, so the reliability problem needs to be solved.
openEuler-based commercial OS vendors hope that the eNFS feature will be integrated into 20.03 SP4 to resolve these performance and reliability problems.

When mounting an NFS share, the user can pass two optional parameters, localaddrs and remoteaddrs, to use eNFS multipath. If these optional parameters are not used, NFS behaves as before. For example:
mount -t nfs -o [localaddrs=127.17.0.1-127.17.0.4],[remoteaddrs=127.17.1.1-127.17.1.4] xx.xx.xx.xx:/test /mnt/test

Changes in eNFS are as follows:
1. patch 0001:
At the NFS layer, the function registered by eNFS is called back when the mount command parses its parameters. eNFS parses and saves the IP address list entered by the user.
2. patch 0002:
At the sunrpc layer, the function registered by eNFS is called back. When NFS uses sunrpc to create an rpc_clnt, eNFS combines the IP address lists entered for the mount to generate multiple xprts. When an I/O times out, the eNFS callback is invoked so that eNFS switches to an available link and retries.
3. patch 0003:
The eNFS module registers the interface for parsing the mount command. During the mount process, NFS invokes the eNFS interface so that eNFS can parse the mounting parameters of UltraPath. The eNFS module saves the mounting parameters to the context of nfs_client.
4. patch 0004:
When NFS invokes SunRPC to create an rpc_clnt, the eNFS interface is called back. eNFS creates multiple xprts based on the output IP address list. When NFS V3 I/Os are delivered, eNFS distributes them to available links based on the link status, improving performance through load balancing.
5. patch 0005:
When sending I/Os from the SunRPC module to the NFS server times out, the SunRPC module calls back the eNFS module to reselect a link. The eNFS module distributes the I/Os to other available links, preventing service interruption caused by a single link failure.
6. patch 0006:
The eNFS compilation option and makefile changes are added. By default, eNFS is not compiled.

Signed-off-by: mingqian218472 <zhangmingqian.zhang(a)huawei.com>
---
 ...-nfs-multipath-to-improve-performanc.patch | 6148 +++++++++++++++++
 ...enfs_registe_and_handle_mount_option.patch | 757 ++
 ...nd_create_multipath_then_dispatch_IO.patch | 805 +++
 ...add_enfs_module_for_nfs_mount_option.patch | 1209 ++++
 ...dd_enfs_module_for_sunrpc_multipatch.patch | 1581 +++++
 ...le_for_sunrpc_failover_and_configure.patch | 1607 +++++
 0006-add_enfs_compile_option.patch | 70 +
 7 files changed, 12177 insertions(+)
 create mode 100644 0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch
 create mode 100644 0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch
 create mode 100644 0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch
 create mode 100644 0003-add_enfs_module_for_nfs_mount_option.patch
 create mode 100644 0004-add_enfs_module_for_sunrpc_multipatch.patch
 create mode 100644 0005-add_enfs_module_for_sunrpc_failover_and_configure.patch
 create mode 100644 0006-add_enfs_compile_option.patch

diff --git a/0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch b/0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch
new file mode 100644
index 0000000..2974c5f
--- /dev/null
+++ b/0001-Add-feature-eNFS-nfs-multipath-to-improve-performanc.patch
@@ -0,0 +1,6148 @@
+From 53f616b0a649494e33d30b250d06c4049ccb88be
Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?=E9=97=AB=E6=B5=B7=E6=B6=9B?= <yanhaitao2(a)huawei.com> +Date: Mon, 25 Sep 2023 19:19:15 +0800 +Subject: [PATCH openEuler-20.03-LTS-SP3] Add feature: eNFS - nfs multipath to + improve performance and reliability + +driver inclusion +category: feature +bugzilla: https://gitee.com/openeuler/release-management/issues/I7U0W0 + +--------------------------------- + +Currently, the NFS client can use only one server IP address at a single mount point. As a result, the hardware capability of multiple storage nodes and NICs cannot be fully utilized. In multiple financial sites, the performance cannot meet service requirements. In addition, when a single link is faulty, services are suspended. The reliability problem needs to be solved. +OpenEuler-based commercial OS vendors hope that the eNFS feature will be integrated into 20.03 SP4 to resolve performance and reliability problems. + +When user mount one NFS share, can input localaddrs/remoteaddrs these two optional Parameters to use eNFS multipath. If these optional parameters are not used, NFS will behave as before. For example, +mount -t nfs -o [localaddrs=127.17.0.1-127.17.0.4],[remoteaddrs=127.17.1.1-127.17.1.4] xx.xx.xx.xx:/test /mnt/test + +Changes in eNFS are as follows: +1. patch 0001: +At the NFS layer, the eNFS registration function is called back when the mount command parses parameters. The eNFS parses and saves the IP address list entered by users. +2. patch 0002: +At the sunrpc layer, the eNFS registration function is called back When the NFS uses sunrpc to create rpc_clnt, the eNFS combines the IP address list entered for mount to generate multiple xprts. When the I/O times out, the callback function of the eNFS is called back so that the eNFS switches to an available link for retry. +3. patch 0003: +The eNFS module registers the interface for parsing the mount command. 
During the mount process, the NFS invokes the eNFS interface to enable the eNFS to parse the mounting parameters of UltraPath. The eNFS module saves the mounting parameters to the context of nfs_client. +4. patch 0004: +When the NFS invokes the SunRPC to create rpc_clnt, the eNFS interface is called back. The eNFS creates multiple xprts based on the output IP address list. When NFS V3 I/Os are delivered, eNFS distributes I/Os to available links based on the link status, improving performance through load balancing. +5. patch 0005: +When sending I/Os from the SunRPC module to the NFS server times out, the SunRPC module calls back the eNFS module to reselect a link. The eNFS module distributes I/Os to other available links, preventing service interruption caused by a single link failure. +6. patch 0006: +The eNFS compilation option and makefile are added. By default, the eNFS compilation is not performed. + +Signed-off-by: mingqian218472 <zhangmingqian.zhang(a)huawei.com> +--- + ...enfs_registe_and_handle_mount_option.patch | 757 ++++++++ + ...nd_create_multipath_then_dispatch_IO.patch | 805 +++++++++ + ...add_enfs_module_for_nfs_mount_option.patch | 1209 +++++++++++++ + ...dd_enfs_module_for_sunrpc_multipatch.patch | 1581 ++++++++++++++++ + ...le_for_sunrpc_failover_and_configure.patch | 1607 +++++++++++++++++ + 0006-add_enfs_compile_option.patch | 70 + + kernel.spec | 13 + + 7 files changed, 6042 insertions(+) + create mode 100644 0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch + create mode 100644 0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch + create mode 100644 0003-add_enfs_module_for_nfs_mount_option.patch + create mode 100644 0004-add_enfs_module_for_sunrpc_multipatch.patch + create mode 100644 0005-add_enfs_module_for_sunrpc_failover_and_configure.patch + create mode 100644 0006-add_enfs_compile_option.patch + +diff --git a/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch 
b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch +new file mode 100644 +index 0000000..38e57a9 +--- /dev/null ++++ b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch +@@ -0,0 +1,757 @@ ++diff --git a/fs/nfs/client.c b/fs/nfs/client.c ++index 7d02dc52209d..50820a8a684a 100644 ++--- a/fs/nfs/client.c +++++ b/fs/nfs/client.c ++@@ -48,7 +48,7 @@ ++ #include "callback.h" ++ #include "delegation.h" ++ #include "iostat.h" ++-#include "internal.h" +++#include "enfs_adapter.h" ++ #include "fscache.h" ++ #include "pnfs.h" ++ #include "nfs.h" ++@@ -255,6 +255,7 @@ void nfs_free_client(struct nfs_client *clp) ++ put_nfs_version(clp->cl_nfs_mod); ++ kfree(clp->cl_hostname); ++ kfree(clp->cl_acceptor); +++ nfs_free_multi_path_client(clp); ++ kfree(clp); ++ } ++ EXPORT_SYMBOL_GPL(nfs_free_client); ++@@ -330,6 +331,9 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat ++ sap)) ++ continue; ++ +++ if (!nfs_multipath_client_match(clp, data)) +++ continue; +++ ++ refcount_inc(&clp->cl_count); ++ return clp; ++ } ++@@ -512,6 +516,9 @@ int nfs_create_rpc_client(struct nfs_client *clp, ++ .program = &nfs_program, ++ .version = clp->rpc_ops->version, ++ .authflavor = flavor, +++#if IS_ENABLED(CONFIG_ENFS) +++ .multipath_option = cl_init->enfs_option, +++#endif ++ }; ++ ++ if (test_bit(NFS_CS_DISCRTRY, &clp->cl_flags)) ++@@ -634,6 +641,13 @@ struct nfs_client *nfs_init_client(struct nfs_client *clp, ++ /* the client is already initialised */ ++ if (clp->cl_cons_state == NFS_CS_READY) ++ return clp; +++ error = nfs_create_multi_path_client(clp, cl_init); +++ if (error < 0) { +++ dprintk("%s: create failed.%d!\n", __func__, error); +++ nfs_put_client(clp); +++ clp = ERR_PTR(error); +++ return clp; +++ } ++ ++ /* ++ * Create a client RPC handle for doing FSSTAT with UNIX auth only ++@@ -666,6 +680,9 @@ static int nfs_init_server(struct nfs_server *server, ++ .net = data->net, ++ .timeparms = &timeparms, ++ 
.init_flags = (1UL << NFS_CS_REUSEPORT), +++#if IS_ENABLED(CONFIG_ENFS) +++ .enfs_option = data->enfs_option, +++#endif ++ }; ++ struct nfs_client *clp; ++ int error; ++diff --git a/fs/nfs/enfs_adapter.c b/fs/nfs/enfs_adapter.c ++new file mode 100644 ++index 000000000000..7f471f2072c4 ++--- /dev/null +++++ b/fs/nfs/enfs_adapter.c ++@@ -0,0 +1,230 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#include <linux/types.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/nfs.h> +++#include <linux/nfs4.h> +++#include <linux/nfs3.h> +++#include <linux/nfs_fs.h> +++#include <linux/nfs_fs_sb.h> +++#include <linux/sunrpc/sched.h> +++#include <linux/nfs_iostat.h> +++#include "enfs_adapter.h" +++#include "iostat.h" +++ +++struct enfs_adapter_ops __rcu *enfs_adapter; +++ +++int enfs_adapter_register(struct enfs_adapter_ops *ops) +++{ +++ struct enfs_adapter_ops *old; +++ +++ old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, NULL, ops); +++ if (old == NULL || old == ops) +++ return 0; +++ pr_err("regist %s ops %p failed. old %p\n", __func__, ops, old); +++ return -EPERM; +++} +++EXPORT_SYMBOL_GPL(enfs_adapter_register); +++ +++int enfs_adapter_unregister(struct enfs_adapter_ops *ops) +++{ +++ struct enfs_adapter_ops *old; +++ +++ old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, ops, NULL); +++ if (old == ops || old == NULL) +++ return 0; +++ pr_err("unregist %s ops %p failed. 
old %p\n", __func__, ops, old); +++ return -EPERM; +++} +++EXPORT_SYMBOL_GPL(enfs_adapter_unregister); +++ +++struct enfs_adapter_ops *nfs_multipath_router_get(void) +++{ +++ struct enfs_adapter_ops *ops; +++ +++ rcu_read_lock(); +++ ops = rcu_dereference(enfs_adapter); +++ if (ops == NULL) { +++ rcu_read_unlock(); +++ return NULL; +++ } +++ if (!try_module_get(ops->owner)) +++ ops = NULL; +++ rcu_read_unlock(); +++ return ops; +++} +++ +++void nfs_multipath_router_put(struct enfs_adapter_ops *ops) +++{ +++ if (ops) +++ module_put(ops->owner); +++} +++ +++bool is_valid_option(enum nfsmultipathoptions option) +++{ +++ if (option < REMOTEADDR || option >= INVALID_OPTION) { +++ pr_warn("%s: ENFS invalid option %d\n", __func__, option); +++ return false; +++ } +++ +++ return true; +++} +++ +++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str, +++ struct nfs_parsed_mount_data *mnt) +++{ +++ +++ //parseMultiPathOptions(getNfsMultiPathOpt(token), string, mnt); +++ +++ int rc; +++ struct enfs_adapter_ops *ops; +++ +++ ops = nfs_multipath_router_get(); +++ if ((ops == NULL) || (ops->parse_mount_options == NULL) || +++ !is_valid_option(option)) { +++ nfs_multipath_router_put(ops); +++ dfprintk(MOUNT, +++ "NFS: parsing nfs mount option enfs not load[%s]\n" +++ , __func__); +++ return -EOPNOTSUPP; +++ } +++ // nfs_multipath_parse_options +++ dfprintk(MOUNT, "NFS: parsing nfs mount option '%s' type: %d[%s]\n" +++ , str, option, __func__); +++ rc = ops->parse_mount_options(option, str, &mnt->enfs_option, mnt->net); +++ nfs_multipath_router_put(ops); +++ return rc; +++} +++ +++void enfs_free_mount_options(struct nfs_parsed_mount_data *data) +++{ +++ struct enfs_adapter_ops *ops; +++ +++ if (data->enfs_option == NULL) +++ return; +++ +++ ops = nfs_multipath_router_get(); +++ if ((ops == NULL) || (ops->free_mount_options == NULL)) { +++ nfs_multipath_router_put(ops); +++ return; +++ } +++ ops->free_mount_options((void *)&data->enfs_option); +++ 
nfs_multipath_router_put(ops); +++} +++ +++int nfs_create_multi_path_client(struct nfs_client *client, +++ const struct nfs_client_initdata *cl_init) +++{ +++ int ret = 0; +++ struct enfs_adapter_ops *ops; +++ +++ if (cl_init->enfs_option == NULL) +++ return 0; +++ +++ ops = nfs_multipath_router_get(); +++ if (ops != NULL && ops->client_info_init != NULL) +++ ret = ops->client_info_init( +++ (void *)&client->cl_multipath_data, cl_init); +++ nfs_multipath_router_put(ops); +++ +++ return ret; +++} +++EXPORT_SYMBOL_GPL(nfs_create_multi_path_client); +++ +++void nfs_free_multi_path_client(struct nfs_client *clp) +++{ +++ struct enfs_adapter_ops *ops; +++ +++ if (clp->cl_multipath_data == NULL) +++ return; +++ +++ ops = nfs_multipath_router_get(); +++ if (ops != NULL && ops->client_info_free != NULL) +++ ops->client_info_free(clp->cl_multipath_data); +++ nfs_multipath_router_put(ops); +++} +++ +++int nfs_multipath_client_match(struct nfs_client *clp, +++ const struct nfs_client_initdata *sap) +++{ +++ int ret = true; +++ struct enfs_adapter_ops *ops; +++ +++ pr_info("%s src %p dst %p.\n", __func__, +++ clp->cl_multipath_data, sap->enfs_option); +++ +++ if (clp->cl_multipath_data == NULL && sap->enfs_option == NULL) +++ return true; +++ +++ if ((clp->cl_multipath_data == NULL && sap->enfs_option) || +++ (clp->cl_multipath_data && sap->enfs_option == NULL)) { +++ pr_err("client does not match: src %p dst %p.\n", +++ clp->cl_multipath_data, sap->enfs_option); +++ return false; +++ } +++ +++ ops = nfs_multipath_router_get(); +++ if (ops != NULL && ops->client_info_match != NULL) +++ ret = ops->client_info_match(clp->cl_multipath_data, +++ sap->enfs_option); +++ nfs_multipath_router_put(ops); +++ +++ return ret; +++} +++ +++int nfs4_multipath_client_match(struct nfs_client *src, struct nfs_client *dst) +++{ +++ int ret = true; +++ struct enfs_adapter_ops *ops; +++ +++ if (src->cl_multipath_data == NULL && dst->cl_multipath_data == NULL) +++ return true; +++ +++ if 
(src->cl_multipath_data == NULL || dst->cl_multipath_data == NULL) +++ return false; +++ +++ ops = nfs_multipath_router_get(); +++ if (ops != NULL && ops->nfs4_client_info_match != NULL) +++ ret = ops->nfs4_client_info_match(src->cl_multipath_data, +++ dst->cl_multipath_data); +++ nfs_multipath_router_put(ops); +++ +++ return ret; +++} +++EXPORT_SYMBOL_GPL(nfs4_multipath_client_match); +++ +++void nfs_multipath_show_client_info(struct seq_file *mount_option, +++ struct nfs_server *server) +++{ +++ struct enfs_adapter_ops *ops; +++ +++ if (mount_option == NULL || server == NULL || +++ server->client == NULL || +++ server->nfs_client->cl_multipath_data == NULL) +++ return; +++ +++ ops = nfs_multipath_router_get(); +++ if (ops != NULL && ops->client_info_show != NULL) +++ ops->client_info_show(mount_option, server); +++ nfs_multipath_router_put(ops); +++} +++ +++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option) +++{ +++ int ret = 0; +++ struct enfs_adapter_ops *ops; +++ +++ if (nfs_client == NULL || nfs_client->cl_rpcclient == NULL) +++ return 0; +++ +++ ops = nfs_multipath_router_get(); +++ if (ops != NULL && ops->remount_ip_list != NULL) +++ ret = ops->remount_ip_list(nfs_client, enfs_option); +++ nfs_multipath_router_put(ops); +++ return ret; +++} +++EXPORT_SYMBOL_GPL(nfs_remount_iplist); ++diff --git a/fs/nfs/enfs_adapter.h b/fs/nfs/enfs_adapter.h ++new file mode 100644 ++index 000000000000..752544e18056 ++--- /dev/null +++++ b/fs/nfs/enfs_adapter.h ++@@ -0,0 +1,101 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS adapter header. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++#ifndef _NFS_MULTIPATH_H_ +++#define _NFS_MULTIPATH_H_ +++ +++#include "internal.h" +++ +++#if IS_ENABLED(CONFIG_ENFS) +++enum nfsmultipathoptions { +++ REMOTEADDR, +++ LOCALADDR, +++ REMOTEDNSNAME, +++ REMOUNTREMOTEADDR, +++ REMOUNTLOCALADDR, +++ INVALID_OPTION +++}; +++ +++ +++struct enfs_adapter_ops { +++ const char *name; +++ struct module *owner; +++ int (*parse_mount_options)(enum nfsmultipathoptions option, +++ char *str, void **enfs_option, struct net *net_ns); +++ +++ void (*free_mount_options)(void **data); +++ +++ int (*client_info_init)(void **data, +++ const struct nfs_client_initdata *cl_init); +++ void (*client_info_free)(void *data); +++ int (*client_info_match)(void *src, void *dst); +++ int (*nfs4_client_info_match)(void *src, void *dst); +++ void (*client_info_show)(struct seq_file *mount_option, void *data); +++ int (*remount_ip_list)(struct nfs_client *nfs_client, +++ void *enfs_option); +++}; +++ +++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str, +++ struct nfs_parsed_mount_data *mnt); +++void enfs_free_mount_options(struct nfs_parsed_mount_data *data); +++int nfs_create_multi_path_client(struct nfs_client *client, +++ const struct nfs_client_initdata *cl_init); +++void nfs_free_multi_path_client(struct nfs_client *clp); +++int nfs_multipath_client_match(struct nfs_client *clp, +++ const struct nfs_client_initdata *sap); +++int nfs4_multipath_client_match(struct nfs_client *src, struct nfs_client *dst); +++void nfs_multipath_show_client_info(struct seq_file *mount_option, +++ struct nfs_server *server); +++int enfs_adapter_register(struct enfs_adapter_ops *ops); +++int enfs_adapter_unregister(struct enfs_adapter_ops *ops); +++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option); +++int nfs4_create_multi_path(struct nfs_server *server, +++ struct nfs_parsed_mount_data *data, +++ const struct rpc_timeout *timeparms); +++ +++#else +++static inline +++void nfs_free_multi_path_client(struct 
nfs_client *clp) +++{ +++ +++} +++ +++static inline +++int nfs_multipath_client_match(struct nfs_client *clp, +++ const struct nfs_client_initdata *sap) +++{ +++ return 1; +++} +++ +++static inline +++int nfs_create_multi_path_client(struct nfs_client *client, +++ const struct nfs_client_initdata *cl_init) +++{ +++ return 0; +++} +++ +++static inline +++void nfs_multipath_show_client_info(struct seq_file *mount_option, +++ struct nfs_server *server) +++{ +++ +++} +++ +++static inline +++int nfs4_multipath_client_match(struct nfs_client *src, +++ struct nfs_client *dst) +++{ +++ return 1; +++} +++ +++static inline +++void enfs_free_mount_options(struct nfs_parsed_mount_data *data) +++{ +++ +++} +++ +++#endif // CONFIG_ENFS +++#endif // _NFS_MULTIPATH_H_ ++diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h ++index 0ce5a90640c4..c696693edc7b 100644 ++--- a/fs/nfs/internal.h +++++ b/fs/nfs/internal.h ++@@ -93,6 +93,9 @@ struct nfs_client_initdata { ++ u32 minorversion; ++ struct net *net; ++ const struct rpc_timeout *timeparms; +++#if IS_ENABLED(CONFIG_ENFS) +++ void *enfs_option; /* struct multipath_mount_options * */ +++#endif ++ }; ++ ++ /* ++@@ -135,6 +138,9 @@ struct nfs_parsed_mount_data { ++ ++ struct security_mnt_opts lsm_opts; ++ struct net *net; +++#if IS_ENABLED(CONFIG_ENFS) +++ void *enfs_option; /* struct multipath_mount_options * */ +++#endif ++ }; ++ ++ /* mount_clnt.c */ ++diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c ++index 1350ea673672..4aa6e1f961f7 100644 ++--- a/fs/nfs/nfs4client.c +++++ b/fs/nfs/nfs4client.c ++@@ -10,7 +10,7 @@ ++ #include <linux/sunrpc/xprt.h> ++ #include <linux/sunrpc/bc_xprt.h> ++ #include <linux/sunrpc/rpc_pipe_fs.h> ++-#include "internal.h" +++#include "enfs_adapter.h" ++ #include "callback.h" ++ #include "delegation.h" ++ #include "nfs4session.h" ++@@ -225,6 +225,16 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init) ++ __set_bit(NFS_CS_DISCRTRY, &clp->cl_flags); ++ 
__set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags); ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ err = nfs_create_multi_path_client(clp, cl_init); +++ if (err < 0) { +++ dprintk("%s: create failed.%d\n", __func__, err); +++ nfs_put_client(clp); +++ clp = ERR_PTR(err); +++ return clp; +++ } +++#endif +++ ++ /* ++ * Set up the connection to the server before we add add to the ++ * global list. ++@@ -529,6 +539,9 @@ static int nfs4_match_client(struct nfs_client *pos, struct nfs_client *new, ++ if (!nfs4_match_client_owner_id(pos, new)) ++ return 1; ++ +++ if (!nfs4_multipath_client_match(pos, new)) +++ return 1; +++ ++ return 0; ++ } ++ ++@@ -860,7 +873,7 @@ static int nfs4_set_client(struct nfs_server *server, ++ const size_t addrlen, ++ const char *ip_addr, ++ int proto, const struct rpc_timeout *timeparms, ++- u32 minorversion, struct net *net) +++ u32 minorversion, struct net *net, void *enfs_option) ++ { ++ struct nfs_client_initdata cl_init = { ++ .hostname = hostname, ++@@ -872,6 +885,9 @@ static int nfs4_set_client(struct nfs_server *server, ++ .minorversion = minorversion, ++ .net = net, ++ .timeparms = timeparms, +++#if IS_ENABLED(CONFIG_ENFS) +++ .enfs_option = enfs_option, +++#endif ++ }; ++ struct nfs_client *clp; ++ ++@@ -1042,6 +1058,30 @@ static int nfs4_server_common_setup(struct nfs_server *server, ++ return error; ++ } ++ +++int nfs4_create_multi_path(struct nfs_server *server, +++ struct nfs_parsed_mount_data *data, +++ const struct rpc_timeout *timeparms) +++{ +++ struct nfs_client_initdata cl_init = { +++ .hostname = data->nfs_server.hostname, +++ .addr = (const struct sockaddr *)&data->nfs_server.address, +++ .addrlen = data->nfs_server.addrlen, +++ .ip_addr = data->client_address, +++ .nfs_mod = &nfs_v4, +++ .proto = data->nfs_server.protocol, +++ .minorversion = data->minorversion, +++ .net = data->net, +++ .timeparms = timeparms, +++#if IS_ENABLED(CONFIG_ENFS) +++ .enfs_option = data->enfs_option, +++#endif // CONFIG_ENFS +++ }; +++ +++ return 
nfs_create_multi_path_client(server->nfs_client, &cl_init); +++ +++} +++EXPORT_SYMBOL_GPL(nfs4_create_multi_path); +++ ++ /* ++ * Create a version 4 volume record ++ */ ++@@ -1050,6 +1090,7 @@ static int nfs4_init_server(struct nfs_server *server, ++ { ++ struct rpc_timeout timeparms; ++ int error; +++ void *enfs_option = NULL; ++ ++ nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, ++ data->timeo, data->retrans); ++@@ -1067,6 +1108,10 @@ static int nfs4_init_server(struct nfs_server *server, ++ else ++ data->selected_flavor = RPC_AUTH_UNIX; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ enfs_option = data->enfs_option; +++#endif +++ ++ /* Get a client record */ ++ error = nfs4_set_client(server, ++ data->nfs_server.hostname, ++@@ -1076,7 +1121,7 @@ static int nfs4_init_server(struct nfs_server *server, ++ data->nfs_server.protocol, ++ &timeparms, ++ data->minorversion, ++- data->net); +++ data->net, enfs_option); ++ if (error < 0) ++ return error; ++ ++@@ -1161,7 +1206,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, ++ XPRT_TRANSPORT_RDMA, ++ parent_server->client->cl_timeout, ++ parent_client->cl_mvops->minor_version, ++- parent_client->cl_net); +++ parent_client->cl_net, NULL); ++ if (!error) ++ goto init_server; ++ #endif /* IS_ENABLED(CONFIG_SUNRPC_XPRT_RDMA) */ ++@@ -1174,7 +1219,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, ++ XPRT_TRANSPORT_TCP, ++ parent_server->client->cl_timeout, ++ parent_client->cl_mvops->minor_version, ++- parent_client->cl_net); +++ parent_client->cl_net, NULL); ++ if (error < 0) ++ goto error; ++ ++@@ -1269,7 +1314,7 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname, ++ set_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status); ++ error = nfs4_set_client(server, hostname, sap, salen, buf, ++ clp->cl_proto, clnt->cl_timeout, ++- clp->cl_minorversion, net); +++ clp->cl_minorversion, net, NULL); ++ clear_bit(NFS_MIG_TSM_POSSIBLE, 
&server->mig_status); ++ if (error != 0) { ++ nfs_server_insert_lists(server); ++diff --git a/fs/nfs/super.c b/fs/nfs/super.c ++index a05e1eb2c3fd..83cd294aca15 100644 ++--- a/fs/nfs/super.c +++++ b/fs/nfs/super.c ++@@ -61,7 +61,7 @@ ++ #include "callback.h" ++ #include "delegation.h" ++ #include "iostat.h" ++-#include "internal.h" +++#include "enfs_adapter.h" ++ #include "fscache.h" ++ #include "nfs4session.h" ++ #include "pnfs.h" ++@@ -113,6 +113,12 @@ enum { ++ ++ /* Special mount options */ ++ Opt_userspace, Opt_deprecated, Opt_sloppy, +++#if IS_ENABLED(CONFIG_ENFS) +++ Opt_remote_iplist, +++ Opt_local_iplist, +++ Opt_remote_dnslist, +++ Opt_enfs_info, +++#endif ++ ++ Opt_err ++ }; ++@@ -183,6 +189,13 @@ static const match_table_t nfs_mount_option_tokens = { ++ { Opt_fscache_uniq, "fsc=%s" }, ++ { Opt_local_lock, "local_lock=%s" }, ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ { Opt_remote_iplist, "remoteaddrs=%s" }, +++ { Opt_local_iplist, "localaddrs=%s" }, +++ { Opt_remote_dnslist, "remotednsname=%s" }, +++ { Opt_enfs_info, "enfs_info=%s" }, +++#endif +++ ++ /* The following needs to be listed after all other options */ ++ { Opt_nfsvers, "v%s" }, ++ ++@@ -365,6 +378,21 @@ static struct shrinker acl_shrinker = { ++ .seeks = DEFAULT_SEEKS, ++ }; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++enum nfsmultipathoptions getNfsMultiPathOpt(int token) +++{ +++ switch (token) { +++ case Opt_remote_iplist: +++ return REMOUNTREMOTEADDR; +++ case Opt_local_iplist: +++ return REMOUNTLOCALADDR; +++ case Opt_remote_dnslist: +++ return REMOTEDNSNAME; +++ } +++ return INVALID_OPTION; +++} +++#endif +++ ++ /* ++ * Register the NFS filesystems ++ */ ++@@ -758,6 +786,9 @@ int nfs_show_options(struct seq_file *m, struct dentry *root) ++ seq_printf(m, ",addr=%s", ++ rpc_peeraddr2str(nfss->nfs_client->cl_rpcclient, ++ RPC_DISPLAY_ADDR)); +++ +++ nfs_multipath_show_client_info(m, nfss); +++ ++ rcu_read_unlock(); ++ ++ return 0; ++@@ -853,6 +884,8 @@ int nfs_show_stats(struct seq_file *m, struct 
dentry *root) ++ seq_puts(m, root->d_sb->s_flags & SB_NODIRATIME ? ",nodiratime" : ""); ++ nfs_show_mount_options(m, nfss, 1); ++ +++ nfs_multipath_show_client_info(m, nfss); +++ ++ seq_printf(m, "\n\tage:\t%lu", (jiffies - nfss->mount_time) / HZ); ++ ++ show_implementation_id(m, nfss); ++@@ -977,6 +1010,7 @@ static void nfs_free_parsed_mount_data(struct nfs_parsed_mount_data *data) ++ kfree(data->nfs_server.export_path); ++ kfree(data->nfs_server.hostname); ++ kfree(data->fscache_uniq); +++ enfs_free_mount_options(data); ++ security_free_mnt_opts(&data->lsm_opts); ++ kfree(data); ++ } ++@@ -1641,7 +1675,34 @@ static int nfs_parse_mount_options(char *raw, ++ return 0; ++ }; ++ break; ++- +++#if IS_ENABLED(CONFIG_ENFS) +++ case Opt_remote_iplist: +++ case Opt_local_iplist: +++ case Opt_remote_dnslist: +++ string = match_strdup(args); +++ if (string == NULL) +++ goto out_nomem; +++ rc = enfs_parse_mount_options(getNfsMultiPathOpt(token), +++ string, mnt); +++ kfree(string); +++ switch (rc) { +++ case 0: +++ break; +++ case -ENOMEM: +++ goto out_nomem; +++ case -ENOSPC: +++ goto out_limit; +++ case -EINVAL: +++ goto out_invalid_address; +++ case -ENOTSUPP: +++ goto out_invalid_address; +++ case -EOPNOTSUPP: +++ goto out_invalid_address; +++ } +++ break; +++ case Opt_enfs_info: +++ break; +++#endif ++ /* ++ * Special options ++ */ ++@@ -1720,6 +1781,11 @@ static int nfs_parse_mount_options(char *raw, ++ free_secdata(secdata); ++ printk(KERN_INFO "NFS: security options invalid: %d\n", rc); ++ return 0; +++#if IS_ENABLED(CONFIG_ENFS) +++out_limit: +++ dprintk("NFS: param is more than supported limit: %d\n", rc); +++ return 0; +++#endif ++ } ++ ++ /* ++@@ -2335,6 +2401,14 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) ++ if (!nfs_parse_mount_options((char *)options, data)) ++ goto out; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ if (data->enfs_option) { +++ error = nfs_remount_iplist(nfss->nfs_client, data->enfs_option); +++ if (error) +++ goto out; +++ } 
+++#endif +++ ++ /* ++ * noac is a special case. It implies -o sync, but that's not ++ * necessarily reflected in the mtab options. do_remount_sb ++@@ -2347,6 +2421,11 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) ++ /* compare new mount options with old ones */ ++ error = nfs_compare_remount_data(nfss, data); ++ out: +++#if IS_ENABLED(CONFIG_ENFS) +++ /* release remount option member */ +++ if (data->enfs_option) +++ enfs_free_mount_options(data); +++#endif ++ nfs_free_parsed_mount_data(data); ++ return error; ++ } ++diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h ++index 7023ae64e3d7..2c19678afe8d 100644 ++--- a/include/linux/nfs_fs_sb.h +++++ b/include/linux/nfs_fs_sb.h ++@@ -123,6 +123,11 @@ struct nfs_client { ++ ++ struct net *cl_net; ++ struct list_head pending_cb_stateids; +++ +++#if IS_ENABLED(CONFIG_ENFS) +++ /* multi path private structure (struct multipath_client_info *) */ +++ void *cl_multipath_data; +++#endif ++ }; ++ ++ /* +diff --git a/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch +new file mode 100644 +index 0000000..540a2ce +--- /dev/null ++++ b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch +@@ -0,0 +1,805 @@ ++diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h ++index 8aa865bce4f6..89178f78de8c 100644 ++--- a/include/linux/sunrpc/clnt.h +++++ b/include/linux/sunrpc/clnt.h ++@@ -70,6 +70,10 @@ struct rpc_clnt { ++ struct dentry *cl_debugfs; /* debugfs directory */ ++ #endif ++ struct rpc_xprt_iter cl_xpi; +++ +++#if IS_ENABLED(CONFIG_ENFS) +++ bool cl_enfs; +++#endif ++ }; ++ ++ /* ++@@ -124,6 +128,9 @@ struct rpc_create_args { ++ unsigned long flags; ++ char *client_name; ++ struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */ +++#if IS_ENABLED(CONFIG_ENFS) +++ void *multipath_option; +++#endif ++ }; ++ ++ struct 
rpc_add_xprt_test { ++@@ -221,6 +228,12 @@ bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt, ++ const struct sockaddr *sap); ++ void rpc_cleanup_clids(void); ++ +++#if IS_ENABLED(CONFIG_ENFS) +++int +++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, +++ const struct rpc_call_ops *ops, void *data, int flags); +++#endif /* CONFIG_ENFS */ +++ ++ static inline int rpc_reply_expected(struct rpc_task *task) ++ { ++ return (task->tk_msg.rpc_proc != NULL) && ++diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h ++index ad2e243f3f03..124f5a0faf3e 100644 ++--- a/include/linux/sunrpc/sched.h +++++ b/include/linux/sunrpc/sched.h ++@@ -90,6 +90,9 @@ struct rpc_task { ++ tk_garb_retry : 2, ++ tk_cred_retry : 2, ++ tk_rebind_retry : 2; +++#if IS_ENABLED(CONFIG_ENFS) +++ unsigned long tk_major_timeo; /* major timeout ticks */ +++#endif ++ }; ++ ++ typedef void (*rpc_action)(struct rpc_task *); ++@@ -118,6 +121,9 @@ struct rpc_task_setup { ++ */ ++ #define RPC_TASK_ASYNC 0x0001 /* is an async task */ ++ #define RPC_TASK_SWAPPER 0x0002 /* is swapping in/out */ +++#if IS_ENABLED(CONFIG_ENFS) +++#define RPC_TASK_FIXED 0x0004 /* detect xprt status task */ +++#endif ++ #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ ++ #define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ ++ #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ ++@@ -257,6 +263,9 @@ void rpc_destroy_mempool(void); ++ extern struct workqueue_struct *rpciod_workqueue; ++ extern struct workqueue_struct *xprtiod_workqueue; ++ void rpc_prepare_task(struct rpc_task *task); +++#if IS_ENABLED(CONFIG_ENFS) +++void rpc_init_task_retry_counters(struct rpc_task *task); +++#endif ++ ++ static inline int rpc_wait_for_completion_task(struct rpc_task *task) ++ { ++diff --git a/include/linux/sunrpc/sunrpc_enfs_adapter.h b/include/linux/sunrpc/sunrpc_enfs_adapter.h ++new file mode 100644 ++index 000000000000..28abedcf5cf6 ++--- /dev/null +++++ 
b/include/linux/sunrpc/sunrpc_enfs_adapter.h ++@@ -0,0 +1,128 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* Client-side SUNRPC ENFS adapter header. +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#ifndef _SUNRPC_ENFS_ADAPTER_H_ +++#define _SUNRPC_ENFS_ADAPTER_H_ +++#include <linux/sunrpc/clnt.h> +++ +++#if IS_ENABLED(CONFIG_ENFS) +++ +++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) +++{ +++ xps->xps_nactive++; +++} +++ +++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) +++{ +++ xps->xps_nactive--; +++} +++ +++struct rpc_xprt *rpc_task_get_xprt +++(struct rpc_clnt *clnt, struct rpc_xprt *xprt); +++ +++struct rpc_multipath_ops { +++ struct module *owner; +++ void (*create_clnt)(struct rpc_create_args *args, +++ struct rpc_clnt *clnt); +++ void (*releas_clnt)(struct rpc_clnt *clnt); +++ void (*create_xprt)(struct rpc_xprt *xprt); +++ void (*destroy_xprt)(struct rpc_xprt *xprt); +++ void (*xprt_iostat)(struct rpc_task *task); +++ void (*failover_handle)(struct rpc_task *task); +++ bool (*task_need_call_start_again)(struct rpc_task *task); +++ void (*adjust_task_timeout)(struct rpc_task *task, void *condition); +++ void (*init_task_req)(struct rpc_task *task, struct rpc_rqst *req); +++ bool (*prepare_transmit)(struct rpc_task *task); +++}; +++ +++extern struct rpc_multipath_ops __rcu *multipath_ops; +++void rpc_init_task_retry_counters(struct rpc_task *task); +++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops); +++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops); +++struct rpc_multipath_ops *rpc_multipath_ops_get(void); +++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops); +++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt); +++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, +++ struct rpc_clnt *clnt); +++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt); +++bool 
rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt); +++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt); +++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task); +++void rpc_multipath_ops_failover_handle(struct rpc_task *task); +++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task); +++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, +++ void *condition); +++void rpc_multipath_ops_init_task_req(struct rpc_task *task, +++ struct rpc_rqst *req); +++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task); +++ +++#else +++static inline struct rpc_xprt *rpc_task_get_xprt(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt) +++{ +++ return NULL; +++} +++ +++static inline void rpc_task_release_xprt(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt) +++{ +++} +++ +++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) +++{ +++} +++ +++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) +++{ +++} +++ +++static inline void rpc_multipath_ops_create_clnt +++(struct rpc_create_args *args, struct rpc_clnt *clnt) +++{ +++} +++ +++static inline void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) +++{ +++} +++ +++static inline bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) +++{ +++ return false; +++} +++ +++static inline void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt) +++{ +++} +++ +++static inline void rpc_multipath_ops_xprt_iostat(struct rpc_task *task) +++{ +++} +++ +++static inline void rpc_multipath_ops_failover_handle(struct rpc_task *task) +++{ +++} +++ +++static inline +++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task) +++{ +++ return false; +++} +++ +++static inline void +++rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, void *condition) +++{ +++} +++ +++static inline void +++rpc_multipath_ops_init_task_req(struct rpc_task *task, struct rpc_rqst *req) +++{ +++} +++ +++static inline bool 
rpc_multipath_ops_prepare_transmit(struct rpc_task *task) +++{ +++ return false; +++} +++ +++#endif +++#endif // _SUNRPC_ENFS_ADAPTER_H_ ++diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h ++index ccfacca1eba9..2e47b3577947 100644 ++--- a/include/linux/sunrpc/xprt.h +++++ b/include/linux/sunrpc/xprt.h ++@@ -279,6 +279,10 @@ struct rpc_xprt { ++ atomic_t inject_disconnect; ++ #endif ++ struct rcu_head rcu; +++#if IS_ENABLED(CONFIG_ENFS) +++ atomic_long_t queuelen; +++ void *multipath_context; +++#endif ++ }; ++ ++ #if defined(CONFIG_SUNRPC_BACKCHANNEL) ++diff --git a/include/linux/sunrpc/xprtmultipath.h b/include/linux/sunrpc/xprtmultipath.h ++index af1257c030d2..d54e4dbbbf34 100644 ++--- a/include/linux/sunrpc/xprtmultipath.h +++++ b/include/linux/sunrpc/xprtmultipath.h ++@@ -22,6 +22,10 @@ struct rpc_xprt_switch { ++ const struct rpc_xprt_iter_ops *xps_iter_ops; ++ ++ struct rcu_head xps_rcu; +++#if IS_ENABLED(CONFIG_ENFS) +++ unsigned int xps_nactive; +++ atomic_long_t xps_queuelen; +++#endif ++ }; ++ ++ struct rpc_xprt_iter { ++@@ -69,4 +73,8 @@ extern struct rpc_xprt *xprt_iter_get_next(struct rpc_xprt_iter *xpi); ++ ++ extern bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps, ++ const struct sockaddr *sap); +++#if IS_ENABLED(CONFIG_ENFS) +++extern void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, +++ struct rpc_xprt *xprt); +++#endif ++ #endif ++diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c ++index 0fc540b0d183..d7ffee637148 100644 ++--- a/net/sunrpc/clnt.c +++++ b/net/sunrpc/clnt.c ++@@ -37,6 +37,7 @@ ++ #include <linux/sunrpc/rpc_pipe_fs.h> ++ #include <linux/sunrpc/metrics.h> ++ #include <linux/sunrpc/bc_xprt.h> +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ #include <trace/events/sunrpc.h> ++ ++ #include "sunrpc.h" ++@@ -490,6 +491,8 @@ static struct rpc_clnt *rpc_create_xprt(struct rpc_create_args *args, ++ } ++ } ++ +++ rpc_multipath_ops_create_clnt(args, clnt); +++ ++ clnt->cl_softrtry = 1; ++ if 
(args->flags & RPC_CLNT_CREATE_HARDRTRY) ++ clnt->cl_softrtry = 0; ++@@ -869,6 +872,8 @@ void rpc_shutdown_client(struct rpc_clnt *clnt) ++ list_empty(&clnt->cl_tasks), 1*HZ); ++ } ++ +++ rpc_multipath_ops_releas_clnt(clnt); +++ ++ rpc_release_client(clnt); ++ } ++ EXPORT_SYMBOL_GPL(rpc_shutdown_client); ++@@ -981,7 +986,13 @@ void rpc_task_release_transport(struct rpc_task *task) ++ ++ if (xprt) { ++ task->tk_xprt = NULL; ++- xprt_put(xprt); +++#if IS_ENABLED(CONFIG_ENFS) +++ if (task->tk_client) { +++ rpc_task_release_xprt(task->tk_client, xprt); +++ return; +++ } +++#endif +++ xprt_put(xprt); ++ } ++ } ++ EXPORT_SYMBOL_GPL(rpc_task_release_transport); ++@@ -990,6 +1001,10 @@ void rpc_task_release_client(struct rpc_task *task) ++ { ++ struct rpc_clnt *clnt = task->tk_client; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ rpc_task_release_transport(task); +++#endif +++ ++ if (clnt != NULL) { ++ /* Remove from client task list */ ++ spin_lock(&clnt->cl_lock); ++@@ -999,14 +1014,29 @@ void rpc_task_release_client(struct rpc_task *task) ++ ++ rpc_release_client(clnt); ++ } +++#if IS_ENABLED(CONFIG_ENFS) +++#else ++ rpc_task_release_transport(task); +++#endif ++ } ++ +++#if IS_ENABLED(CONFIG_ENFS) +++static struct rpc_xprt * +++rpc_task_get_next_xprt(struct rpc_clnt *clnt) +++{ +++ return rpc_task_get_xprt(clnt, xprt_iter_get_next(&clnt->cl_xpi)); +++} +++#endif +++ ++ static ++ void rpc_task_set_transport(struct rpc_task *task, struct rpc_clnt *clnt) ++ { ++ if (!task->tk_xprt) +++#if IS_ENABLED(CONFIG_ENFS) +++ task->tk_xprt = rpc_task_get_next_xprt(clnt); +++#else ++ task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi); +++#endif ++ } ++ ++ static ++@@ -1597,6 +1627,14 @@ call_reserveresult(struct rpc_task *task) ++ return; ++ case -EIO: /* probably a shutdown */ ++ break; +++#if IS_ENABLED(CONFIG_ENFS) +++ case -ETIMEDOUT: /* woken up; restart */ +++ if (rpc_multipath_ops_task_need_call_start_again(task)) { +++ rpc_task_release_transport(task); +++ task->tk_action = 
call_start; +++ return; +++ } +++#endif ++ default: ++ printk(KERN_ERR "%s: unrecognized error %d, exiting\n", ++ __func__, status); ++@@ -1962,6 +2000,10 @@ call_transmit(struct rpc_task *task) ++ return; ++ if (!xprt_prepare_transmit(task)) ++ return; +++ +++ if (rpc_multipath_ops_prepare_transmit(task)) +++ return; +++ ++ task->tk_action = call_transmit_status; ++ /* Encode here so that rpcsec_gss can use correct sequence number. */ ++ if (rpc_task_need_encode(task)) { ++@@ -2277,6 +2319,9 @@ call_timeout(struct rpc_task *task) ++ ++ retry: ++ task->tk_action = call_bind; +++#if IS_ENABLED(CONFIG_ENFS) +++ rpc_multipath_ops_failover_handle(task); +++#endif ++ task->tk_status = 0; ++ } ++ ++@@ -2961,3 +3006,30 @@ rpc_clnt_swap_deactivate(struct rpc_clnt *clnt) ++ } ++ EXPORT_SYMBOL_GPL(rpc_clnt_swap_deactivate); ++ #endif /* CONFIG_SUNRPC_SWAP */ +++ +++#if IS_ENABLED(CONFIG_ENFS) +++/* rpc_clnt_test_xprt - Test and add a new transport to a rpc_clnt +++ * @clnt: pointer to struct rpc_clnt +++ * @xprt: pointer struct rpc_xprt +++ * @ops: async operation +++ */ +++int +++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, +++ const struct rpc_call_ops *ops, void *data, int flags) +++{ +++ struct rpc_cred *cred; +++ struct rpc_task *task; +++ +++ cred = authnull_ops.lookup_cred(NULL, NULL, 0); +++ task = rpc_call_null_helper(clnt, xprt, cred, +++ RPC_TASK_SOFT | RPC_TASK_SOFTCONN | flags, +++ ops, data); +++ put_rpccred(cred); +++ if (IS_ERR(task)) +++ return PTR_ERR(task); +++ +++ rpc_put_task(task); +++ return 1; +++} +++EXPORT_SYMBOL_GPL(rpc_clnt_test_xprt); +++#endif ++diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c ++index a873c92a4898..2254fea0e863 100644 ++--- a/net/sunrpc/sched.c +++++ b/net/sunrpc/sched.c ++@@ -20,7 +20,7 @@ ++ #include <linux/mutex.h> ++ #include <linux/freezer.h> ++ ++-#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ ++ #include "sunrpc.h" ++ ++@@ -962,7 +962,12 @@ static void 
rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta ++ /* Initialize workqueue for async tasks */ ++ task->tk_workqueue = task_setup_data->workqueue; ++ +++#if IS_ENABLED(CONFIG_ENFS) +++ task->tk_xprt = rpc_task_get_xprt(task_setup_data->rpc_client, +++ xprt_get(task_setup_data->rpc_xprt)); +++#else ++ task->tk_xprt = xprt_get(task_setup_data->rpc_xprt); +++#endif ++ ++ if (task->tk_ops->rpc_call_prepare != NULL) ++ task->tk_action = rpc_prepare_task; ++diff --git a/net/sunrpc/sunrpc_enfs_adapter.c b/net/sunrpc/sunrpc_enfs_adapter.c ++new file mode 100644 ++index 000000000000..c1543545c6de ++--- /dev/null +++++ b/net/sunrpc/sunrpc_enfs_adapter.c ++@@ -0,0 +1,214 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* Client-side SUNRPC ENFS adapter header. +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> +++ +++struct rpc_multipath_ops __rcu *multipath_ops; +++ +++void rpc_init_task_retry_counters(struct rpc_task *task) +++{ +++ /* Initialize retry counters */ +++ task->tk_garb_retry = 2; +++ task->tk_cred_retry = 2; +++ task->tk_rebind_retry = 2; +++} +++EXPORT_SYMBOL_GPL(rpc_init_task_retry_counters); +++ +++struct rpc_xprt * +++rpc_task_get_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) +++{ +++ struct rpc_xprt_switch *xps; +++ +++ if (!xprt) +++ return NULL; +++ rcu_read_lock(); +++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); +++ atomic_long_inc(&xps->xps_queuelen); +++ rcu_read_unlock(); +++ atomic_long_inc(&xprt->queuelen); +++ +++ return xprt; +++} +++ +++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops) +++{ +++ struct rpc_multipath_ops *old; +++ +++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, NULL, ops); +++ if (!old || old == ops) +++ return 0; +++ pr_err("regist rpc_multipath ops %p fail. 
old %p\n", ops, old); +++ return -EPERM; +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_register); +++ +++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops) +++{ +++ struct rpc_multipath_ops *old; +++ +++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, ops, NULL); +++ if (!old || old == ops) +++ return 0; +++ pr_err("regist rpc_multipath ops %p fail. old %p\n", ops, old); +++ return -EPERM; +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_unregister); +++ +++struct rpc_multipath_ops *rpc_multipath_ops_get(void) +++{ +++ struct rpc_multipath_ops *ops; +++ +++ rcu_read_lock(); +++ ops = rcu_dereference(multipath_ops); +++ if (!ops) { +++ rcu_read_unlock(); +++ return NULL; +++ } +++ if (!try_module_get(ops->owner)) +++ ops = NULL; +++ rcu_read_unlock(); +++ return ops; +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_get); +++ +++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops) +++{ +++ if (ops) +++ module_put(ops->owner); +++} +++EXPORT_SYMBOL_GPL(rpc_multipath_ops_put); +++ +++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) +++{ +++ struct rpc_xprt_switch *xps; +++ +++ atomic_long_dec(&xprt->queuelen); +++ rcu_read_lock(); +++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); +++ atomic_long_dec(&xps->xps_queuelen); +++ rcu_read_unlock(); +++ +++ xprt_put(xprt); +++} +++ +++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, +++ struct rpc_clnt *clnt) +++{ +++ struct rpc_multipath_ops *mops; +++ +++ if (args->multipath_option) { +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->create_clnt) +++ mops->create_clnt(args, clnt); +++ rpc_multipath_ops_put(mops); +++ } +++} +++ +++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) +++{ +++ struct rpc_multipath_ops *mops; +++ +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->releas_clnt) +++ mops->releas_clnt(clnt); +++ +++ rpc_multipath_ops_put(mops); +++} +++ +++bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) +++{ +++ struct 
rpc_multipath_ops *mops = NULL; +++ +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->create_xprt) { +++ mops->create_xprt(xprt); +++ if (!xprt->multipath_context) { +++ rpc_multipath_ops_put(mops); +++ return true; +++ } +++ } +++ rpc_multipath_ops_put(mops); +++ return false; +++} +++ +++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt) +++{ +++ struct rpc_multipath_ops *mops; +++ +++ if (xprt->multipath_context) { +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->destroy_xprt) +++ mops->destroy_xprt(xprt); +++ rpc_multipath_ops_put(mops); +++ } +++} +++ +++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task) +++{ +++ struct rpc_multipath_ops *mops; +++ +++ mops = rpc_multipath_ops_get(); +++ if (task->tk_client && mops && mops->xprt_iostat) +++ mops->xprt_iostat(task); +++ rpc_multipath_ops_put(mops); +++} +++ +++void rpc_multipath_ops_failover_handle(struct rpc_task *task) +++{ +++ struct rpc_multipath_ops *mpath_ops = NULL; +++ +++ mpath_ops = rpc_multipath_ops_get(); +++ if (mpath_ops && mpath_ops->failover_handle) +++ mpath_ops->failover_handle(task); +++ rpc_multipath_ops_put(mpath_ops); +++} +++ +++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task) +++{ +++ struct rpc_multipath_ops *mpath_ops = NULL; +++ bool ret = false; +++ +++ mpath_ops = rpc_multipath_ops_get(); +++ if (mpath_ops && mpath_ops->task_need_call_start_again) +++ ret = mpath_ops->task_need_call_start_again(task); +++ rpc_multipath_ops_put(mpath_ops); +++ return ret; +++} +++ +++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, +++ void *condition) +++{ +++ struct rpc_multipath_ops *mops = NULL; +++ +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->adjust_task_timeout) +++ mops->adjust_task_timeout(task, NULL); +++ rpc_multipath_ops_put(mops); +++} +++ +++void rpc_multipath_ops_init_task_req(struct rpc_task *task, +++ struct rpc_rqst *req) +++{ +++ struct rpc_multipath_ops *mops = NULL; +++ +++ mops = 
rpc_multipath_ops_get(); +++ if (mops && mops->init_task_req) +++ mops->init_task_req(task, req); +++ rpc_multipath_ops_put(mops); +++} +++ +++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task) +++{ +++ struct rpc_multipath_ops *mops = NULL; +++ +++ mops = rpc_multipath_ops_get(); +++ if (mops && mops->prepare_transmit) { +++ if (!(mops->prepare_transmit(task))) { +++ rpc_multipath_ops_put(mops); +++ return true; +++ } +++ } +++ rpc_multipath_ops_put(mops); +++ return false; +++} ++diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c ++index c912bf20faa2..c2b63b3d5217 100644 ++--- a/net/sunrpc/xprt.c +++++ b/net/sunrpc/xprt.c ++@@ -48,6 +48,7 @@ ++ #include <linux/sunrpc/clnt.h> ++ #include <linux/sunrpc/metrics.h> ++ #include <linux/sunrpc/bc_xprt.h> +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ #include <linux/rcupdate.h> ++ ++ #include <trace/events/sunrpc.h> ++@@ -259,6 +260,9 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) ++ dprintk("RPC: %5u failed to lock transport %p\n", ++ task->tk_pid, xprt); ++ task->tk_timeout = 0; +++ +++ rpc_multipath_ops_adjust_task_timeout(task, NULL); +++ ++ task->tk_status = -EAGAIN; ++ if (req == NULL) ++ priority = RPC_PRIORITY_LOW; ++@@ -560,6 +564,9 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action) ++ struct rpc_xprt *xprt = req->rq_xprt; ++ ++ task->tk_timeout = RPC_IS_SOFT(task) ? 
req->rq_timeout : 0; +++ +++ rpc_multipath_ops_adjust_task_timeout(task, NULL); +++ ++ rpc_sleep_on(&xprt->pending, task, action); ++ } ++ EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); ++@@ -1347,6 +1354,9 @@ xprt_request_init(struct rpc_task *task) ++ req->rq_rcv_buf.buflen = 0; ++ req->rq_release_snd_buf = NULL; ++ xprt_reset_majortimeo(req); +++ +++ rpc_multipath_ops_init_task_req(task, req); +++ ++ dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid, ++ req, ntohl(req->rq_xid)); ++ } ++@@ -1427,6 +1437,9 @@ void xprt_release(struct rpc_task *task) ++ task->tk_ops->rpc_count_stats(task, task->tk_calldata); ++ else if (task->tk_client) ++ rpc_count_iostats(task, task->tk_client->cl_metrics); +++ +++ rpc_multipath_ops_xprt_iostat(task); +++ ++ spin_lock(&xprt->recv_lock); ++ if (!list_empty(&req->rq_list)) { ++ list_del_init(&req->rq_list); ++@@ -1455,6 +1468,7 @@ void xprt_release(struct rpc_task *task) ++ else ++ xprt_free_bc_request(req); ++ } +++EXPORT_SYMBOL_GPL(xprt_release); ++ ++ static void xprt_init(struct rpc_xprt *xprt, struct net *net) ++ { ++@@ -1528,6 +1542,10 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args) ++ return ERR_PTR(-ENOMEM); ++ } ++ +++if (rpc_multipath_ops_create_xprt(xprt)) { +++ xprt_destroy(xprt); +++ return ERR_PTR(-ENOMEM); +++} ++ rpc_xprt_debugfs_register(xprt); ++ ++ dprintk("RPC: created transport %p with %u slots\n", xprt, ++@@ -1547,6 +1565,9 @@ static void xprt_destroy_cb(struct work_struct *work) ++ rpc_destroy_wait_queue(&xprt->sending); ++ rpc_destroy_wait_queue(&xprt->backlog); ++ kfree(xprt->servername); +++ +++ rpc_multipath_ops_destroy_xprt(xprt); +++ ++ /* ++ * Tear down transport state and free the rpc_xprt ++ */ ++diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c ++index 6ebaa58b4eff..6202a0be1327 100644 ++--- a/net/sunrpc/xprtmultipath.c +++++ b/net/sunrpc/xprtmultipath.c ++@@ -18,6 +18,7 @@ ++ #include <linux/sunrpc/xprt.h> ++ #include <linux/sunrpc/addr.h> ++ 
#include <linux/sunrpc/xprtmultipath.h> +++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ ++ typedef struct rpc_xprt *(*xprt_switch_find_xprt_t)(struct list_head *head, ++ const struct rpc_xprt *cur); ++@@ -26,8 +27,8 @@ static const struct rpc_xprt_iter_ops rpc_xprt_iter_singular; ++ static const struct rpc_xprt_iter_ops rpc_xprt_iter_roundrobin; ++ static const struct rpc_xprt_iter_ops rpc_xprt_iter_listall; ++ ++-static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, ++- struct rpc_xprt *xprt) +++void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, +++ struct rpc_xprt *xprt) ++ { ++ if (unlikely(xprt_get(xprt) == NULL)) ++ return; ++@@ -36,7 +37,9 @@ static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, ++ if (xps->xps_nxprts == 0) ++ xps->xps_net = xprt->xprt_net; ++ xps->xps_nxprts++; +++ rpc_xps_nactive_add_one(xps); ++ } +++EXPORT_SYMBOL(xprt_switch_add_xprt_locked); ++ ++ /** ++ * rpc_xprt_switch_add_xprt - Add a new rpc_xprt to an rpc_xprt_switch ++@@ -63,6 +66,7 @@ static void xprt_switch_remove_xprt_locked(struct rpc_xprt_switch *xps, ++ if (unlikely(xprt == NULL)) ++ return; ++ xps->xps_nxprts--; +++ rpc_xps_nactive_sub_one(xps); ++ if (xps->xps_nxprts == 0) ++ xps->xps_net = NULL; ++ smp_wmb(); ++@@ -84,7 +88,7 @@ void rpc_xprt_switch_remove_xprt(struct rpc_xprt_switch *xps, ++ spin_unlock(&xps->xps_lock); ++ xprt_put(xprt); ++ } ++- +++EXPORT_SYMBOL(rpc_xprt_switch_remove_xprt); ++ /** ++ * xprt_switch_alloc - Allocate a new struct rpc_xprt_switch ++ * @xprt: pointer to struct rpc_xprt ++@@ -102,7 +106,13 @@ struct rpc_xprt_switch *xprt_switch_alloc(struct rpc_xprt *xprt, ++ if (xps != NULL) { ++ spin_lock_init(&xps->xps_lock); ++ kref_init(&xps->xps_kref); +++#if IS_ENABLED(CONFIG_ENFS) +++ xps->xps_nxprts = 0; +++ xps->xps_nactive = 0; +++ atomic_long_set(&xps->xps_queuelen, 0); +++#else ++ xps->xps_nxprts = 0; +++#endif ++ INIT_LIST_HEAD(&xps->xps_xprt_list); ++ xps->xps_iter_ops = &rpc_xprt_iter_singular; ++ 
xprt_switch_add_xprt_locked(xps, xprt); ++@@ -148,6 +158,7 @@ struct rpc_xprt_switch *xprt_switch_get(struct rpc_xprt_switch *xps) ++ return xps; ++ return NULL; ++ } +++EXPORT_SYMBOL(xprt_switch_get); ++ ++ /** ++ * xprt_switch_put - Release a reference to a rpc_xprt_switch ++@@ -160,6 +171,7 @@ void xprt_switch_put(struct rpc_xprt_switch *xps) ++ if (xps != NULL) ++ kref_put(&xps->xps_kref, xprt_switch_free); ++ } +++EXPORT_SYMBOL(xprt_switch_put); ++ ++ /** ++ * rpc_xprt_switch_set_roundrobin - Set a round-robin policy on rpc_xprt_switch +diff --git a/0003-add_enfs_module_for_nfs_mount_option.patch b/0003-add_enfs_module_for_nfs_mount_option.patch +new file mode 100644 +index 0000000..70753b5 +--- /dev/null ++++ b/0003-add_enfs_module_for_nfs_mount_option.patch +@@ -0,0 +1,1209 @@ ++diff --git a/fs/nfs/enfs/Makefile b/fs/nfs/enfs/Makefile ++new file mode 100644 ++index 000000000000..6e83eb23c668 ++--- /dev/null +++++ b/fs/nfs/enfs/Makefile ++@@ -0,0 +1,18 @@ +++obj-m += enfs.o +++ +++#EXTRA_CFLAGS += -I$(PWD)/.. +++ +++enfs-y := enfs_init.o +++enfs-y += enfs_config.o +++enfs-y += mgmt_init.o +++enfs-y += enfs_multipath_client.o +++enfs-y += enfs_multipath_parse.o +++enfs-y += failover_path.o +++enfs-y += failover_time.o +++enfs-y += enfs_roundrobin.o +++enfs-y += enfs_multipath.o +++enfs-y += enfs_path.o +++enfs-y += enfs_proc.o +++enfs-y += enfs_remount.o +++enfs-y += pm_ping.o +++enfs-y += pm_state.o ++diff --git a/fs/nfs/enfs/enfs.h b/fs/nfs/enfs/enfs.h ++new file mode 100644 ++index 000000000000..be3d95220088 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs.h ++@@ -0,0 +1,62 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS multipath adapt header. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++ +++#ifndef _ENFS_H_ +++#define _ENFS_H_ +++#include <linux/atomic.h> +++#include <linux/nfs.h> +++#include <linux/nfs4.h> +++#include <linux/nfs3.h> +++#include <linux/nfs_fs.h> +++#include <linux/nfs_fs_sb.h> +++#include "../enfs_adapter.h" +++ +++#define IP_ADDRESS_LEN_MAX 64 +++#define MAX_IP_PAIR_PER_MOUNT 8 +++#define MAX_IP_INDEX (MAX_IP_PAIR_PER_MOUNT) +++#define MAX_SUPPORTED_LOCAL_IP_COUNT 8 +++#define MAX_SUPPORTED_REMOTE_IP_COUNT 32 +++#define MAX_DNS_NAME_LEN 512 +++#define MAX_DNS_SUPPORTED 2 +++#define EXTEND_CMD_MAX_BUF_LEN 65356 +++ +++ +++struct nfs_ip_list { +++ int count; +++ struct sockaddr_storage address[MAX_SUPPORTED_REMOTE_IP_COUNT]; +++ size_t addrlen[MAX_SUPPORTED_REMOTE_IP_COUNT]; +++}; +++ +++struct NFS_ROUTE_DNS_S { +++ char dnsname[MAX_DNS_NAME_LEN]; // valid only if dnsExist is true +++}; +++ +++struct NFS_ROUTE_DNS_INFO_S { +++ int dnsNameCount; // Count of DNS name in the list +++ // valid only if dnsExist is true +++ struct NFS_ROUTE_DNS_S routeRemoteDnsList[MAX_DNS_SUPPORTED]; +++}; +++ +++struct rpc_iostats; +++struct enfs_xprt_context { +++ struct sockaddr_storage srcaddr; +++ struct rpc_iostats *stats; +++ bool main; +++ atomic_t path_state; +++ atomic_t path_check_state; +++}; +++ +++static inline bool enfs_is_main_xprt(struct rpc_xprt *xprt) +++{ +++ struct enfs_xprt_context *ctx = xprt->multipath_context; +++ +++ if (!ctx) +++ return false; +++ return ctx->main; +++} +++ +++#endif ++diff --git a/fs/nfs/enfs/enfs_init.c b/fs/nfs/enfs/enfs_init.c ++new file mode 100644 ++index 000000000000..4b55608191a7 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_init.c ++@@ -0,0 +1,98 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++#include <linux/module.h> +++#include <linux/sunrpc/sched.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/nfs.h> +++#include <linux/nfs4.h> +++#include <linux/nfs3.h> +++#include <linux/nfs_fs.h> +++#include <linux/nfs_fs_sb.h> +++#include "enfs.h" +++#include "enfs_multipath_parse.h" +++#include "enfs_multipath_client.h" +++#include "enfs_remount.h" +++#include "init.h" +++#include "enfs_log.h" +++#include "enfs_multipath.h" +++#include "mgmt_init.h" +++ +++struct enfs_adapter_ops enfs_adapter = { +++ .name = "enfs", +++ .owner = THIS_MODULE, +++ .parse_mount_options = nfs_multipath_parse_options, +++ .free_mount_options = nfs_multipath_free_options, +++ .client_info_init = nfs_multipath_client_info_init, +++ .client_info_free = nfs_multipath_client_info_free, +++ .client_info_match = nfs_multipath_client_info_match, +++ .client_info_show = nfs_multipath_client_info_show, +++ .remount_ip_list = enfs_remount_iplist, +++}; +++ +++int32_t enfs_init(void) +++{ +++ int err; +++ +++ err = enfs_multipath_init(); +++ if (err) { +++ enfs_log_error("init multipath failed.\n"); +++ goto out; +++ } +++ +++ err = mgmt_init(); +++ if (err != 0) { +++ enfs_log_error("init mgmt failed.\n"); +++ goto out_tp_exit; +++ } +++ +++ return 0; +++ +++out_tp_exit: +++ enfs_multipath_exit(); +++out: +++ return err; +++} +++ +++void enfs_fini(void) +++{ +++ mgmt_fini(); +++ +++ enfs_multipath_exit(); +++} +++ +++static int __init init_enfs(void) +++{ +++ int ret; +++ +++ ret = enfs_adapter_register(&enfs_adapter); +++ if (ret) { +++ pr_err("regist enfs_adapter fail. ret %d\n", ret); +++ return -1; +++ } +++ +++ ret = enfs_init(); +++ if (ret) { +++ enfs_adapter_unregister(&enfs_adapter); +++ return -1; +++ } +++ +++ return 0; +++} +++ +++static void __exit exit_enfs(void) +++{ +++ enfs_fini(); +++ enfs_adapter_unregister(&enfs_adapter); +++} +++ +++MODULE_LICENSE("GPL"); +++MODULE_AUTHOR("Huawei Tech. 
Co., Ltd."); +++MODULE_DESCRIPTION("Nfs client router"); +++MODULE_VERSION("1.0"); +++ +++module_init(init_enfs); +++module_exit(exit_enfs); ++diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c ++new file mode 100644 ++index 000000000000..63c02898a42c ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_client.c ++@@ -0,0 +1,340 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#include <linux/types.h> +++#include <linux/nfs.h> +++#include <linux/nfs4.h> +++#include <linux/nfs_fs.h> +++#include <linux/nfs_fs_sb.h> +++#include <linux/proc_fs.h> +++#include <linux/seq_file.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/addr.h> +++#include "enfs_multipath_client.h" +++#include "enfs_multipath_parse.h" +++ +++int +++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info, +++ const struct nfs_client_initdata *client_init_data) +++{ +++ struct multipath_mount_options *mount_options = +++ (struct multipath_mount_options *)client_init_data->enfs_option; +++ +++ if (mount_options->local_ip_list) { +++ client_info->local_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ +++ if (!client_info->local_ip_list) +++ return -ENOMEM; +++ +++ memcpy(client_info->local_ip_list, mount_options->local_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (mount_options->remote_ip_list) { +++ +++ client_info->remote_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ +++ if (!client_info->remote_ip_list) { +++ kfree(client_info->local_ip_list); +++ client_info->local_ip_list = NULL; +++ return -ENOMEM; +++ } +++ memcpy(client_info->remote_ip_list, +++ mount_options->remote_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (mount_options->pRemoteDnsInfo) { +++ client_info->pRemoteDnsInfo = +++ kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL); 
+++ +++ if (!client_info->pRemoteDnsInfo) { +++ kfree(client_info->local_ip_list); +++ client_info->local_ip_list = NULL; +++ kfree(client_info->remote_ip_list); +++ client_info->remote_ip_list = NULL; +++ return -ENOMEM; +++ } +++ memcpy(client_info->pRemoteDnsInfo, +++ mount_options->pRemoteDnsInfo, +++ sizeof(struct NFS_ROUTE_DNS_INFO_S)); +++ } +++ return 0; +++} +++ +++void nfs_multipath_client_info_free_work(struct work_struct *work) +++{ +++ +++ struct multipath_client_info *clp_info; +++ +++ if (work == NULL) +++ return; +++ +++ clp_info = container_of(work, struct multipath_client_info, work); +++ +++ if (clp_info->local_ip_list != NULL) { +++ kfree(clp_info->local_ip_list); +++ clp_info->local_ip_list = NULL; +++ } +++ if (clp_info->remote_ip_list != NULL) { +++ kfree(clp_info->remote_ip_list); +++ clp_info->remote_ip_list = NULL; +++ } +++ kfree(clp_info); +++} +++ +++void nfs_multipath_client_info_free(void *data) +++{ +++ struct multipath_client_info *clp_info = +++ (struct multipath_client_info *)data; +++ +++ if (clp_info == NULL) +++ return; +++ pr_info("free client info %p.\n", clp_info); +++ INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work); +++ schedule_work(&clp_info->work); +++} +++ +++int nfs_multipath_client_info_init(void **data, +++ const struct nfs_client_initdata *cl_init) +++{ +++ int rc; +++ struct multipath_client_info *info; +++ struct multipath_client_info **enfs_info; +++ /* no multi path info, no need do multipath init */ +++ if (cl_init->enfs_option == NULL) +++ return 0; +++ enfs_info = (struct multipath_client_info **)data; +++ if (enfs_info == NULL) +++ return -EINVAL; +++ +++ if (*enfs_info == NULL) +++ *enfs_info = kzalloc(sizeof(struct multipath_client_info), +++ GFP_KERNEL); +++ +++ if (*enfs_info == NULL) +++ return -ENOMEM; +++ +++ info = (struct multipath_client_info *)*enfs_info; +++ pr_info("init client info %p.\n", info); +++ rc = nfs_multipath_client_mount_info_init(info, cl_init); +++ if (rc) { +++ 
nfs_multipath_client_info_free((void *)info); +++ return rc; +++ } +++ return rc; +++} +++ +++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src, +++ const struct nfs_ip_list *ip_list_dst) +++{ +++ int i; +++ int j; +++ bool is_find; +++ /* if both are equal or NULL, then return true. */ +++ if (ip_list_src == ip_list_dst) +++ return true; +++ +++ if ((ip_list_src == NULL || ip_list_dst == NULL)) +++ return false; +++ +++ if (ip_list_src->count != ip_list_dst->count) +++ return false; +++ +++ for (i = 0; i < ip_list_src->count; i++) { +++ is_find = false; +++ for (j = 0; j < ip_list_src->count; j++) { +++ if (rpc_cmp_addr_port( +++ (const struct sockaddr *) +++ &ip_list_src->address[i], +++ (const struct sockaddr *) +++ &ip_list_dst->address[j]) +++ ) { +++ is_find = true; +++ break; +++ } +++ } +++ if (is_find == false) +++ return false; +++ } +++ return true; +++} +++ +++int +++nfs_multipath_dns_list_info_match( +++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc, +++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst) +++{ +++ int i; +++ +++ /* if both are equal or NULL, then return true. 
*/ +++ if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst) +++ return true; +++ +++ if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL)) +++ return false; +++ +++ if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount) +++ return false; +++ +++ for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) { +++ if (!strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname, +++ pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname)) +++ return false; +++ } +++ return true; +++} +++ +++int nfs_multipath_client_info_match(void *src, void *dst) +++{ +++ int ret = true; +++ +++ struct multipath_client_info *src_info; +++ struct multipath_mount_options *dst_info; +++ +++ src_info = (struct multipath_client_info *)src; +++ dst_info = (struct multipath_mount_options *)dst; +++ pr_info("try match client .\n"); +++ ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list, +++ dst_info->local_ip_list); +++ if (ret == false) { +++ pr_err("local_ip not match.\n"); +++ return ret; +++ } +++ +++ ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list, +++ dst_info->remote_ip_list); +++ if (ret == false) { +++ pr_err("remote_ip not match.\n"); +++ return ret; +++ } +++ +++ ret = nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo, +++ dst_info->pRemoteDnsInfo); +++ if (ret == false) { +++ pr_err("dns not match.\n"); +++ return ret; +++ } +++ pr_info("try match client ret %d.\n", ret); +++ return ret; +++} +++ +++void nfs_multipath_print_ip_info(struct seq_file *mount_option, +++ struct nfs_ip_list *ip_list, +++ const char *type) +++{ +++ char buf[IP_ADDRESS_LEN_MAX + 1]; +++ int len = 0; +++ int i = 0; +++ +++ seq_printf(mount_option, ",%s=", type); +++ for (i = 0; i < ip_list->count; i++) { +++ len = rpc_ntop((struct sockaddr *)&ip_list->address[i], +++ buf, IP_ADDRESS_LEN_MAX); +++ if (len > 0 && len < IP_ADDRESS_LEN_MAX) +++ buf[len] = '\0'; +++ +++ if (i == 0) +++ seq_printf(mount_option, "%s", buf); +++ else +++ seq_printf(mount_option, "~%s", buf); 
+++ dfprintk(MOUNT, +++ "NFS: show nfs mount option type:%s %s [%s]\n", +++ type, buf, __func__); +++ } +++} +++ +++void nfs_multipath_print_dns_info(struct seq_file *mount_option, +++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo, +++ const char *type) +++{ +++ int i = 0; +++ +++ seq_printf(mount_option, ",%s=", type); +++ for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) { +++ if (i == 0) +++ seq_printf(mount_option, +++ "[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ else if (i == pRemoteDnsInfo->dnsNameCount - 1) +++ seq_printf(mount_option, ",%s]", +++ pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ else +++ seq_printf(mount_option, +++ ",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ } +++} +++ +++ +++static void multipath_print_sockaddr(struct seq_file *seq, +++ struct sockaddr *addr) +++{ +++ switch (addr->sa_family) { +++ case AF_INET: { +++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; +++ +++ seq_printf(seq, "%pI4", &sin->sin_addr); +++ return; +++ } +++ case AF_INET6: { +++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; +++ +++ seq_printf(seq, "%pI6", &sin6->sin6_addr); +++ return; +++ } +++ default: +++ break; +++ } +++ pr_err("unsupport family:%d\n", addr->sa_family); +++} +++ +++static void multipath_print_enfs_info(struct seq_file *seq, +++ struct nfs_server *server) +++{ +++ struct sockaddr_storage peeraddr; +++ struct rpc_clnt *next = server->client; +++ +++ rpc_peeraddr(server->client, +++ (struct sockaddr *)&peeraddr, sizeof(peeraddr)); +++ seq_puts(seq, ",enfs_info="); +++ multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr); +++ +++ while (next->cl_parent) { +++ if (next == next->cl_parent) +++ break; +++ next = next->cl_parent; +++ } +++ seq_printf(seq, "_%u", next->cl_clid); +++} +++ +++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data) +++{ +++ struct nfs_server *server = data; +++ struct multipath_client_info *client_info = +++ 
server->nfs_client->cl_multipath_data; +++ +++ dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__); +++ if ((client_info->remote_ip_list) && +++ (client_info->remote_ip_list->count > 0)) +++ nfs_multipath_print_ip_info(mount_option, +++ client_info->remote_ip_list, +++ "remoteaddrs"); +++ +++ if ((client_info->local_ip_list) && +++ (client_info->local_ip_list->count > 0)) +++ nfs_multipath_print_ip_info(mount_option, +++ client_info->local_ip_list, +++ "localaddrs"); +++ +++ if ((client_info->pRemoteDnsInfo) && +++ (client_info->pRemoteDnsInfo->dnsNameCount > 0)) +++ nfs_multipath_print_dns_info(mount_option, +++ client_info->pRemoteDnsInfo, +++ "remotednsname"); +++ +++ multipath_print_enfs_info(mount_option, server); +++} ++diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h ++new file mode 100644 ++index 000000000000..208f7260690d ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_client.h ++@@ -0,0 +1,26 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++#ifndef _ENFS_MULTIPATH_CLIENT_H_ +++#define _ENFS_MULTIPATH_CLIENT_H_ +++ +++#include "enfs.h" +++ +++struct multipath_client_info { +++ struct work_struct work; +++ struct nfs_ip_list *remote_ip_list; +++ struct nfs_ip_list *local_ip_list; +++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; +++ s64 client_id; +++}; +++ +++int nfs_multipath_client_info_init(void **data, +++ const struct nfs_client_initdata *cl_init); +++void nfs_multipath_client_info_free(void *data); +++int nfs_multipath_client_info_match(void *src, void *dst); +++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data); +++ +++#endif ++diff --git a/fs/nfs/enfs/enfs_multipath_parse.c b/fs/nfs/enfs/enfs_multipath_parse.c ++new file mode 100644 ++index 000000000000..9c4c6c1880b6 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_parse.c ++@@ -0,0 +1,601 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++#include <linux/types.h> +++#include <linux/nfs.h> +++#include <linux/nfs4.h> +++#include <linux/nfs_fs.h> +++#include <linux/nfs_fs_sb.h> +++#include <linux/parser.h> +++#include <linux/kern_levels.h> +++#include <linux/sunrpc/addr.h> +++#include "enfs_multipath_parse.h" +++#include "enfs_log.h" +++ +++#define NFSDBG_FACILITY NFSDBG_CLIENT +++ +++void nfs_multipath_parse_ip_ipv6_add(struct sockaddr_in6 *sin6, int add_num) +++{ +++ int i; +++ +++ pr_info("NFS: before %08x%08x%08x%08x add_num: %d[%s]\n", +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]), +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]), +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]), +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]), +++ add_num, __func__); +++ for (i = 0; i < add_num; i++) { +++ sin6->sin6_addr.in6_u.u6_addr32[3] = +++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]) + 1); +++ +++ if (sin6->sin6_addr.in6_u.u6_addr32[3] != 0) +++ continue; +++ +++ sin6->sin6_addr.in6_u.u6_addr32[2] = +++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]) + 1); +++ +++ if (sin6->sin6_addr.in6_u.u6_addr32[2] != 0) +++ continue; +++ +++ sin6->sin6_addr.in6_u.u6_addr32[1] = +++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]) + 1); +++ +++ if (sin6->sin6_addr.in6_u.u6_addr32[1] != 0) +++ continue; +++ +++ sin6->sin6_addr.in6_u.u6_addr32[0] = +++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]) + 1); +++ +++ if (sin6->sin6_addr.in6_u.u6_addr32[0] != 0) +++ continue; +++ } +++ +++ return; +++ +++} +++ +++static int nfs_multipath_parse_ip_range(struct net *net_ns, const char *cursor, +++ struct nfs_ip_list *ip_list, enum nfsmultipathoptions type) +++{ +++ struct sockaddr_storage addr; +++ struct sockaddr_storage tmp_addr; +++ int i; +++ size_t len; +++ int add_num = 1; +++ bool duplicate_flag = false; +++ bool is_complete = false; +++ struct sockaddr_in *sin4; +++ struct sockaddr_in6 *sin6; +++ +++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", +++ cursor, type, __func__); +++ len = rpc_pton(net_ns, 
cursor, strlen(cursor), +++ (struct sockaddr *)&addr, sizeof(addr)); +++ if (!len) +++ return -EINVAL; +++ +++ if (addr.ss_family != ip_list->address[ip_list->count - 1].ss_family) { +++ pr_info("NFS: %s parsing nfs mount option type: %d fail.\n", +++ __func__, type); +++ return -EINVAL; +++ } +++ +++ if (rpc_cmp_addr((const struct sockaddr *) +++ &ip_list->address[ip_list->count - 1], +++ (const struct sockaddr *)&addr)) { +++ +++ pr_info("NFS: range end is the same ip as range start.\n"); +++ return 0; +++ +++ } +++ +++ while (true) { +++ +++ tmp_addr = ip_list->address[ip_list->count - 1]; +++ +++ switch (addr.ss_family) { +++ case AF_INET: +++ sin4 = (struct sockaddr_in *)&tmp_addr; +++ +++ sin4->sin_addr.s_addr = +++ htonl(ntohl(sin4->sin_addr.s_addr) + add_num); +++ +++ pr_info("NFS: mount option ip %08x type: %d ipcount %d [%s]\n", +++ ntohl(sin4->sin_addr.s_addr), +++ type, ip_list->count, __func__); +++ break; +++ case AF_INET6: +++ sin6 = (struct sockaddr_in6 *)&tmp_addr; +++ nfs_multipath_parse_ip_ipv6_add(sin6, add_num); +++ pr_info("NFS: mount option ip %08x%08x%08x%08x type: %d ipcount %d [%s]\n", +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]), +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]), +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]), +++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]), +++ type, ip_list->count, __func__); +++ break; +++ // return -EOPNOTSUPP; +++ default: +++ return -EOPNOTSUPP; +++ } +++ +++ if (rpc_cmp_addr((const struct sockaddr *)&tmp_addr, +++ (const struct sockaddr *)&addr)) { +++ is_complete = true; +++ } +++ // skip duplicate ips: if the candidate is already listed, step past it +++ for (i = 0; i < ip_list->count; i++) { +++ duplicate_flag = false; +++ if (rpc_cmp_addr((const struct sockaddr *) +++ &ip_list->address[i], +++ (const struct sockaddr *)&tmp_addr)) { +++ add_num++; +++ duplicate_flag = true; +++ break; +++ } +++ } +++ +++ if (duplicate_flag == false) { +++ pr_info("this ip is not a duplicate.\n"); +++ add_num = 1; +++ // not a duplicate: fail if the list is already at its limit +++ if 
((type == LOCALADDR && +++ ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) || +++ (type == REMOTEADDR && +++ ip_list->count >= MAX_SUPPORTED_REMOTE_IP_COUNT)) { +++ +++ pr_info("[MULTIPATH:%s] iplist for type %d reached %d, more than supported limit %d\n", +++ __func__, type, ip_list->count, +++ type == LOCALADDR ? +++ MAX_SUPPORTED_LOCAL_IP_COUNT : +++ MAX_SUPPORTED_REMOTE_IP_COUNT); +++ ip_list->count = 0; +++ return -ENOSPC; +++ } +++ ip_list->address[ip_list->count] = tmp_addr; +++ +++ ip_list->addrlen[ip_list->count] = +++ ip_list->addrlen[ip_list->count - 1]; +++ +++ ip_list->count += 1; +++ } +++ if (is_complete == true) +++ break; +++ } +++ return 0; +++} +++ +++int nfs_multipath_parse_ip_list_inter(struct nfs_ip_list *ip_list, +++ struct net *net_ns, +++ char *cursor, enum nfsmultipathoptions type) +++{ +++ int i = 0; +++ struct sockaddr_storage addr; +++ struct sockaddr_storage swap; +++ int len; +++ +++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", +++ cursor, type, __func__); +++ +++ len = rpc_pton(net_ns, cursor, +++ strlen(cursor), +++ (struct sockaddr *)&addr, sizeof(addr)); +++ if (!len) +++ return -EINVAL; +++ +++ // check repeated ip +++ for (i = 0; i < ip_list->count; i++) { +++ if (rpc_cmp_addr((const struct sockaddr *) +++ &ip_list->address[i], +++ (const struct sockaddr *)&addr)) { +++ +++ pr_info("NFS: mount option '%s' type:%d index %d same as before index %d [%s]\n", +++ cursor, type, ip_list->count, i, __func__); +++ // prevent this ip is beginning +++ // if repeated take it to the end of list +++ swap = ip_list->address[i]; +++ +++ ip_list->address[i] = +++ ip_list->address[ip_list->count-1]; +++ +++ ip_list->address[ip_list->count-1] = swap; +++ return 0; +++ } +++ } +++ // if not repeated, check exceed limit +++ if ((type == LOCALADDR && +++ ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) || +++ (type == REMOTEADDR && +++ ip_list->count >= MAX_SUPPORTED_REMOTE_IP_COUNT)) { +++ +++ pr_info("[MULTIPATH:%s] iplist for type 
%d reached %d, more than supported limit %d\n", +++ __func__, type, ip_list->count, +++ type == LOCALADDR ? +++ MAX_SUPPORTED_LOCAL_IP_COUNT : +++ MAX_SUPPORTED_REMOTE_IP_COUNT); +++ +++ ip_list->count = 0; +++ return -ENOSPC; +++ } +++ ip_list->address[ip_list->count] = addr; +++ ip_list->addrlen[ip_list->count] = len; +++ ip_list->count++; +++ +++ return 0; +++} +++ +++char *nfs_multipath_parse_ip_list_get_cursor(char **buf_to_parse, bool *single) +++{ +++ char *cursor = NULL; +++ const char *single_sep = strchr(*buf_to_parse, '~'); +++ const char *range_sep = strchr(*buf_to_parse, '-'); +++ +++ *single = true; +++ if (range_sep) { +++ if (range_sep > single_sep) { // A-B or A~B-C +++ if (single_sep == NULL) { // A-B +++ cursor = strsep(buf_to_parse, "-"); +++ if (cursor) +++ *single = false; +++ } else// A~B-C +++ cursor = strsep(buf_to_parse, "~"); +++ } else { // A-B~C +++ cursor = strsep(buf_to_parse, "-"); +++ if (cursor) +++ *single = false; +++ } +++ } else { // A~B~C +++ cursor = strsep(buf_to_parse, "~"); +++ } +++ return cursor; +++} +++ +++bool nfs_multipath_parse_param_check(enum nfsmultipathoptions type, +++ struct multipath_mount_options *options) +++{ +++ if (type == REMOUNTREMOTEADDR && options->remote_ip_list->count != 0) { +++ memset(options->remote_ip_list, 0, sizeof(struct nfs_ip_list)); +++ return true; +++ } +++ if (type == REMOUNTLOCALADDR && options->local_ip_list->count != 0) { +++ memset(options->local_ip_list, 0, sizeof(struct nfs_ip_list)); +++ return true; +++ } +++ if ((type == REMOTEADDR || type == REMOTEDNSNAME) && +++ options->pRemoteDnsInfo->dnsNameCount != 0) { +++ +++ pr_info("[MULTIPATH:%s] parse for %d ,already have dns\n", +++ __func__, type); +++ return false; +++ } else if ((type == REMOTEADDR || type == REMOTEDNSNAME) && +++ options->remote_ip_list->count != 0) { +++ +++ pr_info("[MULTIPATH:%s] parse for %d ,already have iplist\n", +++ __func__, type); +++ return false; +++ } +++ return true; +++} +++ +++int 
nfs_multipath_parse_ip_list(char *buffer, struct net *net_ns, +++ struct multipath_mount_options *options, +++ enum nfsmultipathoptions type) +++{ +++ char *buf_to_parse = NULL; +++ bool prev_range = false; +++ int ret = 0; +++ char *cursor = NULL; +++ bool single = true; +++ struct nfs_ip_list *ip_list_tmp = NULL; +++ +++ if (!nfs_multipath_parse_param_check(type, options)) +++ return -ENOTSUPP; +++ +++ if (type == REMOUNTREMOTEADDR) +++ type = REMOTEADDR; +++ +++ if (type == REMOUNTLOCALADDR) +++ type = LOCALADDR; +++ +++ if (type == LOCALADDR) +++ ip_list_tmp = options->local_ip_list; +++ else +++ ip_list_tmp = options->remote_ip_list; +++ +++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", +++ buffer, type, __func__); +++ +++ buf_to_parse = buffer; +++ while (buf_to_parse != NULL) { +++ cursor = +++ nfs_multipath_parse_ip_list_get_cursor(&buf_to_parse, &single); +++ if (!cursor) +++ break; +++ +++ if (single == false && prev_range == true) { +++ pr_info("NFS: mount option type: %d fail. 
Multiple Range.[%s]\n", +++ type, __func__); +++ +++ ret = -EINVAL; +++ goto out; +++ } +++ +++ if (prev_range == false) { +++ ret = nfs_multipath_parse_ip_list_inter(ip_list_tmp, +++ net_ns, cursor, type); +++ if (ret) +++ goto out; +++ if (single == false) +++ prev_range = true; +++ } else { +++ ret = nfs_multipath_parse_ip_range(net_ns, cursor, +++ ip_list_tmp, type); +++ if (ret != 0) +++ goto out; +++ prev_range = false; +++ } +++ } +++ +++out: +++ if (ret) +++ memset(ip_list_tmp, 0, sizeof(struct nfs_ip_list)); +++ +++ return ret; +++} +++ +++int nfs_multipath_parse_dns_list(char *buffer, struct net *net_ns, +++ struct multipath_mount_options *options) +++{ +++ struct NFS_ROUTE_DNS_INFO_S *dns_name_list_tmp = NULL; +++ char *cursor = NULL; +++ char *bufToParse; +++ +++ if (!nfs_multipath_parse_param_check(REMOTEDNSNAME, options)) +++ return -ENOTSUPP; +++ +++ pr_info("[MULTIPATH:%s] buffer %s\n", __func__, buffer); +++ // freed in nfs_free_parsed_mount_data +++ dns_name_list_tmp = kmalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), +++ GFP_KERNEL); +++ if (!dns_name_list_tmp) +++ return -ENOMEM; +++ +++ dns_name_list_tmp->dnsNameCount = 0; +++ bufToParse = buffer; +++ while (bufToParse) { +++ if (dns_name_list_tmp->dnsNameCount >= MAX_DNS_SUPPORTED) { +++ pr_err("%s: dnsname for %s reached %d,more than supported limit %d\n", +++ __func__, cursor, +++ dns_name_list_tmp->dnsNameCount, +++ MAX_DNS_SUPPORTED); +++ dns_name_list_tmp->dnsNameCount = 0; +++ return -ENOSPC; +++ } +++ cursor = strsep(&bufToParse, "~"); +++ if (!cursor) +++ break; +++ +++ strcpy(dns_name_list_tmp->routeRemoteDnsList +++ [dns_name_list_tmp->dnsNameCount].dnsname, +++ cursor); +++ dns_name_list_tmp->dnsNameCount++; +++ } +++ if (dns_name_list_tmp->dnsNameCount == 0) +++ return -EINVAL; +++ options->pRemoteDnsInfo = dns_name_list_tmp; +++ return 0; +++} +++ +++int nfs_multipath_parse_options_check_ipv4_valid(struct sockaddr_in *addr) +++{ +++ if (addr->sin_addr.s_addr == 0 || 
addr->sin_addr.s_addr == 0xffffffff) +++ return -EINVAL; +++ return 0; +++} +++ +++int nfs_multipath_parse_options_check_ipv6_valid(struct sockaddr_in6 *addr) +++{ +++ if (addr->sin6_addr.in6_u.u6_addr32[0] == 0 && +++ addr->sin6_addr.in6_u.u6_addr32[1] == 0 && +++ addr->sin6_addr.in6_u.u6_addr32[2] == 0 && +++ addr->sin6_addr.in6_u.u6_addr32[3] == 0) +++ return -EINVAL; +++ +++ if (addr->sin6_addr.in6_u.u6_addr32[0] == 0xffffffff && +++ addr->sin6_addr.in6_u.u6_addr32[1] == 0xffffffff && +++ addr->sin6_addr.in6_u.u6_addr32[2] == 0xffffffff && +++ addr->sin6_addr.in6_u.u6_addr32[3] == 0xffffffff) +++ return -EINVAL; +++ return 0; +++} +++ +++int nfs_multipath_parse_options_check_ip_valid(struct sockaddr_storage *address) +++{ +++ int rc = 0; +++ +++ if (address->ss_family == AF_INET) +++ rc = nfs_multipath_parse_options_check_ipv4_valid( +++ (struct sockaddr_in *)address); +++ else if (address->ss_family == AF_INET6) +++ rc = nfs_multipath_parse_options_check_ipv6_valid( +++ (struct sockaddr_in6 *)address); +++ else +++ rc = -EINVAL; +++ +++ return rc; +++} +++ +++int nfs_multipath_parse_options_check_valid( +++ struct multipath_mount_options *options) +++{ +++ int rc; +++ int i; +++ +++ if (options == NULL) +++ return 0; +++ +++ for (i = 0; i < options->local_ip_list->count; i++) { +++ rc = nfs_multipath_parse_options_check_ip_valid( +++ &options->local_ip_list->address[i]); +++ if (rc != 0) +++ return rc; +++ } +++ +++ for (i = 0; i < options->remote_ip_list->count; i++) { +++ rc = nfs_multipath_parse_options_check_ip_valid( +++ &options->remote_ip_list->address[i]); +++ if (rc != 0) +++ return rc; +++ } +++ +++ return 0; +++} +++int nfs_multipath_parse_options_check_duplicate( +++ struct multipath_mount_options *options) +++{ +++ int i; +++ int j; +++ +++ if (options == NULL || +++ options->local_ip_list->count == 0 || +++ options->remote_ip_list->count == 0) +++ +++ return 0; +++ +++ for (i = 0; i < options->local_ip_list->count; i++) { +++ for (j = 0; j < 
options->remote_ip_list->count; j++) { +++ if (rpc_cmp_addr((const struct sockaddr *) +++ &options->local_ip_list->address[i], +++ (const struct sockaddr *) +++ &options->remote_ip_list->address[j])) +++ return -ENOTSUPP; +++ } +++ } +++ return 0; +++} +++ +++int nfs_multipath_parse_options_check(struct multipath_mount_options *options) +++{ +++ int rc = 0; +++ +++ rc = nfs_multipath_parse_options_check_valid(options); +++ +++ if (rc != 0) { +++ pr_err("has invalid ip.\n"); +++ return rc; +++ } +++ +++ rc = nfs_multipath_parse_options_check_duplicate(options); +++ if (rc != 0) +++ return rc; +++ return rc; +++} +++ +++int nfs_multipath_alloc_options(void **enfs_option) +++{ +++ struct multipath_mount_options *options = NULL; +++ +++ options = kzalloc(sizeof(struct multipath_mount_options), GFP_KERNEL); +++ +++ if (options == NULL) +++ return -ENOMEM; +++ +++ options->local_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ if (options->local_ip_list == NULL) { +++ kfree(options); +++ return -ENOMEM; +++ } +++ +++ options->remote_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ if (options->remote_ip_list == NULL) { +++ kfree(options->local_ip_list); +++ kfree(options); +++ return -ENOMEM; +++ } +++ +++ options->pRemoteDnsInfo = kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), +++ GFP_KERNEL); +++ if (options->pRemoteDnsInfo == NULL) { +++ kfree(options->remote_ip_list); +++ kfree(options->local_ip_list); +++ kfree(options); +++ return -ENOMEM; +++ } +++ +++ *enfs_option = options; +++ return 0; +++} +++ +++int nfs_multipath_parse_options(enum nfsmultipathoptions type, +++ char *str, void **enfs_option, struct net *net_ns) +++{ +++ int rc; +++ struct multipath_mount_options *options = NULL; +++ +++ if ((str == NULL) || (enfs_option == NULL) || (net_ns == NULL)) +++ return -EINVAL; +++ +++ if (*enfs_option == NULL) { +++ rc = nfs_multipath_alloc_options(enfs_option); +++ if (rc != 0) { +++ enfs_log_error( +++ "alloc enfs_options failed! 
errno:%d\n", rc); +++ return rc; +++ } +++ } +++ +++ options = (struct multipath_mount_options *)*enfs_option; +++ +++ if (type == LOCALADDR || type == REMOUNTLOCALADDR || +++ type == REMOTEADDR || type == REMOUNTREMOTEADDR) { +++ rc = nfs_multipath_parse_ip_list(str, net_ns, options, type); +++ } else if (type == REMOTEDNSNAME) { +++ /* alloc and release need to modify */ +++ rc = nfs_multipath_parse_dns_list(str, net_ns, options); +++ } else { +++ rc = -EOPNOTSUPP; +++ } +++ +++ // after parsing the option, check whether any local IP +++ // is also a remote IP; if so the option is illegal +++ if (rc == 0) +++ rc = nfs_multipath_parse_options_check_duplicate(options); +++ +++ if (rc == 0) +++ rc = nfs_multipath_parse_options_check(options); +++ +++ return rc; +++} +++ +++void nfs_multipath_free_options(void **enfs_option) +++{ +++ struct multipath_mount_options *options; +++ +++ if (enfs_option == NULL || *enfs_option == NULL) +++ return; +++ +++ options = (struct multipath_mount_options *)*enfs_option; +++ +++ if (options->remote_ip_list != NULL) { +++ kfree(options->remote_ip_list); +++ options->remote_ip_list = NULL; +++ } +++ +++ if (options->local_ip_list != NULL) { +++ kfree(options->local_ip_list); +++ options->local_ip_list = NULL; +++ } +++ +++ if (options->pRemoteDnsInfo != NULL) { +++ kfree(options->pRemoteDnsInfo); +++ options->pRemoteDnsInfo = NULL; +++ } +++ +++ kfree(options); +++ *enfs_option = NULL; +++} ++diff --git a/fs/nfs/enfs/enfs_multipath_parse.h b/fs/nfs/enfs/enfs_multipath_parse.h ++new file mode 100644 ++index 000000000000..6f3e8703e3e2 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_parse.h ++@@ -0,0 +1,22 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++#ifndef _ENFS_MULTIPATH_PARSE_H_ +++#define _ENFS_MULTIPATH_PARSE_H_ +++ +++#include "enfs.h" +++ +++struct multipath_mount_options { +++ struct nfs_ip_list *remote_ip_list; +++ struct nfs_ip_list *local_ip_list; +++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; +++}; +++ +++int nfs_multipath_parse_options(enum nfsmultipathoptions type, +++ char *str, void **enfs_option, struct net *net_ns); +++void nfs_multipath_free_options(void **enfs_option); +++ +++#endif +diff --git a/0004-add_enfs_module_for_sunrpc_multipatch.patch b/0004-add_enfs_module_for_sunrpc_multipatch.patch +new file mode 100644 +index 0000000..2c0fcc7 +--- /dev/null ++++ b/0004-add_enfs_module_for_sunrpc_multipatch.patch +@@ -0,0 +1,1581 @@ ++diff --git a/fs/nfs/enfs/enfs_multipath.h b/fs/nfs/enfs/enfs_multipath.h ++new file mode 100644 ++index 000000000000..e064c2929ced ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath.h ++@@ -0,0 +1,24 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: enfs multipath +++ * Author: +++ * Create: 2023-07-31 +++ */ +++ +++#ifndef ENFS_MULTIPATH_H +++#define ENFS_MULTIPATH_H +++#include <linux/sunrpc/clnt.h> +++ +++#define MAX_XPRT_NUM_PER_CLIENT 32 +++ +++int enfs_multipath_init(void); +++void enfs_multipath_exit(void); +++void enfs_xprt_ippair_create(struct xprt_create *xprtargs, +++ struct rpc_clnt *clnt, void *data); +++int enfs_config_xprt_create_args(struct xprt_create *xprtargs, +++ struct rpc_create_args *args, +++ char *servername, size_t length); +++void print_enfs_multipath_addr(struct sockaddr *local, struct sockaddr *remote); +++ +++#endif // ENFS_MULTIPATH_H ++diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c ++new file mode 100644 ++index 000000000000..63c02898a42c ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_client.c ++@@ -0,0 +1,340 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++#include <linux/types.h> +++#include <linux/nfs.h> +++#include <linux/nfs4.h> +++#include <linux/nfs_fs.h> +++#include <linux/nfs_fs_sb.h> +++#include <linux/proc_fs.h> +++#include <linux/seq_file.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/addr.h> +++#include "enfs_multipath_client.h" +++#include "enfs_multipath_parse.h" +++ +++int +++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info, +++ const struct nfs_client_initdata *client_init_data) +++{ +++ struct multipath_mount_options *mount_options = +++ (struct multipath_mount_options *)client_init_data->enfs_option; +++ +++ if (mount_options->local_ip_list) { +++ client_info->local_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ +++ if (!client_info->local_ip_list) +++ return -ENOMEM; +++ +++ memcpy(client_info->local_ip_list, mount_options->local_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (mount_options->remote_ip_list) { +++ +++ client_info->remote_ip_list = +++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); +++ +++ if (!client_info->remote_ip_list) { +++ kfree(client_info->local_ip_list); +++ client_info->local_ip_list = NULL; +++ return -ENOMEM; +++ } +++ memcpy(client_info->remote_ip_list, +++ mount_options->remote_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (mount_options->pRemoteDnsInfo) { +++ client_info->pRemoteDnsInfo = +++ kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL); +++ +++ if (!client_info->pRemoteDnsInfo) { +++ kfree(client_info->local_ip_list); +++ client_info->local_ip_list = NULL; +++ kfree(client_info->remote_ip_list); +++ client_info->remote_ip_list = NULL; +++ return -ENOMEM; +++ } +++ memcpy(client_info->pRemoteDnsInfo, +++ mount_options->pRemoteDnsInfo, +++ sizeof(struct NFS_ROUTE_DNS_INFO_S)); +++ } +++ return 0; +++} +++ +++void nfs_multipath_client_info_free_work(struct work_struct *work) +++{ +++ +++ struct multipath_client_info *clp_info; +++ +++ if (work == NULL) +++ 
return; +++ +++ clp_info = container_of(work, struct multipath_client_info, work); +++ +++ if (clp_info->local_ip_list != NULL) { +++ kfree(clp_info->local_ip_list); +++ clp_info->local_ip_list = NULL; +++ } +++ if (clp_info->remote_ip_list != NULL) { +++ kfree(clp_info->remote_ip_list); +++ clp_info->remote_ip_list = NULL; +++ } +++ kfree(clp_info); +++} +++ +++void nfs_multipath_client_info_free(void *data) +++{ +++ struct multipath_client_info *clp_info = +++ (struct multipath_client_info *)data; +++ +++ if (clp_info == NULL) +++ return; +++ pr_info("free client info %p.\n", clp_info); +++ INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work); +++ schedule_work(&clp_info->work); +++} +++ +++int nfs_multipath_client_info_init(void **data, +++ const struct nfs_client_initdata *cl_init) +++{ +++ int rc; +++ struct multipath_client_info *info; +++ struct multipath_client_info **enfs_info; +++ /* no multi path info, no need do multipath init */ +++ if (cl_init->enfs_option == NULL) +++ return 0; +++ enfs_info = (struct multipath_client_info **)data; +++ if (enfs_info == NULL) +++ return -EINVAL; +++ +++ if (*enfs_info == NULL) +++ *enfs_info = kzalloc(sizeof(struct multipath_client_info), +++ GFP_KERNEL); +++ +++ if (*enfs_info == NULL) +++ return -ENOMEM; +++ +++ info = (struct multipath_client_info *)*enfs_info; +++ pr_info("init client info %p.\n", info); +++ rc = nfs_multipath_client_mount_info_init(info, cl_init); +++ if (rc) { +++ nfs_multipath_client_info_free((void *)info); +++ return rc; +++ } +++ return rc; +++} +++ +++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src, +++ const struct nfs_ip_list *ip_list_dst) +++{ +++ int i; +++ int j; +++ bool is_find; +++ /* if both are equal or NULL, then return true. 
*/ +++ if (ip_list_src == ip_list_dst) +++ return true; +++ +++ if ((ip_list_src == NULL || ip_list_dst == NULL)) +++ return false; +++ +++ if (ip_list_src->count != ip_list_dst->count) +++ return false; +++ +++ for (i = 0; i < ip_list_src->count; i++) { +++ is_find = false; +++ for (j = 0; j < ip_list_dst->count; j++) { +++ if (rpc_cmp_addr_port( +++ (const struct sockaddr *) +++ &ip_list_src->address[i], +++ (const struct sockaddr *) +++ &ip_list_dst->address[j]) +++ ) { +++ is_find = true; +++ break; +++ } +++ } +++ if (is_find == false) +++ return false; +++ } +++ return true; +++} +++ +++int +++nfs_multipath_dns_list_info_match( +++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc, +++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst) +++{ +++ int i; +++ +++ /* if both are equal or NULL, then return true. */ +++ if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst) +++ return true; +++ +++ if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL)) +++ return false; +++ +++ if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount) +++ return false; +++ +++ for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) { +++ if (strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname, +++ pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname) != 0) +++ return false; +++ } +++ return true; +++} +++ +++int nfs_multipath_client_info_match(void *src, void *dst) +++{ +++ int ret = true; +++ +++ struct multipath_client_info *src_info; +++ struct multipath_mount_options *dst_info; +++ +++ src_info = (struct multipath_client_info *)src; +++ dst_info = (struct multipath_mount_options *)dst; +++ pr_info("try match client.\n"); +++ ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list, +++ dst_info->local_ip_list); +++ if (ret == false) { +++ pr_err("local_ip not match.\n"); +++ return ret; +++ } +++ +++ ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list, +++ dst_info->remote_ip_list); +++ if (ret == false) { +++ pr_err("remote_ip not match.\n"); +++ 
return ret; +++ } +++ +++ ret = nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo, +++ dst_info->pRemoteDnsInfo); +++ if (ret == false) { +++ pr_err("dns not match.\n"); +++ return ret; +++ } +++ pr_info("try match client ret %d.\n", ret); +++ return ret; +++} +++ +++void nfs_multipath_print_ip_info(struct seq_file *mount_option, +++ struct nfs_ip_list *ip_list, +++ const char *type) +++{ +++ char buf[IP_ADDRESS_LEN_MAX + 1]; +++ int len = 0; +++ int i = 0; +++ +++ seq_printf(mount_option, ",%s=", type); +++ for (i = 0; i < ip_list->count; i++) { +++ len = rpc_ntop((struct sockaddr *)&ip_list->address[i], +++ buf, IP_ADDRESS_LEN_MAX); +++ if (len > 0 && len < IP_ADDRESS_LEN_MAX) +++ buf[len] = '\0'; +++ +++ if (i == 0) +++ seq_printf(mount_option, "%s", buf); +++ else +++ seq_printf(mount_option, "~%s", buf); +++ dfprintk(MOUNT, +++ "NFS: show nfs mount option type:%s %s [%s]\n", +++ type, buf, __func__); +++ } +++} +++ +++void nfs_multipath_print_dns_info(struct seq_file *mount_option, +++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo, +++ const char *type) +++{ +++ int i = 0; +++ +++ seq_printf(mount_option, ",%s=", type); +++ for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) { +++ if (i == 0) +++ seq_printf(mount_option, +++ "[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ else if (i == pRemoteDnsInfo->dnsNameCount - 1) +++ seq_printf(mount_option, ",%s]", +++ pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ else +++ seq_printf(mount_option, +++ ",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); +++ } +++} +++ +++ +++static void multipath_print_sockaddr(struct seq_file *seq, +++ struct sockaddr *addr) +++{ +++ switch (addr->sa_family) { +++ case AF_INET: { +++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; +++ +++ seq_printf(seq, "%pI4", &sin->sin_addr); +++ return; +++ } +++ case AF_INET6: { +++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; +++ +++ seq_printf(seq, "%pI6", &sin6->sin6_addr); +++ return; +++ } +++ 
default: +++ break; +++ } +++ pr_err("unsupport family:%d\n", addr->sa_family); +++} +++ +++static void multipath_print_enfs_info(struct seq_file *seq, +++ struct nfs_server *server) +++{ +++ struct sockaddr_storage peeraddr; +++ struct rpc_clnt *next = server->client; +++ +++ rpc_peeraddr(server->client, +++ (struct sockaddr *)&peeraddr, sizeof(peeraddr)); +++ seq_puts(seq, ",enfs_info="); +++ multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr); +++ +++ while (next->cl_parent) { +++ if (next == next->cl_parent) +++ break; +++ next = next->cl_parent; +++ } +++ seq_printf(seq, "_%u", next->cl_clid); +++} +++ +++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data) +++{ +++ struct nfs_server *server = data; +++ struct multipath_client_info *client_info = +++ server->nfs_client->cl_multipath_data; +++ +++ dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__); +++ if ((client_info->remote_ip_list) && +++ (client_info->remote_ip_list->count > 0)) +++ nfs_multipath_print_ip_info(mount_option, +++ client_info->remote_ip_list, +++ "remoteaddrs"); +++ +++ if ((client_info->local_ip_list) && +++ (client_info->local_ip_list->count > 0)) +++ nfs_multipath_print_ip_info(mount_option, +++ client_info->local_ip_list, +++ "localaddrs"); +++ +++ if ((client_info->pRemoteDnsInfo) && +++ (client_info->pRemoteDnsInfo->dnsNameCount > 0)) +++ nfs_multipath_print_dns_info(mount_option, +++ client_info->pRemoteDnsInfo, +++ "remotednsname"); +++ +++ multipath_print_enfs_info(mount_option, server); +++} ++diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h ++new file mode 100644 ++index 000000000000..208f7260690d ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_multipath_client.h ++@@ -0,0 +1,26 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS adapter. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
+++ */ +++#ifndef _ENFS_MULTIPATH_CLIENT_H_ +++#define _ENFS_MULTIPATH_CLIENT_H_ +++ +++#include "enfs.h" +++ +++struct multipath_client_info { +++ struct work_struct work; +++ struct nfs_ip_list *remote_ip_list; +++ struct nfs_ip_list *local_ip_list; +++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; +++ s64 client_id; +++}; +++ +++int nfs_multipath_client_info_init(void **data, +++ const struct nfs_client_initdata *cl_init); +++void nfs_multipath_client_info_free(void *data); +++int nfs_multipath_client_info_match(void *src, void *dst); +++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data); +++ +++#endif ++diff --git a/fs/nfs/enfs/enfs_path.c b/fs/nfs/enfs/enfs_path.c ++new file mode 100644 ++index 000000000000..7355f8c2f672 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_path.c ++@@ -0,0 +1,47 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ */ +++ +++#include <linux/sunrpc/metrics.h> +++#include <linux/sunrpc/xprt.h> +++ +++#include "enfs.h" +++#include "enfs_log.h" +++#include "enfs_path.h" +++ +++// only create ctx in this function +++// alloc iostat memory in create_clnt +++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt) +++{ +++ struct enfs_xprt_context *ctx; +++ +++ if (!xprt) { +++ enfs_log_error("invalid xprt pointer.\n"); +++ return -EINVAL; +++ } +++ +++ ctx = kzalloc(sizeof(struct enfs_xprt_context), GFP_KERNEL); +++ if (!ctx) { +++ enfs_log_error("add xprt test failed.\n"); +++ return -ENOMEM; +++ } +++ +++ xprt->multipath_context = (void *)ctx; +++ return 0; +++} +++ +++// free multi_context and iostat memory +++void enfs_free_xprt_ctx(struct rpc_xprt *xprt) +++{ +++ struct enfs_xprt_context *ctx = xprt->multipath_context; +++ +++ if (ctx) { +++ if (ctx->stats) { +++ rpc_free_iostats(ctx->stats); +++ ctx->stats = NULL; +++ } +++ kfree(xprt->multipath_context); +++ xprt->multipath_context = NULL; +++ } +++} ++diff --git a/fs/nfs/enfs/enfs_path.h 
b/fs/nfs/enfs/enfs_path.h ++new file mode 100644 ++index 000000000000..97b1ef3730b8 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_path.h ++@@ -0,0 +1,12 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ */ +++ +++#ifndef ENFS_PATH_H +++#define ENFS_PATH_H +++ +++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt); +++void enfs_free_xprt_ctx(struct rpc_xprt *xprt); +++ +++#endif // ENFS_PATH_H ++diff --git a/fs/nfs/enfs/enfs_proc.c b/fs/nfs/enfs/enfs_proc.c ++new file mode 100644 ++index 000000000000..53fa1a07642f ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_proc.c ++@@ -0,0 +1,545 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ */ +++#include <linux/module.h> +++#include <linux/proc_fs.h> +++#include <linux/seq_file.h> +++#include <linux/spinlock.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/metrics.h> +++#include <linux/sunrpc/xprtsock.h> +++#include <net/netns/generic.h> +++ +++#include "../../../net/sunrpc/netns.h" +++ +++#include "enfs.h" +++#include "enfs_log.h" +++#include "enfs_proc.h" +++#include "enfs_multipath.h" +++#include "pm_state.h" +++ +++#define ENFS_PROC_DIR "enfs" +++#define ENFS_PROC_PATH_STATUS_LEN 256 +++ +++static struct proc_dir_entry *enfs_proc_parent; +++ +++void +++enfs_iterate_each_rpc_clnt(int (*fn)(struct rpc_clnt *clnt, void *data), +++ void *data) +++{ +++ struct net *net; +++ struct sunrpc_net *sn; +++ struct rpc_clnt *clnt; +++ +++ rcu_read_lock(); +++ for_each_net_rcu(net) { +++ sn = net_generic(net, sunrpc_net_id); +++ if (sn == NULL) +++ continue; +++ spin_lock(&sn->rpc_client_lock); +++ list_for_each_entry(clnt, &sn->all_clients, cl_clients) { +++ fn(clnt, data); +++ } +++ spin_unlock(&sn->rpc_client_lock); +++ } +++ rcu_read_unlock(); +++} +++ +++struct proc_dir_entry *enfs_get_proc_parent(void) +++{ +++ return enfs_proc_parent; +++} +++ +++static int 
sockaddr_ip_to_str(struct sockaddr *addr, char *buf, int len) +++{ +++ switch (addr->sa_family) { +++ case AF_INET: { +++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; +++ +++ snprintf(buf, len, "%pI4", &sin->sin_addr); +++ return 0; +++ } +++ case AF_INET6: { +++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; +++ +++ snprintf(buf, len, "%pI6", &sin6->sin6_addr); +++ return 0; +++ } +++ default: +++ break; +++ } +++ return 1; +++} +++ +++static bool should_print(const char *name) +++{ +++ int i; +++ static const char * const proc_names[] = { +++ "READ", +++ "WRITE", +++ }; +++ +++ if (name == NULL) +++ return false; +++ +++ for (i = 0; i < ARRAY_SIZE(proc_names); i++) { +++ if (strcmp(name, proc_names[i]) == 0) +++ return true; +++ } +++ return false; +++} +++ +++struct enfs_xprt_iter { +++ unsigned int id; +++ struct seq_file *seq; +++ unsigned int max_addrs_length; +++}; +++ +++static int debug_show_xprt(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ +++ if (xprt->multipath_context) +++ ctx = xprt->multipath_context; +++ +++ pr_info(" xprt:%p ctx:%p main:%d queue_len:%lu.\n", xprt, +++ xprt->multipath_context, +++ ctx ? 
ctx->main : false, +++ atomic_long_read(&xprt->queuelen)); +++ return 0; +++} +++ +++static int debug_show_clnt(struct rpc_clnt *clnt, void *data) +++{ +++ pr_info(" clnt %u addr:%p enfs:%d\n", +++ clnt->cl_clid, clnt, +++ clnt->cl_enfs); +++ rpc_clnt_iterate_for_each_xprt(clnt, debug_show_xprt, NULL); +++ return 0; +++} +++ +++static void debug_print_all_xprt(void) +++{ +++ enfs_iterate_each_rpc_clnt(debug_show_clnt, NULL); +++} +++ +++static +++void enfs_proc_format_xprt_addr_display(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ char *local_name_buf, +++ int local_name_buf_len, +++ char *remote_name_buf, +++ int remote_name_buf_len) +++{ +++ int err; +++ struct sockaddr_storage srcaddr; +++ struct enfs_xprt_context *ctx; +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ sockaddr_ip_to_str((struct sockaddr *)&xprt->addr, +++ remote_name_buf, remote_name_buf_len); +++ +++ // get the local address; the source depends on whether this is the main xprt +++ if (enfs_is_main_xprt(xprt)) { +++ err = rpc_localaddr(clnt, (struct sockaddr *)&srcaddr, +++ sizeof(srcaddr)); +++ if (err != 0) +++ (void)snprintf(local_name_buf, +++ local_name_buf_len, "Unknown"); +++ else +++ sockaddr_ip_to_str((struct sockaddr *)&srcaddr, +++ local_name_buf, +++ local_name_buf_len); +++ } else { +++ sockaddr_ip_to_str((struct sockaddr *)&ctx->srcaddr, +++ local_name_buf, +++ local_name_buf_len); +++ } +++} +++ +++static int enfs_show_xprt_stats(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ unsigned int op; +++ unsigned int maxproc = clnt->cl_maxproc; +++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; +++ struct enfs_xprt_context *ctx; +++ char local_name[INET6_ADDRSTRLEN]; +++ char remote_name[INET6_ADDRSTRLEN]; +++ +++ if (!xprt->multipath_context) +++ return 0; +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ enfs_proc_format_xprt_addr_display(clnt, xprt, local_name, +++ sizeof(local_name), +++ remote_name, 
sizeof(remote_name)); +++ +++ seq_printf(iter->seq, "%-6u%-*s%-*s", iter->id, +++ iter->max_addrs_length + 4, +++ local_name, +++ iter->max_addrs_length + 4, +++ remote_name); +++ +++ iter->id++; +++ +++ for (op = 0; op < maxproc; op++) { +++ if (!should_print(clnt->cl_procinfo[op].p_name)) +++ continue; +++ +++ seq_printf(iter->seq, "%-22lu%-22Lu%-22Lu", +++ ctx->stats[op].om_ops, +++ ctx->stats[op].om_ops == 0 ? 0 : +++ ktime_to_ms(ctx->stats[op].om_rtt) / +++ ctx->stats[op].om_ops, +++ ctx->stats[op].om_ops == 0 ? 0 : +++ ktime_to_ms(ctx->stats[op].om_execute) / +++ ctx->stats[op].om_ops); +++ } +++ seq_puts(iter->seq, "\n"); +++ return 0; +++} +++ +++static int rpc_proc_show_path_status(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; +++ struct enfs_xprt_context *ctx = NULL; +++ char local_name[INET6_ADDRSTRLEN] = {0}; +++ char remote_name[INET6_ADDRSTRLEN] = {0}; +++ char multiapth_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; +++ char xprt_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; +++ +++ if (!xprt->multipath_context) { +++ enfs_log_debug("multipath_context is null.\n"); +++ return 0; +++ } +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ enfs_proc_format_xprt_addr_display(clnt, xprt, +++ local_name, +++ sizeof(local_name), +++ remote_name, sizeof(remote_name)); +++ +++ pm_get_path_state_desc(xprt, +++ multiapth_status, +++ ENFS_PROC_PATH_STATUS_LEN); +++ +++ pm_get_xprt_state_desc(xprt, +++ xprt_status, +++ ENFS_PROC_PATH_STATUS_LEN); +++ +++ seq_printf(iter->seq, "%-6u%-*s%-*s%-12s%-12s\n", +++ iter->id, iter->max_addrs_length + 4, +++ local_name, iter->max_addrs_length + 4, +++ remote_name, multiapth_status, +++ xprt_status); +++ iter->id++; +++ return 0; +++} +++ +++static int enfs_get_max_addrs_length(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; +++ char 
local_name[INET6_ADDRSTRLEN]; +++ char remote_name[INET6_ADDRSTRLEN]; +++ +++ enfs_proc_format_xprt_addr_display(clnt, xprt, +++ local_name, sizeof(local_name), +++ remote_name, sizeof(remote_name)); +++ +++ if (iter->max_addrs_length < strlen(local_name)) +++ iter->max_addrs_length = strlen(local_name); +++ +++ if (iter->max_addrs_length < strlen(remote_name)) +++ iter->max_addrs_length = strlen(remote_name); +++ +++ return 0; +++} +++ +++static int rpc_proc_clnt_showpath(struct seq_file *seq, void *v) +++{ +++ struct rpc_clnt *clnt = seq->private; +++ struct enfs_xprt_iter iter; +++ +++ iter.seq = seq; +++ iter.id = 0; +++ iter.max_addrs_length = 0; +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ enfs_get_max_addrs_length, +++ (void *)&iter); +++ +++ seq_printf(seq, "%-6s%-*s%-*s%-12s%-12s\n", "id", +++ iter.max_addrs_length + 4, +++ "local_addr", +++ iter.max_addrs_length + 4, +++ "remote_addr", +++ "path_state", +++ "xprt_state"); +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ rpc_proc_show_path_status, +++ (void *)&iter); +++ return 0; +++} +++ +++static int enfs_rpc_proc_show(struct seq_file *seq, void *v) +++{ +++ struct rpc_clnt *clnt = seq->private; +++ struct enfs_xprt_iter iter; +++ +++ iter.seq = seq; +++ iter.id = 0; +++ iter.max_addrs_length = 0; +++ +++ debug_print_all_xprt(); +++ pr_info("enfs proc clnt:%p\n", clnt); +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ enfs_get_max_addrs_length, +++ (void *)&iter); +++ +++ seq_printf(seq, "%-6s%-*s%-*s%-22s%-22s%-22s%-22s%-22s%-22s\n", "id", +++ iter.max_addrs_length + 4, "local_addr", +++ iter.max_addrs_length + 4, +++ "remote_addr", "r_count", +++ "r_rtt", "r_exec", "w_count", "w_rtt", "w_exec"); +++ +++ // rpc_clnt_show_stats(seq, clnt); +++ rpc_clnt_iterate_for_each_xprt(clnt, +++ enfs_show_xprt_stats, +++ (void *)&iter); +++ return 0; +++} +++ +++static int rpc_proc_open(struct inode *inode, struct file *file) +++{ +++ struct rpc_clnt *clnt = PDE_DATA(inode); +++ +++ pr_info("%s %p\n", __func__, 
clnt); +++ return single_open(file, enfs_rpc_proc_show, clnt); +++} +++ +++static int enfs_reset_xprt_stats(struct rpc_clnt *clnt, +++ struct rpc_xprt *xprt, +++ void *data) +++{ +++ unsigned int op; +++ struct enfs_xprt_context *ctx; +++ unsigned int maxproc = clnt->cl_maxproc; +++ struct rpc_iostats stats = {0}; +++ +++ if (!xprt->multipath_context) +++ return 0; +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ +++ for (op = 0; op < maxproc; op++) { +++ spin_lock(&ctx->stats[op].om_lock); +++ ctx->stats[op] = stats; +++ spin_unlock(&ctx->stats[op].om_lock); +++ } +++ return 0; +++} +++ +++static void trim_newline_ch(char *str, int len) +++{ +++ int i; +++ +++ for (i = 0; str[i] != '\0' && i < len; i++) { +++ if (str[i] == '\n') +++ str[i] = '\0'; +++ } +++} +++ +++static ssize_t enfs_proc_write(struct file *file, +++ const char __user *user_buf, +++ size_t len, +++ loff_t *offset) +++{ +++ char buffer[128]; +++ struct rpc_clnt *clnt = +++ ((struct seq_file *)file->private_data)->private; +++ +++ if (len >= sizeof(buffer)) +++ return -E2BIG; +++ +++ if (copy_from_user(buffer, user_buf, len) != 0) +++ return -EFAULT; +++ +++ buffer[len] = '\0'; +++ trim_newline_ch(buffer, len); +++ if (strcmp(buffer, "reset") != 0) +++ return -EINVAL; +++ +++ rpc_clnt_iterate_for_each_xprt(clnt, enfs_reset_xprt_stats, NULL); +++ return len; +++} +++ +++static int rpc_proc_show_path(struct inode *inode, struct file *file) +++{ +++ struct rpc_clnt *clnt = PDE_DATA(inode); +++ +++ return single_open(file, rpc_proc_clnt_showpath, clnt); +++} +++ +++static const struct file_operations rpc_proc_fops = { +++ .owner = THIS_MODULE, +++ .open = rpc_proc_open, +++ .read = seq_read, +++ .llseek = seq_lseek, +++ .release = single_release, +++ .write = enfs_proc_write, +++}; +++ +++static const struct file_operations rpc_show_path_fops = { +++ .owner = THIS_MODULE, +++ .open = rpc_proc_show_path, +++ .read = seq_read, +++ .llseek = seq_lseek, +++ .release = single_release, 
+++}; +++ +++static int clnt_proc_name(struct rpc_clnt *clnt, char *buf, int len) +++{ +++ int ret; +++ +++ ret = snprintf(buf, len, "%s_%u", +++ rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR), +++ clnt->cl_clid); +++ if (ret >= len) +++ return -E2BIG; +++ return 0; +++} +++ +++static int enfs_proc_create_file(struct rpc_clnt *clnt) +++{ +++ int err; +++ char buf[128]; +++ +++ struct proc_dir_entry *clnt_entry; +++ struct proc_dir_entry *stat_entry; +++ +++ err = clnt_proc_name(clnt, buf, sizeof(buf)); +++ if (err) +++ return err; +++ +++ clnt_entry = proc_mkdir(buf, enfs_proc_parent); +++ if (clnt_entry == NULL) +++ return -EINVAL; +++ +++ stat_entry = proc_create_data("stat", +++ 0, clnt_entry, +++ &rpc_proc_fops, clnt); +++ +++ if (stat_entry == NULL) +++ return -EINVAL; +++ +++ stat_entry = proc_create_data("path", +++ 0, clnt_entry, +++ &rpc_show_path_fops, clnt); +++ +++ if (stat_entry == NULL) +++ return -EINVAL; +++ +++ return 0; +++} +++ +++void enfs_count_iostat(struct rpc_task *task) +++{ +++ struct enfs_xprt_context *ctx = task->tk_xprt->multipath_context; +++ +++ if (!ctx || !ctx->stats) +++ return; +++ rpc_count_iostats(task, ctx->stats); +++} +++ +++static void enfs_proc_delete_file(struct rpc_clnt *clnt) +++{ +++ int err; +++ char buf[128]; +++ +++ err = clnt_proc_name(clnt, buf, sizeof(buf)); +++ if (err) { +++ pr_err("gen clnt name failed.\n"); +++ return; +++ } +++ remove_proc_subtree(buf, enfs_proc_parent); +++} +++ +++// create proc file "/proc/enfs/[mount_ip]_[id]/stat" +++int enfs_proc_create_clnt(struct rpc_clnt *clnt) +++{ +++ int err; +++ +++ err = enfs_proc_create_file(clnt); +++ if (err) { +++ pr_err("create client %d\n", err); +++ return err; +++ } +++ +++ return 0; +++} +++ +++void enfs_proc_delete_clnt(struct rpc_clnt *clnt) +++{ +++ if (clnt->cl_enfs) +++ enfs_proc_delete_file(clnt); +++} +++ +++static int enfs_proc_create_parent(void) +++{ +++ enfs_proc_parent = proc_mkdir(ENFS_PROC_DIR, NULL); +++ +++ if (enfs_proc_parent == NULL) { +++ 
pr_err("Enfs create proc dir err\n"); +++ return -ENOMEM; +++ } +++ return 0; +++} +++ +++static void enfs_proc_delete_parent(void) +++{ +++ remove_proc_entry(ENFS_PROC_DIR, NULL); +++} +++ +++static int enfs_proc_init_create_clnt(struct rpc_clnt *clnt, void *data) +++{ +++ if (clnt->cl_enfs) +++ enfs_proc_create_file(clnt); +++ return 0; +++} +++ +++static int enfs_proc_destroy_clnt(struct rpc_clnt *clnt, void *data) +++{ +++ if (clnt->cl_enfs) +++ enfs_proc_delete_file(clnt); +++ return 0; +++} +++ +++int enfs_proc_init(void) +++{ +++ int err; +++ +++ err = enfs_proc_create_parent(); +++ if (err) +++ return err; +++ +++ enfs_iterate_each_rpc_clnt(enfs_proc_init_create_clnt, NULL); +++ return 0; +++} +++ +++void enfs_proc_exit(void) +++{ +++ enfs_iterate_each_rpc_clnt(enfs_proc_destroy_clnt, NULL); +++ enfs_proc_delete_parent(); +++} ++diff --git a/fs/nfs/enfs/enfs_proc.h b/fs/nfs/enfs/enfs_proc.h ++new file mode 100644 ++index 000000000000..321951031c2e ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_proc.h ++@@ -0,0 +1,21 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Client-side ENFS PROC. +++ * +++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. +++ */ +++#ifndef ENFS_PROC_H +++#define ENFS_PROC_H +++ +++struct rpc_clnt; +++struct rpc_task; +++struct proc_dir_entry; +++ +++int enfs_proc_init(void); +++void enfs_proc_exit(void); +++struct proc_dir_entry *enfs_get_proc_parent(void); +++int enfs_proc_create_clnt(struct rpc_clnt *clnt); +++void enfs_proc_delete_clnt(struct rpc_clnt *clnt); +++void enfs_count_iostat(struct rpc_task *task); +++ +++#endif ++diff --git a/fs/nfs/enfs/enfs_remount.c b/fs/nfs/enfs/enfs_remount.c ++new file mode 100644 ++index 000000000000..2c3fe125c735 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_remount.c ++@@ -0,0 +1,221 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ * Description: remount ip source file +++ * Author: y00583252 +++ * Create: 2023-08-12 +++ */ +++#include "enfs_remount.h" +++ +++#include <linux/string.h> +++#include <linux/in.h> +++#include <linux/in6.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/spinlock.h> +++#include <linux/sunrpc/addr.h> +++#include <linux/sunrpc/metrics.h> +++#include <linux/sunrpc/xprtmultipath.h> +++#include <linux/sunrpc/xprtsock.h> +++#include <linux/sunrpc/xprt.h> +++#include <linux/smp.h> +++#include <linux/delay.h> +++ +++#include "enfs.h" +++#include "enfs_log.h" +++#include "enfs_multipath.h" +++#include "enfs_multipath_parse.h" +++#include "enfs_path.h" +++#include "enfs_proc.h" +++#include "enfs_multipath_client.h" +++ +++static bool enfs_rpc_xprt_switch_need_delete_addr( +++ struct multipath_mount_options *enfs_option, +++ struct sockaddr *dstaddr, struct sockaddr *srcaddr) +++{ +++ int i; +++ bool find_same_ip = false; +++ int32_t local_total; +++ int32_t remote_total; +++ +++ local_total = enfs_option->local_ip_list->count; +++ remote_total = enfs_option->remote_ip_list->count; +++ if (local_total == 0 || remote_total == 0) { +++ pr_err("no ip list is present.\n"); +++ return false; +++ } +++ +++ for (i = 0; i < local_total; i++) { +++ find_same_ip = +++ rpc_cmp_addr((struct sockaddr *) +++ &enfs_option->local_ip_list->address[i], +++ srcaddr); +++ if (find_same_ip) +++ break; +++ } +++ +++ if (find_same_ip == false) +++ return true; +++ +++ find_same_ip = false; +++ for (i = 0; i < remote_total; i++) { +++ find_same_ip = +++ rpc_cmp_addr((struct sockaddr *) +++ &enfs_option->remote_ip_list->address[i], +++ dstaddr); +++ if (find_same_ip) +++ break; +++ } +++ +++ if (find_same_ip == false) +++ return true; +++ +++ return false; +++} +++ +++// Used in rcu_lock +++static bool enfs_delete_xprt_from_switch(struct rpc_xprt *xprt, +++ void *enfs_option, +++ struct rpc_xprt_switch *xps) +++{ +++ struct enfs_xprt_context *ctx = NULL; +++ struct multipath_mount_options 
*mopt = +++ (struct multipath_mount_options *)enfs_option; +++ +++ if (enfs_is_main_xprt(xprt)) +++ return true; +++ +++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; +++ if (enfs_rpc_xprt_switch_need_delete_addr(mopt, +++ (struct sockaddr *)&xprt->addr, +++ (struct sockaddr *)&ctx->srcaddr)) { +++ +++ print_enfs_multipath_addr((struct sockaddr *)&ctx->srcaddr, +++ (struct sockaddr *)&xprt->addr); +++ rpc_xprt_switch_remove_xprt(xps, xprt); +++ return true; +++ } +++ +++ return false; +++} +++ +++void enfs_clnt_delete_obsolete_xprts(struct nfs_client *nfs_client, +++ void *enfs_option) +++{ +++ int xprt_count = 0; +++ struct rpc_xprt *pos = NULL; +++ struct rpc_xprt_switch *xps = NULL; +++ +++ rcu_read_lock(); +++ xps = xprt_switch_get( +++ rcu_dereference( +++ nfs_client->cl_rpcclient->cl_xpi.xpi_xpswitch)); +++ if (xps == NULL) { +++ rcu_read_unlock(); +++ xprt_switch_put(xps); +++ return; +++ } +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (xprt_count < MAX_XPRT_NUM_PER_CLIENT) { +++ if (enfs_delete_xprt_from_switch( +++ pos, enfs_option, xps) == false) +++ xprt_count++; +++ } else +++ rpc_xprt_switch_remove_xprt(xps, pos); +++ } +++ rcu_read_unlock(); +++ xprt_switch_put(xps); +++} +++ +++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option) +++{ +++ int errno = 0; +++ char servername[48]; +++ struct multipath_mount_options *remount_lists = +++ (struct multipath_mount_options *)enfs_option; +++ struct multipath_client_info *client_info = +++ (struct multipath_client_info *)nfs_client->cl_multipath_data; +++ struct xprt_create xprtargs; +++ struct rpc_create_args args = { +++ .protocol = nfs_client->cl_proto, +++ .net = nfs_client->cl_net, +++ .addrsize = nfs_client->cl_addrlen, +++ .servername = nfs_client->cl_hostname, +++ }; +++ +++ memset(&xprtargs, 0, sizeof(struct xprt_create)); +++ +++ //mount is not use multipath +++ if (client_info == NULL || enfs_option == NULL) { +++ enfs_log_error( +++ 
"mount information or remount information is empty.\n"); +++ return -EINVAL; +++ } +++ +++ //remount : localaddrs and remoteaddrs are empty +++ if (remount_lists->local_ip_list->count == 0 && +++ remount_lists->remote_ip_list->count == 0) { +++ enfs_log_info("remount local_ip_list and remote_ip_list are NULL\n"); +++ return 0; +++ } +++ +++ errno = enfs_config_xprt_create_args(&xprtargs, +++ &args, servername, sizeof(servername)); +++ +++ if (errno) { +++ enfs_log_error("config_xprt_create failed! errno:%d\n", errno); +++ return errno; +++ } +++ +++ if (remount_lists->local_ip_list->count == 0) { +++ if (client_info->local_ip_list->count == 0) { +++ errno = rpc_localaddr(nfs_client->cl_rpcclient, +++ (struct sockaddr *) +++ &remount_lists->local_ip_list->address[0], +++ sizeof(struct sockaddr_storage)); +++ if (errno) { +++ enfs_log_error("get clnt srcaddr errno:%d\n", +++ errno); +++ return errno; +++ } +++ remount_lists->local_ip_list->count = 1; +++ } else +++ memcpy(remount_lists->local_ip_list, +++ client_info->local_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ if (remount_lists->remote_ip_list->count == 0) { +++ if (client_info->remote_ip_list->count == 0) { +++ errno = rpc_peeraddr(nfs_client->cl_rpcclient, +++ (struct sockaddr *) +++ &remount_lists->remote_ip_list->address[0], +++ sizeof(struct sockaddr_storage)); +++ if (errno == 0) { +++ enfs_log_error("get clnt dstaddr errno:%d\n", +++ errno); +++ return errno; +++ } +++ remount_lists->remote_ip_list->count = 1; +++ } else +++ memcpy(remount_lists->remote_ip_list, +++ client_info->remote_ip_list, +++ sizeof(struct nfs_ip_list)); +++ } +++ +++ enfs_log_info("Remount creating new links...\n"); +++ enfs_xprt_ippair_create(&xprtargs, +++ nfs_client->cl_rpcclient, +++ remount_lists); +++ +++ enfs_log_info("Remount deleting obsolete links...\n"); +++ enfs_clnt_delete_obsolete_xprts(nfs_client, remount_lists); +++ +++ memcpy(client_info->local_ip_list, +++ remount_lists->local_ip_list, +++ 
sizeof(struct nfs_ip_list)); +++ memcpy(client_info->remote_ip_list, +++ remount_lists->remote_ip_list, +++ sizeof(struct nfs_ip_list)); +++ +++ return 0; +++} ++diff --git a/fs/nfs/enfs/enfs_remount.h b/fs/nfs/enfs/enfs_remount.h ++new file mode 100644 ++index 000000000000..a663ed257004 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_remount.h ++@@ -0,0 +1,15 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ * Description: remount ip header file +++ * Author: y00583252 +++ * Create: 2023-08-12 +++ */ +++#ifndef _ENFS_REMOUNT_ +++#define _ENFS_REMOUNT_ +++#include <linux/string.h> +++#include "enfs.h" +++ +++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option); +++ +++#endif ++diff --git a/fs/nfs/enfs/enfs_roundrobin.c b/fs/nfs/enfs/enfs_roundrobin.c ++new file mode 100644 ++index 000000000000..4e4eda784a3e ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_roundrobin.c ++@@ -0,0 +1,255 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ */ +++#include <linux/spinlock.h> +++#include <linux/module.h> +++#include <linux/printk.h> +++#include <linux/kref.h> +++#include <linux/rculist.h> +++#include <linux/types.h> +++#include <linux/sunrpc/xprt.h> +++#include <linux/sunrpc/clnt.h> +++#include <linux/sunrpc/xprtmultipath.h> +++#include "enfs_roundrobin.h" +++ +++#include "enfs.h" +++#include "enfs_config.h" +++#include "pm_state.h" +++ +++typedef struct rpc_xprt *(*enfs_xprt_switch_find_xprt_t)( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur); +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin; +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular; +++ +++static bool enfs_xprt_is_active(struct rpc_xprt *xprt) +++{ +++ enum pm_path_state state; +++ +++ if (kref_read(&xprt->kref) <= 0) +++ return false; +++ +++ state = pm_get_path_state(xprt); +++ if (state == PM_STATE_NORMAL) +++ return true; +++ +++ return false; +++} +++ +++static struct rpc_xprt *enfs_lb_set_cursor_xprt( +++ struct rpc_xprt_switch *xps, struct rpc_xprt **cursor, +++ enfs_xprt_switch_find_xprt_t find_next) +++{ +++ struct rpc_xprt *pos; +++ struct rpc_xprt *old; +++ +++ old = smp_load_acquire(cursor); /* read latest cursor */ +++ pos = find_next(xps, old); +++ smp_store_release(cursor, pos); /* let cursor point to pos */ +++ return pos; +++} +++ +++static +++struct rpc_xprt *enfs_lb_find_next_entry_roundrobin( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *pos; +++ struct rpc_xprt *prev = NULL; +++ bool found = false; +++ struct rpc_xprt *min_queuelen_xprt = NULL; +++ unsigned long pos_xprt_queuelen; +++ unsigned long min_xprt_queuelen = 0; +++ +++ unsigned long xps_queuelen = atomic_long_read(&xps->xps_queuelen); +++ // delete origin xprt +++ unsigned int multipath_nactive = READ_ONCE(xps->xps_nactive) - 1; +++ +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (enfs_is_main_xprt(pos) || !enfs_xprt_is_active(pos)) { +++ prev = 
pos; +++ continue; +++ } +++ +++ pos_xprt_queuelen = atomic_long_read(&pos->queuelen); +++ if (min_queuelen_xprt == NULL || +++ pos_xprt_queuelen < min_xprt_queuelen) { +++ +++ min_queuelen_xprt = pos; +++ min_xprt_queuelen = pos_xprt_queuelen; +++ } +++ +++ if (cur == prev) +++ found = true; +++ +++ if (found && pos_xprt_queuelen * +++ multipath_nactive <= xps_queuelen) +++ return pos; +++ prev = pos; +++ }; +++ +++ return min_queuelen_xprt; +++} +++ +++struct rpc_xprt *enfs_lb_switch_find_first_active_xprt( +++ struct rpc_xprt_switch *xps) +++{ +++ struct rpc_xprt *pos; +++ +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (enfs_xprt_is_active(pos)) +++ return pos; +++ }; +++ return NULL; +++} +++ +++struct rpc_xprt *enfs_lb_switch_get_main_xprt(struct rpc_xprt_switch *xps) +++{ +++ return list_first_or_null_rcu(&xps->xps_xprt_list, +++ struct rpc_xprt, xprt_switch); +++} +++ +++static struct rpc_xprt *enfs_lb_switch_get_next_xprt_roundrobin( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *xprt; +++ +++ // disable multipath +++ if (enfs_get_config_multipath_state()) +++ return enfs_lb_switch_get_main_xprt(xps); +++ +++ xprt = enfs_lb_find_next_entry_roundrobin(xps, cur); +++ if (xprt != NULL) +++ return xprt; +++ +++ return enfs_lb_switch_get_main_xprt(xps); +++} +++ +++static +++struct rpc_xprt *enfs_lb_iter_next_entry_roundrobin(struct rpc_xprt_iter *xpi) +++{ +++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); +++ +++ if (xps == NULL) +++ return NULL; +++ +++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, +++ enfs_lb_switch_get_next_xprt_roundrobin); +++} +++ +++static +++struct rpc_xprt *enfs_lb_switch_find_singular_entry( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *pos; +++ bool found = false; +++ +++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { +++ if (cur == pos) +++ found = true; +++ +++ if (found && 
enfs_xprt_is_active(pos)) +++ return pos; +++ } +++ return NULL; +++} +++ +++struct rpc_xprt *enfs_lb_get_singular_xprt( +++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *xprt; +++ +++ if (xps == NULL) +++ return NULL; +++ +++ // disable multipath +++ if (enfs_get_config_multipath_state()) +++ return enfs_lb_switch_get_main_xprt(xps); +++ +++ if (cur == NULL || xps->xps_nxprts < 2) +++ return enfs_lb_switch_find_first_active_xprt(xps); +++ +++ xprt = enfs_lb_switch_find_singular_entry(xps, cur); +++ if (!xprt) +++ return enfs_lb_switch_get_main_xprt(xps); +++ +++ return xprt; +++} +++ +++static +++struct rpc_xprt *enfs_lb_iter_next_entry_sigular(struct rpc_xprt_iter *xpi) +++{ +++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); +++ +++ if (xps == NULL) +++ return NULL; +++ +++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, +++ enfs_lb_get_singular_xprt); +++} +++ +++static void enfs_lb_iter_default_rewind(struct rpc_xprt_iter *xpi) +++{ +++ WRITE_ONCE(xpi->xpi_cursor, NULL); +++} +++ +++static void enfs_lb_switch_set_roundrobin(struct rpc_clnt *clnt) +++{ +++ struct rpc_xprt_switch *xps; +++ +++ rcu_read_lock(); +++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); +++ rcu_read_unlock(); +++ if (clnt->cl_vers == 3) { +++ +++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_roundrobin) +++ WRITE_ONCE(xps->xps_iter_ops, +++ &enfs_xprt_iter_roundrobin); +++ +++ return; +++ } +++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_singular) +++ WRITE_ONCE(xps->xps_iter_ops, &enfs_xprt_iter_singular); +++} +++ +++static +++struct rpc_xprt *enfs_lb_switch_find_current(struct list_head *head, +++ const struct rpc_xprt *cur) +++{ +++ struct rpc_xprt *pos; +++ +++ list_for_each_entry_rcu(pos, head, xprt_switch) { +++ if (cur == pos) +++ return pos; +++ } +++ return NULL; +++} +++ +++static struct rpc_xprt *enfs_lb_iter_current_entry(struct rpc_xprt_iter *xpi) +++{ +++ struct rpc_xprt_switch *xps = 
rcu_dereference(xpi->xpi_xpswitch); +++ struct list_head *head; +++ +++ if (xps == NULL) +++ return NULL; +++ head = &xps->xps_xprt_list; +++ if (xpi->xpi_cursor == NULL || xps->xps_nxprts < 2) +++ return enfs_lb_switch_get_main_xprt(xps); +++ return enfs_lb_switch_find_current(head, xpi->xpi_cursor); +++} +++ +++void enfs_lb_set_policy(struct rpc_clnt *clnt) +++{ +++ enfs_lb_switch_set_roundrobin(clnt); +++} +++ +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin = { +++ .xpi_rewind = enfs_lb_iter_default_rewind, +++ .xpi_xprt = enfs_lb_iter_current_entry, +++ .xpi_next = enfs_lb_iter_next_entry_roundrobin, +++}; +++ +++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular = { +++ .xpi_rewind = enfs_lb_iter_default_rewind, +++ .xpi_xprt = enfs_lb_iter_current_entry, +++ .xpi_next = enfs_lb_iter_next_entry_sigular, +++}; ++diff --git a/fs/nfs/enfs/enfs_roundrobin.h b/fs/nfs/enfs/enfs_roundrobin.h ++new file mode 100644 ++index 000000000000..b72b088a6258 ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_roundrobin.h ++@@ -0,0 +1,9 @@ +++/* SPDX-License-Identifier: GPL-2.0 */ +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. +++ */ +++#ifndef ENFS_ROUNDROBIN_H +++#define ENFS_ROUNDROBIN_H +++ +++void enfs_lb_set_policy(struct rpc_clnt *clnt); +++#endif +diff --git a/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch +new file mode 100644 +index 0000000..cc6b677 +--- /dev/null ++++ b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch +@@ -0,0 +1,1607 @@ ++diff --git a/fs/nfs/enfs/enfs_config.c b/fs/nfs/enfs/enfs_config.c ++new file mode 100644 ++index 000000000000..11aa7a00385b ++--- /dev/null +++++ b/fs/nfs/enfs/enfs_config.c ++@@ -0,0 +1,378 @@ +++// SPDX-License-Identifier: GPL-2.0 +++/* +++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
+++ */ +++#include <linux/cdev.h> +++#include <linux/errno.h> +++#include <linux/fcntl.h> +++#include <linux/fs.h> +++#include <linux/kernel.h> +++#include <linux/kthread.h> +++#include <linux/slab.h> +++#include <linux/string.h> +++#include <linux/uaccess.h> +++#include <linux/delay.h> +++ +++#include "enfs_errcode.h" +++#include "enfs_log.h" +++#include "enfs_config.h" +++ +++#define MAX_FILE_SIZE 8192 +++#define STRING_BUF_SIZE 128 +++#define CONFIG_FILE_PATH "/etc/enfs/config.ini" +++#define ENFS_NOTIFY_FILE_PERIOD 1000UL +++ +++#define MAX_PATH_DETECT_INTERVAL 300 +++#define MIN_PATH_DETECT_INTERVAL 5 +++#define MAX_PATH_DETECT_TIMEOUT 60 +++#define MIN_PATH_DETECT_TIMEOUT 1 +++#define MAX_MULTIPATH_TIMEOUT 60 +++#define MIN_MULTIPATH_TIMEOUT 0 +++#define MAX_MULTIPATH_STATE ENFS_MULTIPATH_DISABLE +++#define MIN_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE +++ +++#define DEFAULT_PATH_DETECT_INTERVAL 10 +++#define DEFAULT_PATH_DETECT_TIMEOUT 5 +++#define DEFAULT_MULTIPATH_TIMEOUT 0 +++#define DEFAULT_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE +++#define DEFAULT_LOADBALANCE_MODE ENFS_LOADBALANCE_RR +++ +++typedef int (*check_and_assign_func)(char *, char *, int, int); +++ +++struct enfs_config_info { +++ int32_t path_detect_interval; +++ int32_t path_detect_timeout; +++ int32_t multipath_timeout; +++ int32_t loadbalance_mode; +++ int32_t multipath_state; +++}; +++ +++struct check_and_assign_value { +++ char *field_name; +++ check_and_assign_func func; +++ int min_value; +++ int max_value; +++}; +++ +++static struct enfs_config_info g_enfs_config_info; +++static struct timespec64 modify_time; +++static struct task_struct *thread; +++ +++static int enfs_check_config_value(char *value, int min_value, int max_value) +++{ +++ long num_value; +++ int ret; +++ +++ ret = kstrtol(value, 10, &num_value); +++ if (ret != 0) { +++ enfs_log_error("Failed to convert string to int\n"); +++ return -EINVAL; +++ } +++ +++ if (num_value < min_value || num_value > max_value) +++ return 
-EINVAL; +++ +++ return num_value; +++} +++ +++static int32_t enfs_check_and_assign_int_value(char *field_name, char *value, +++ int min_value, int max_value) +++{ +++ int int_value = enfs_check_config_value(value, min_value, max_value); +++ +++ if (int_value < 0) +++ return -EINVAL; +++ +++ if (strcmp(field_name, "path_detect_interval") == 0) { +++ g_enfs_config_info.path_detect_interval = int_value; +++ return ENFS_RET_OK; +++ } +++ if (strcmp(field_name, "path_detect_timeout") == 0) { +++ g_enfs_config_info.path_detect_timeout = int_value; +++ return ENFS_RET_OK; +++ } +++ if (strcmp(field_name, "multipath_timeout") == 0) { +++ g_enfs_config_info.multipath_timeout = int_value; +++ return ENFS_RET_OK; +++ } +++ if (strcmp(field_name, "multipath_disable") == 0) { +++ g_enfs_config_info.multipath_state = int_value; +++ return ENFS_RET_OK; +++ } +++ return -EINVAL; +++} +++ +++static int32_t enfs_check_and_assign_loadbalance_mode(char *field_name, +++ char *value, +++ int min_value, +++ int max_value) +++{ +++ if (value == NULL) +++ return -EINVAL; +++ +++ if (strcmp(field_name, "multipath_select_policy") == 0) { +++ if (strcmp(value, "roundrobin") == 0) { +++ g_enfs_config_info.loadbalance_mode +++ = ENFS_LOADBALANCE_RR; +++ return ENFS_RET_OK; +++ } +++ } +++ return -EINVAL; +++} +++ +++static const struct check_and_assign_value g_check_and_assign_value[] = { +++ {"path_detect_interval", enfs_check_and_assign_int_value, +++ MIN_PATH_DETECT_INTERVAL, MAX_PATH_DETECT_INTERVAL}, +++ {"path_detect_timeout", enfs_check_and_assign_int_value, +++ MIN_PATH_DETECT_TIMEOUT, MAX_PATH_DETECT_TIMEOUT}, +++ {"multipath_timeout", enfs_check_and_assign_int_value, +++ MIN_MULTIPATH_TIMEOUT, MAX_MULTIPATH_TIMEOUT}, +++ {"multipath_disable", enfs_check_and_assign_int_value, +++ MIN_MULTIPATH_STATE, MAX_MULTIPATH_STATE}, +++ {"multipath_select_policy", enfs_check_and_assign_loadbalance_mode, +++ 0, 0}, +++}; +++ +++static int32_t enfs_read_config_file(char *buffer, char *file_path) 
+++{ +++ int ret; +++ struct file *filp = NULL; +++ loff_t f_pos = 0; +++ mm_segment_t fs; +++ +++ +++ filp = filp_open(file_path, O_RDONLY, 0); +++ +++ if (IS_ERR(filp)) { +++ enfs_log_error("Failed to open file %s\n", CONFIG_FILE_PATH); +++ ret = -ENOENT; +++ return ret; +++ } +++ +++ fs = get_fs(); +++ set_fs(get_ds()); +++ kernel_read(filp, buffer, MAX_FILE_SIZE, &f_pos); +++ set_fs(fs); +++ +++ ret = filp_close(filp, NULL); +++ if (ret) { +++ enfs_log_error("Close File:%s failed:%d.\n", +++ CONFIG_FILE_PATH, ret); +++ return -EINVAL; +++ } +++ return ENFS_RET_OK; +++} +++ +++static int32_t enfs_deal_with_comment_line(char *buffer) +++{ +++ int ret; +++ char *pos = strchr(buffer, '\n'); +++ +++ if (pos != NULL) +++ ret = strlen(buffer) - strlen(pos); +++ else +++ ret = strlen(buffer); +++ +++ return ret; +++} +++ +++static int32_t enfs_parse_key_value_from_config(char *buffer, char *key, +++ char *value, int keyLen, +++ int valueLen) +++{ +++ char *line; +++ char *tokenPtr; +++ int len; +++ char *tem; +++ char *pos = strchr(buffer, '\n'); +++ +++ if (pos != NULL) +++ len = strlen(buffer) - strlen(pos); +++ else +++ len = strlen(buffer); +++ +++ line = kmalloc(len + 1, GFP_KERNEL); +++ if (!line) { +++ enfs_log_error("Failed to allocate memory.\n"); +++ return -ENOMEM; +++ } +++ line[len] = '\0'; +++ strncpy(line, buffer, len); +++ +++ tem = line; +++ tokenPtr = strsep(&tem, "="); +++ if (tokenPtr == NULL || tem == NULL) { +++ kfree(line); +++ return len; +++ } +++ strncpy(key, strim(tokenPtr), keyLen); +++ strncpy(value, strim(tem), valueLen); +++ +++ kfree(line); +++ return len; +++} +++ +++static int32_t enfs_get_value_from_config_file(char *buffer, char *field_name, +++ char *value, int valueLen) +++{ +++ int ret; +++ char key[STRING_BUF_SIZE + 1] = {0}; +++ char val[STRING_BUF_SIZE + 1] = {0}; +++ +++ while (buffer[0] != '\0') { +++ if (buffer[0] == '\n') { +++ buffer++; +++ } else if (buffer[0] == '#') { +++ ret = enfs_deal_with_comment_line(buffer); +++ 
if (ret > 0) +++ buffer += ret; +++ } else { +++ ret = enfs_parse_key_value_from_config(buffer, key, val, +++ STRING_BUF_SIZE, +++ STRING_BUF_SIZE); +++ if (ret < 0) { +++ enfs_log_error("failed parse key value, %d\n" +++ , ret); +++ return ret; +++ } +++ key[STRING_BUF_SIZE] = '\0'; +++ val[STRING_BUF_SIZE] = '\0'; +++ +++ buffer += ret; +++ +++ if (strcmp(field_name, key) == 0) { +++ strncpy(value, val, valueLen); +++ return ENFS_RET_OK; +++ } +++ } +++ } +++ enfs_log_error("can not find value which matched field_name: %s.\n", +++ field_name); +++ return -EINVAL; +++} +++ +++int32_t enfs_config_load(void) +++{ +++ char value[STRING_BUF_SIZE + 1]; +++ int ret; +++ int table_len; +++ int min; +++ int max; +++ int i; +++ char *buffer; +++ +++ buffer = kmalloc(MAX_FILE_SIZE, GFP_KERNEL); +++ if (!buffer) { +++ enfs_log_error("Failed to allocate memory.\n"); +++ return -ENOMEM; +++ } +++ memset(buffer, 0, MAX_FILE_SIZE); +++ +++ g_enfs_config_info.path_detect_interval = DEFAULT_PATH_DETECT_INTERVAL; +++ g_enfs_config_info.path_detect_timeout = DEFAULT_PATH_DETECT_TIMEOUT; +++ g_enfs_config_info.multipath_timeout = DEFAULT_MULTIPATH_TIMEOUT; +++ g_enfs_config_info.multipath_state = DEFAULT_MULTIPATH_STATE; +++ g_enfs_config_info.loadbalance_mode = DEFAULT_LOADBALANCE_MODE; +++ +++ table_len = sizeof(g_check_and_assign_value) / +++ sizeof(g_check_and_assign_value[0]); +++ +++ ret = enfs_read_config_file(buffer, CONFIG_FILE_PATH); +++ if (ret != 0) { +++ kfree(buffer); +++ return ret; +++ } +++ +++ for (i = 0; i < table_len; i++) { +++ ret = enfs_get_value_from_config_file(buffer, +++ g_check_and_assign_value[i].field_name, +++ value, STRING_BUF_SIZE); +++ if (ret < 0) +++ continue; +++ +++ value[STRING_BUF_SIZE] = '\0'; +++ min = g_check_and_assign_value[i].min_value; +++ max = g_check_and_assign_value[i].max_value; +++ if (g_check_and_assign_value[i].func != NULL) +++ (*g_check_and_assign_value[i].func)( +++ g_check_and_assign_value[i].field_name, +++ value, min, max); 
+++	}
+++
+++	kfree(buffer);
+++	return ENFS_RET_OK;
+++}
+++
+++int32_t enfs_get_config_path_detect_interval(void)
+++{
+++	return g_enfs_config_info.path_detect_interval;
+++}
+++
+++int32_t enfs_get_config_path_detect_timeout(void)
+++{
+++	return g_enfs_config_info.path_detect_timeout;
+++}
+++
+++int32_t enfs_get_config_multipath_timeout(void)
+++{
+++	return g_enfs_config_info.multipath_timeout;
+++}
+++
+++int32_t enfs_get_config_multipath_state(void)
+++{
+++	return g_enfs_config_info.multipath_state;
+++}
+++
+++int32_t enfs_get_config_loadbalance_mode(void)
+++{
+++	return g_enfs_config_info.loadbalance_mode;
+++}
+++
+++static bool enfs_file_changed(const char *filename)
+++{
+++	int err;
+++	struct kstat file_stat;
+++
+++	err = vfs_stat(filename, &file_stat);
+++	if (err) {
+++		pr_err("failed to open file:%s err:%d\n", filename, err);
+++		return false;
+++	}
+++
+++	if (timespec64_compare(&modify_time, &file_stat.mtime) == -1) {
+++		modify_time = file_stat.mtime;
+++		pr_info("file change: %lld %lld\n", modify_time.tv_sec,
+++			file_stat.mtime.tv_sec);
+++		return true;
+++	}
+++
+++	return false;
+++}
+++
+++static int enfs_thread_func(void *data)
+++{
+++	while (!kthread_should_stop()) {
+++		if (enfs_file_changed(CONFIG_FILE_PATH))
+++			enfs_config_load();
+++
+++		msleep(ENFS_NOTIFY_FILE_PERIOD);
+++	}
+++	return 0;
+++}
+++
+++int enfs_config_timer_init(void)
+++{
+++	thread = kthread_run(enfs_thread_func, NULL, "enfs_notiy_file_thread");
+++	if (IS_ERR(thread)) {
+++		pr_err("Failed to create kernel thread\n");
+++		return PTR_ERR(thread);
+++	}
+++	return 0;
+++}
+++
+++void enfs_config_timer_exit(void)
+++{
+++	pr_info("enfs_notify_file_exit\n");
+++	if (thread)
+++		kthread_stop(thread);
+++}
++diff --git a/fs/nfs/enfs/enfs_config.h b/fs/nfs/enfs/enfs_config.h
++new file mode 100644
++index 000000000000..752710129170
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_config.h
++@@ -0,0 +1,32 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c)
Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: nfs configuration
+++ * Author: y00583252
+++ * Create: 2023-07-27
+++ */
+++
+++#ifndef ENFS_CONFIG_H
+++#define ENFS_CONFIG_H
+++
+++#include <linux/types.h>
+++
+++enum enfs_multipath_state {
+++	ENFS_MULTIPATH_ENABLE = 0,
+++	ENFS_MULTIPATH_DISABLE = 1,
+++};
+++
+++enum enfs_loadbalance_mode {
+++	ENFS_LOADBALANCE_RR,
+++};
+++
+++
+++int32_t enfs_get_config_path_detect_interval(void);
+++int32_t enfs_get_config_path_detect_timeout(void);
+++int32_t enfs_get_config_multipath_timeout(void);
+++int32_t enfs_get_config_multipath_state(void);
+++int32_t enfs_get_config_loadbalance_mode(void);
+++int32_t enfs_config_load(void);
+++int32_t enfs_config_timer_init(void);
+++void enfs_config_timer_exit(void);
+++#endif // ENFS_CONFIG_H
++diff --git a/fs/nfs/enfs/enfs_errcode.h b/fs/nfs/enfs/enfs_errcode.h
++new file mode 100644
++index 000000000000..cca47ab9a191
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_errcode.h
++@@ -0,0 +1,17 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: nfs errocode
+++ * Author: y00583252
+++ * Create: 2023-07-31
+++ */
+++
+++#ifndef ENFS_ERRCODE_H
+++#define ENFS_ERRCODE_H
+++
+++enum {
+++	ENFS_RET_OK = 0,
+++	ENFS_RET_FAIL
+++};
+++
+++#endif // ENFS_ERRCODE_H
++diff --git a/fs/nfs/enfs/enfs_log.h b/fs/nfs/enfs/enfs_log.h
++new file mode 100644
++index 000000000000..177b404f05df
++--- /dev/null
+++++ b/fs/nfs/enfs/enfs_log.h
++@@ -0,0 +1,25 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: enfs log
+++ * Author: y00583252
+++ * Create: 2023-07-31
+++ */
+++#ifndef ENFS_LOG_H
+++#define ENFS_LOG_H
+++
+++#include <linux/printk.h>
+++
+++#define enfs_log_info(fmt, ...)
\
+++	pr_info("enfs:[%s]" pr_fmt(fmt), \
+++		__func__, ##__VA_ARGS__)
+++
+++#define enfs_log_error(fmt, ...) \
+++	pr_err("enfs:[%s]" pr_fmt(fmt), \
+++		__func__, ##__VA_ARGS__)
+++
+++#define enfs_log_debug(fmt, ...) \
+++	pr_debug("enfs:[%s]" pr_fmt(fmt), \
+++		__func__, ##__VA_ARGS__)
+++
+++#endif // ENFS_ERRCODE_H
++diff --git a/fs/nfs/enfs/failover_com.h b/fs/nfs/enfs/failover_com.h
++new file mode 100644
++index 000000000000..c52940da232e
++--- /dev/null
+++++ b/fs/nfs/enfs/failover_com.h
++@@ -0,0 +1,23 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: failover time commont header file
+++ * Create: 2023-08-02
+++ */
+++#ifndef FAILOVER_COMMON_H
+++#define FAILOVER_COMMON_H
+++
+++static inline bool failover_is_enfs_clnt(struct rpc_clnt *clnt)
+++{
+++	struct rpc_clnt *next = clnt->cl_parent;
+++
+++	while (next) {
+++		if (next == next->cl_parent)
+++			break;
+++		next = next->cl_parent;
+++	}
+++
+++	return next != NULL ? next->cl_enfs : clnt->cl_enfs;
+++}
+++
+++#endif // FAILOVER_COMMON_H
++diff --git a/fs/nfs/enfs/failover_path.c b/fs/nfs/enfs/failover_path.c
++new file mode 100644
++index 000000000000..93b454de29d1
++--- /dev/null
+++++ b/fs/nfs/enfs/failover_path.c
++@@ -0,0 +1,207 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: nfs path failover file
+++ * Author: y00583252
+++ * Create: 2023-08-02
+++ */
+++
+++#include "failover_path.h"
+++#include <linux/nfs.h>
+++#include <linux/nfs3.h>
+++#include <linux/nfs4.h>
+++#include <linux/sunrpc/clnt.h>
+++#include <linux/sunrpc/sched.h>
+++#include <linux/sunrpc/xprt.h>
+++#include "enfs_config.h"
+++#include "enfs_log.h"
+++#include "failover_com.h"
+++#include "pm_state.h"
+++#include "pm_ping.h"
+++
+++enum failover_policy_t {
+++	FAILOVER_NOACTION = 1,
+++	FAILOVER_RETRY,
+++	FAILOVER_RETRY_DELAY,
+++};
+++
+++static void failover_retry_path(struct rpc_task *task)
+++{
+++	xprt_release(task);
+++	rpc_init_task_retry_counters(task);
+++	rpc_task_release_transport(task);
+++	rpc_restart_call(task);
+++}
+++
+++static void failover_retry_path_delay(struct rpc_task *task, int32_t delay)
+++{
+++	failover_retry_path(task);
+++	rpc_delay(task, delay);
+++}
+++
+++static void failover_retry_path_by_policy(struct rpc_task *task,
+++		enum failover_policy_t policy)
+++{
+++	if (policy == FAILOVER_RETRY)
+++		failover_retry_path(task);
+++	else if (policy == FAILOVER_RETRY_DELAY)
+++		failover_retry_path_delay(task, 3 * HZ); // delay 3s
+++}
+++
+++static
+++enum failover_policy_t failover_get_nfs3_retry_policy(struct rpc_task *task)
+++{
+++	enum failover_policy_t policy = FAILOVER_NOACTION;
+++	const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc;
+++	u32 proc;
+++
+++	if (unlikely(procinfo == NULL)) {
+++		enfs_log_error("the task contains no valid proc.\n");
+++		return FAILOVER_NOACTION;
+++	}
+++
+++	proc = procinfo->p_proc;
+++
+++	switch (proc) {
+++	case NFS3PROC_CREATE:
+++	case NFS3PROC_MKDIR:
+++	case NFS3PROC_REMOVE:
+++	case NFS3PROC_RMDIR:
+++	case NFS3PROC_SYMLINK:
+++	case NFS3PROC_LINK:
+++	case NFS3PROC_SETATTR:
+++	case NFS3PROC_WRITE:
+++		policy = FAILOVER_RETRY_DELAY;
+++	default:
+++		policy = FAILOVER_RETRY;
+++	}
+++	return policy;
+++}
+++
+++static
+++enum failover_policy_t
failover_get_nfs4_retry_policy(struct rpc_task *task)
+++{
+++	enum failover_policy_t policy = FAILOVER_NOACTION;
+++	const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc;
+++	u32 proc_idx;
+++
+++	if (unlikely(procinfo == NULL)) {
+++		enfs_log_error("the task contains no valid proc.\n");
+++		return FAILOVER_NOACTION;
+++	}
+++
+++	proc_idx = procinfo->p_statidx;
+++
+++	switch (proc_idx) {
+++	case NFSPROC4_CLNT_CREATE:
+++	case NFSPROC4_CLNT_REMOVE:
+++	case NFSPROC4_CLNT_LINK:
+++	case NFSPROC4_CLNT_SYMLINK:
+++	case NFSPROC4_CLNT_SETATTR:
+++	case NFSPROC4_CLNT_WRITE:
+++	case NFSPROC4_CLNT_RENAME:
+++	case NFSPROC4_CLNT_SETACL:
+++		policy = FAILOVER_RETRY_DELAY;
+++	default:
+++		policy = FAILOVER_RETRY;
+++	}
+++	return policy;
+++}
+++
+++static enum failover_policy_t failover_get_retry_policy(struct rpc_task *task)
+++{
+++	struct rpc_clnt *clnt = task->tk_client;
+++	u32 version = clnt->cl_vers;
+++	enum failover_policy_t policy = FAILOVER_NOACTION;
+++
+++	// 1. if the task meant to send to certain xprt, take no action
+++	if (task->tk_flags & RPC_TASK_FIXED)
+++		return FAILOVER_NOACTION;
+++
+++	// 2. get policy by different version of nfs protocal
+++	if (version == 3) // nfs v3
+++		policy = failover_get_nfs3_retry_policy(task);
+++	else if (version == 4) // nfs v4
+++		policy = failover_get_nfs4_retry_policy(task);
+++	else
+++		return FAILOVER_NOACTION;
+++
+++	// 3.
if the task is not send to target, retry immediately
+++	if (!RPC_WAS_SENT(task))
+++		policy = FAILOVER_RETRY;
+++
+++	return policy;
+++}
+++
+++static int failover_check_task(struct rpc_task *task)
+++{
+++	struct rpc_clnt *clnt = NULL;
+++	int disable_mpath = enfs_get_config_multipath_state();
+++
+++	if (disable_mpath != ENFS_MULTIPATH_ENABLE) {
+++		enfs_log_debug("Multipath is not enabled.\n");
+++		return -EINVAL;
+++	}
+++
+++	if (unlikely((task == NULL) || (task->tk_client == NULL))) {
+++		enfs_log_error("The task is not valid.\n");
+++		return -EINVAL;
+++	}
+++
+++	clnt = task->tk_client;
+++
+++	if (clnt->cl_prog != NFS_PROGRAM) {
+++		enfs_log_debug("The clnt is not prog{%u} type.\n",
+++			clnt->cl_prog);
+++		return -EINVAL;
+++	}
+++
+++	if (!failover_is_enfs_clnt(clnt)) {
+++		enfs_log_debug("The clnt is not a enfs-managed type.\n");
+++		return -EINVAL;
+++	}
+++	return 0;
+++}
+++
+++void failover_handle(struct rpc_task *task)
+++{
+++	enum failover_policy_t policy;
+++	int ret;
+++
+++	ret = failover_check_task(task);
+++	if (ret != 0)
+++		return;
+++
+++	pm_set_path_state(task->tk_xprt, PM_STATE_FAULT);
+++
+++	policy = failover_get_retry_policy(task);
+++
+++	failover_retry_path_by_policy(task, policy);
+++}
+++
+++bool failover_task_need_call_start_again(struct rpc_task *task)
+++{
+++	int ret;
+++
+++	ret = failover_check_task(task);
+++	if (ret != 0)
+++		return false;
+++
+++	return true;
+++}
+++
+++bool failover_prepare_transmit(struct rpc_task *task)
+++{
+++	if (task->tk_flags & RPC_TASK_FIXED)
+++		return true;
+++
+++	if (pm_ping_is_test_xprt_task(task))
+++		return true;
+++
+++	if (pm_get_path_state(task->tk_xprt) == PM_STATE_FAULT) {
+++		task->tk_status = -ETIMEDOUT;
+++		return false;
+++	}
+++
+++	return true;
+++}
++diff --git a/fs/nfs/enfs/failover_path.h b/fs/nfs/enfs/failover_path.h
++new file mode 100644
++index 000000000000..6f1294829a6e
++--- /dev/null
+++++ b/fs/nfs/enfs/failover_path.h
++@@ -0,0 +1,17 @@
+++/*
SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: nfs path failover header file
+++ * Author: y00583252
+++ * Create: 2023-08-02
+++ */
+++
+++#ifndef FAILOVER_PATH_H
+++#define FAILOVER_PATH_H
+++
+++#include <linux/sunrpc/sched.h>
+++
+++void failover_handle(struct rpc_task *task);
+++bool failover_prepare_transmit(struct rpc_task *task);
+++
+++#endif // FAILOVER_PATH_H
++diff --git a/fs/nfs/enfs/failover_time.c b/fs/nfs/enfs/failover_time.c
++new file mode 100644
++index 000000000000..866ea82d13fc
++--- /dev/null
+++++ b/fs/nfs/enfs/failover_time.c
++@@ -0,0 +1,99 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: failover time file
+++ * Create: 2023-08-02
+++ */
+++
+++#include "failover_time.h"
+++#include <linux/jiffies.h>
+++#include <linux/sunrpc/clnt.h>
+++#include "enfs_config.h"
+++#include "enfs_log.h"
+++#include "failover_com.h"
+++#include "pm_ping.h"
+++
+++static unsigned long failover_get_mulitipath_timeout(struct rpc_clnt *clnt)
+++{
+++	unsigned long config_tmo = enfs_get_config_multipath_timeout() * HZ;
+++	unsigned long clnt_tmo = clnt->cl_timeout->to_initval;
+++
+++	if (config_tmo == 0)
+++		return clnt_tmo;
+++
+++	return config_tmo > clnt_tmo ?
clnt_tmo : config_tmo;
+++}
+++
+++void failover_adjust_task_timeout(struct rpc_task *task, void *condition)
+++{
+++	struct rpc_clnt *clnt = NULL;
+++	unsigned long tmo;
+++	int disable_mpath = enfs_get_config_multipath_state();
+++
+++	if (disable_mpath != ENFS_MULTIPATH_ENABLE) {
+++		enfs_log_debug("Multipath is not enabled.\n");
+++		return;
+++	}
+++
+++	clnt = task->tk_client;
+++	if (unlikely(clnt == NULL)) {
+++		enfs_log_error("task associate client is NULL.\n");
+++		return;
+++	}
+++
+++	if (!failover_is_enfs_clnt(clnt)) {
+++		enfs_log_debug("The clnt is not a enfs-managed type.\n");
+++		return;
+++	}
+++
+++	tmo = failover_get_mulitipath_timeout(clnt);
+++	if (tmo == 0) {
+++		enfs_log_debug("Multipath is not enabled.\n");
+++		return;
+++	}
+++
+++	if (task->tk_timeout != 0)
+++		task->tk_timeout =
+++			task->tk_timeout < tmo ? task->tk_timeout : tmo;
+++	else
+++		task->tk_timeout = tmo;
+++}
+++
+++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req)
+++{
+++	struct rpc_clnt *clnt = NULL;
+++	int disable_mpath = enfs_get_config_multipath_state();
+++
+++	if (disable_mpath != ENFS_MULTIPATH_ENABLE) {
+++		enfs_log_debug("Multipath is not enabled.\n");
+++		return;
+++	}
+++
+++	clnt = task->tk_client;
+++	if (unlikely(clnt == NULL)) {
+++		enfs_log_error("task associate client is NULL.\n");
+++		return;
+++	}
+++
+++	if (!failover_is_enfs_clnt(clnt)) {
+++		enfs_log_debug("The clnt is not a enfs-managed type.\n");
+++		return;
+++	}
+++
+++	if (!pm_ping_is_test_xprt_task(task))
+++		req->rq_timeout = failover_get_mulitipath_timeout(clnt);
+++	else {
+++		req->rq_timeout = enfs_get_config_path_detect_timeout() * HZ;
+++		req->rq_majortimeo = req->rq_timeout + jiffies;
+++	}
+++
+++	/*
+++	 * when task is retried, the req is new, we lost major-timeout times,
+++	 * so we have to restore req major
+++	 * timeouts from the task, if it is stored.
+++	 */
+++	if (task->tk_major_timeo != 0)
+++		req->rq_majortimeo = task->tk_major_timeo;
+++	else
+++		task->tk_major_timeo = req->rq_majortimeo;
+++}
++diff --git a/fs/nfs/enfs/failover_time.h b/fs/nfs/enfs/failover_time.h
++new file mode 100644
++index 000000000000..ede25b577a2a
++--- /dev/null
+++++ b/fs/nfs/enfs/failover_time.h
++@@ -0,0 +1,16 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: failover time header file
+++ * Create: 2023-08-02
+++ */
+++
+++#ifndef FAILOVER_TIME_H
+++#define FAILOVER_TIME_H
+++
+++#include <linux/sunrpc/sched.h>
+++
+++void failover_adjust_task_timeout(struct rpc_task *task, void *condition);
+++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req);
+++
+++#endif // FAILOVER_TIME_H
++diff --git a/fs/nfs/enfs/init.h b/fs/nfs/enfs/init.h
++new file mode 100644
++index 000000000000..fdabb9084e19
++--- /dev/null
+++++ b/fs/nfs/enfs/init.h
++@@ -0,0 +1,17 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: nfs client init
+++ * Author: y00583252
+++ * Create: 2023-07-31
+++ */
+++
+++#ifndef ENFS_INIT_H
+++#define ENFS_INIT_H
+++
+++#include <linux/types.h>
+++
+++int32_t enfs_init(void);
+++void enfs_fini(void);
+++
+++#endif
++diff --git a/fs/nfs/enfs/mgmt_init.c b/fs/nfs/enfs/mgmt_init.c
++new file mode 100644
++index 000000000000..75a40c5e0f6c
++--- /dev/null
+++++ b/fs/nfs/enfs/mgmt_init.c
++@@ -0,0 +1,22 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: mgmt component init
+++ * Author: y00583252
+++ * Create: 2023-07-31
+++ */
+++
+++#include "mgmt_init.h"
+++#include <linux/printk.h>
+++#include "enfs_errcode.h"
+++#include "enfs_config.h"
+++
+++int32_t mgmt_init(void)
+++{
+++	return enfs_config_timer_init();
+++}
+++
+++void mgmt_fini(void)
+++{
+++	enfs_config_timer_exit();
+++}
++diff --git a/fs/nfs/enfs/mgmt_init.h b/fs/nfs/enfs/mgmt_init.h
++new file mode 100644
++index 000000000000..aa78303b9f01
++--- /dev/null
+++++ b/fs/nfs/enfs/mgmt_init.h
++@@ -0,0 +1,18 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: mgmt component init
+++ * Author: y00583252
+++ * Create: 2023-07-31
+++ */
+++
+++#ifndef MGMT_INIT_H
+++#define MGMT_INIT_H
+++
+++#include <linux/types.h>
+++
+++int32_t mgmt_init(void);
+++void mgmt_fini(void);
+++
+++
+++#endif // MGMT_INIT_H
++diff --git a/fs/nfs/enfs/pm_ping.c b/fs/nfs/enfs/pm_ping.c
++new file mode 100644
++index 000000000000..24153cd4c7f3
++--- /dev/null
+++++ b/fs/nfs/enfs/pm_ping.c
++@@ -0,0 +1,421 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: path state header file
+++ * Author: x00833432
+++ * Create: 2023-08-21
+++ */
+++
+++#include "pm_ping.h"
+++#include <linux/err.h>
+++#include <linux/spinlock.h>
+++#include <linux/slab.h>
+++#include <linux/module.h>
+++#include <linux/printk.h>
+++#include <linux/kthread.h>
+++#include <linux/nfs.h>
+++#include <linux/errno.h>
+++#include <linux/rcupdate.h>
+++#include <linux/workqueue.h>
+++#include <net/netns/generic.h>
+++#include <linux/atomic.h>
+++#include <linux/sunrpc/clnt.h>
+++
+++#include "../../../net/sunrpc/netns.h"
+++#include "pm_state.h"
+++#include "enfs.h"
+++#include "enfs_log.h"
+++#include "enfs_config.h"
+++
+++#define SLEEP_INTERVAL 2
+++extern unsigned int sunrpc_net_id;
+++
+++static struct task_struct *pm_ping_timer_thread;
+++//protect pint_execute_workq
+++static spinlock_t ping_execute_workq_lock;
+++// timer for test xprt workqueue
+++static struct workqueue_struct *ping_execute_workq;
+++// count the ping xprt work on flight
+++static atomic_t check_xprt_count;
+++
+++struct ping_xprt_work {
+++	struct rpc_xprt *xprt; // use this specific xprt
+++	struct rpc_clnt *clnt; // use this specific rpc_client
+++	struct work_struct ping_work;
+++};
+++
+++struct pm_ping_async_callback {
+++	void *data;
+++	void (*func)(void *data);
+++};
+++
+++// set xprt's enum pm_check_state
+++void pm_ping_set_path_check_state(struct rpc_xprt *xprt,
+++		enum pm_check_state state)
+++{
+++	struct enfs_xprt_context *ctx = NULL;
+++
+++	if (IS_ERR(xprt)) {
+++		enfs_log_error("The xprt ptr is not exist.\n");
+++		return;
+++	}
+++
+++	if (xprt == NULL) {
+++		enfs_log_error("The xprt is not valid.\n");
+++		return;
+++	}
+++
+++	xprt_get(xprt);
+++
+++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
+++	if (ctx == NULL) {
+++		enfs_log_error("The xprt multipath ctx is not valid.\n");
+++		xprt_put(xprt);
+++		return;
+++	}
+++
+++	atomic_set(&ctx->path_check_state, state);
+++	xprt_put(xprt);
+++}
+++
+++// get xprt's enum
pm_check_state
+++static enum pm_check_state pm_ping_get_path_check_state(struct rpc_xprt *xprt)
+++{
+++	struct enfs_xprt_context *ctx = NULL;
+++	enum pm_check_state state;
+++
+++	if (xprt == NULL) {
+++		enfs_log_error("The xprt is not valid.\n");
+++		return PM_CHECK_UNDEFINE;
+++	}
+++
+++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
+++	if (ctx == NULL) {
+++		enfs_log_error("The xprt multipath ctx is not valid.\n");
+++		return PM_CHECK_UNDEFINE;
+++	}
+++
+++	state = atomic_read(&ctx->path_check_state);
+++
+++	return state;
+++}
+++
+++static void pm_ping_call_done_callback(void *data)
+++{
+++	struct pm_ping_async_callback *callback_data =
+++		(struct pm_ping_async_callback *)data;
+++
+++	if (callback_data == NULL)
+++		return;
+++
+++	callback_data->func(callback_data->data);
+++
+++	kfree(callback_data);
+++}
+++
+++// Default callback for async RPC calls
+++static void pm_ping_call_done(struct rpc_task *task, void *data)
+++{
+++	struct rpc_xprt *xprt = task->tk_xprt;
+++
+++	atomic_dec(&check_xprt_count);
+++	if (task->tk_status >= 0)
+++		pm_set_path_state(xprt, PM_STATE_NORMAL);
+++	else
+++		pm_set_path_state(xprt, PM_STATE_FAULT);
+++
+++	pm_ping_set_path_check_state(xprt, PM_CHECK_FINISH);
+++
+++	pm_ping_call_done_callback(data);
+++}
+++
+++// register func to rpc_call_done
+++static const struct rpc_call_ops pm_ping_set_status_ops = {
+++	.rpc_call_done = pm_ping_call_done,
+++};
+++
+++// execute work which in work_queue
+++static void pm_ping_execute_work(struct work_struct *work)
+++{
+++	int ret = 0;
+++
+++	// get the work information
+++	struct ping_xprt_work *work_info =
+++		container_of(work, struct ping_xprt_work, ping_work);
+++
+++	// if check state is pending
+++	if (pm_ping_get_path_check_state(work_info->xprt) == PM_CHECK_WAITING) {
+++
+++		pm_ping_set_path_check_state(work_info->xprt,
+++				PM_CHECK_CHECKING);
+++
+++		ret = rpc_clnt_test_xprt(work_info->clnt,
+++				work_info->xprt,
+++				&pm_ping_set_status_ops,
+++				NULL,
+++				
RPC_TASK_ASYNC | RPC_TASK_FIXED);
+++
+++		if (ret < 0) {
+++			enfs_log_debug("ping xprt execute failed ,ret %d", ret);
+++
+++			pm_ping_set_path_check_state(work_info->xprt,
+++					PM_CHECK_FINISH);
+++
+++		} else
+++			atomic_inc(&check_xprt_count);
+++
+++	}
+++
+++	atomic_dec(&work_info->clnt->cl_count);
+++	xprt_put(work_info->xprt);
+++	kfree(work_info);
+++	work_info = NULL;
+++}
+++
+++static bool pm_ping_workqueue_queue_work(struct work_struct *work)
+++{
+++	bool ret = false;
+++
+++	spin_lock(&ping_execute_workq_lock);
+++
+++	if (ping_execute_workq != NULL)
+++		ret = queue_work(ping_execute_workq, work);
+++
+++	spin_unlock(&ping_execute_workq_lock);
+++	return ret;
+++}
+++
+++// init test work and add this work to workqueue
+++static int pm_ping_add_work(struct rpc_clnt *clnt,
+++		struct rpc_xprt *xprt, void *data)
+++{
+++	struct ping_xprt_work *work_info;
+++	bool ret = false;
+++
+++	if (IS_ERR(xprt) || xprt == NULL) {
+++		enfs_log_error("The xprt ptr is not exist.\n");
+++		return -EINVAL;
+++	}
+++
+++	if (IS_ERR(clnt) || clnt == NULL) {
+++		enfs_log_error("The clnt ptr is not exist.\n");
+++		return -EINVAL;
+++	}
+++
+++	if (!xprt->multipath_context) {
+++		enfs_log_error("multipath_context is null.\n");
+++		return -EINVAL;
+++	}
+++
+++	// check xprt pending status, if pending status equals Finish
+++	// means this xprt can inster to work queue
+++	if (pm_ping_get_path_check_state(xprt) ==
+++		PM_CHECK_FINISH ||
+++		pm_ping_get_path_check_state(xprt) ==
+++		PM_CHECK_INIT) {
+++
+++		enfs_log_debug("find xprt pointer.
%p\n", xprt);
+++		work_info = kzalloc(sizeof(struct ping_xprt_work), GFP_ATOMIC);
+++		if (work_info == NULL)
+++			return -ENOMEM;
+++		work_info->clnt = clnt;
+++		atomic_inc(&clnt->cl_count);
+++		work_info->xprt = xprt;
+++		xprt_get(xprt);
+++		INIT_WORK(&work_info->ping_work, pm_ping_execute_work);
+++		pm_ping_set_path_check_state(xprt, PM_CHECK_WAITING);
+++
+++		ret = pm_ping_workqueue_queue_work(&work_info->ping_work);
+++		if (!ret) {
+++			atomic_dec(&work_info->clnt->cl_count);
+++			xprt_put(work_info->xprt);
+++			kfree(work_info);
+++			return -EINVAL;
+++		}
+++	}
+++	return 0;
+++}
+++
+++// encapsulate pm_ping_add_work()
+++static int pm_ping_execute_xprt_test(struct rpc_clnt *clnt,
+++		struct rpc_xprt *xprt, void *data)
+++{
+++	pm_ping_add_work(clnt, xprt, NULL);
+++	// return 0 for rpc_clnt_iterate_for_each_xprt();
+++	// because negative value will stop iterate all xprt
+++	// and we need return negative value for debug
+++	// Therefore, we need this function to iterate all xprt
+++	return 0;
+++}
+++
+++// export to other module add ping work to workqueue
+++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt)
+++{
+++	int ret;
+++
+++	ret = pm_ping_add_work(clnt, xprt, NULL);
+++	return ret;
+++}
+++
+++// iterate xprt in the client
+++static void pm_ping_loop_rpclnt(struct sunrpc_net *sn)
+++{
+++	struct rpc_clnt *clnt;
+++
+++	spin_lock(&sn->rpc_client_lock);
+++	list_for_each_entry_rcu(clnt, &sn->all_clients, cl_clients) {
+++		if (clnt->cl_enfs) {
+++			enfs_log_debug("find rpc_clnt.
%p\n", clnt);
+++			rpc_clnt_iterate_for_each_xprt(clnt,
+++				pm_ping_execute_xprt_test, NULL);
+++		}
+++	}
+++	spin_unlock(&sn->rpc_client_lock);
+++}
+++
+++// iterate each clnt in the sunrpc_net
+++static void pm_ping_loop_sunrpc_net(void)
+++{
+++	struct net *net;
+++	struct sunrpc_net *sn;
+++
+++	rcu_read_lock();
+++	for_each_net_rcu(net) {
+++		sn = net_generic(net, sunrpc_net_id);
+++		if (sn == NULL)
+++			continue;
+++		pm_ping_loop_rpclnt(sn);
+++	}
+++	rcu_read_unlock();
+++}
+++
+++static int pm_ping_routine(void *data)
+++{
+++	while (!kthread_should_stop()) {
+++		// equale 0 means open multipath
+++		if (enfs_get_config_multipath_state() ==
+++			ENFS_MULTIPATH_ENABLE)
+++			pm_ping_loop_sunrpc_net();
+++
+++		msleep((unsigned int)
+++			enfs_get_config_path_detect_interval() * 1000);
+++	}
+++	return 0;
+++}
+++
+++// start thread to cycly ping
+++static int pm_ping_start(void)
+++{
+++	pm_ping_timer_thread =
+++		kthread_run(pm_ping_routine, NULL, "pm_ping_routine");
+++	if (IS_ERR(pm_ping_timer_thread)) {
+++		enfs_log_error("Failed to create kernel thread\n");
+++		return PTR_ERR(pm_ping_timer_thread);
+++	}
+++	return 0;
+++}
+++
+++// initialize workqueue
+++static int pm_ping_workqueue_init(void)
+++{
+++	struct workqueue_struct *queue = NULL;
+++
+++	queue = create_workqueue("pm_ping_workqueue");
+++
+++	if (queue == NULL) {
+++		enfs_log_error("create workqueue failed.\n");
+++		return -ENOMEM;
+++	}
+++
+++	spin_lock(&ping_execute_workq_lock);
+++	ping_execute_workq = queue;
+++	spin_unlock(&ping_execute_workq_lock);
+++	enfs_log_info("create workqueue succeeeded.\n");
+++	return 0;
+++}
+++
+++static void pm_ping_workqueue_fini(void)
+++{
+++	struct workqueue_struct *queue = NULL;
+++
+++	spin_lock(&ping_execute_workq_lock);
+++	queue = ping_execute_workq;
+++	ping_execute_workq = NULL;
+++	spin_unlock(&ping_execute_workq_lock);
+++
+++	enfs_log_info("delete work queue\n");
+++
+++	if (queue != NULL) {
+++		flush_workqueue(queue);
+++		destroy_workqueue(queue);
+++	}
+++}
+++
+++// module exit func
+++void pm_ping_fini(void)
+++{
+++	if (pm_ping_timer_thread)
+++		kthread_stop(pm_ping_timer_thread);
+++
+++	pm_ping_workqueue_fini();
+++
+++	while (atomic_read(&check_xprt_count) != 0)
+++		msleep(SLEEP_INTERVAL);
+++}
+++
+++// module init func
+++int pm_ping_init(void)
+++{
+++	int ret;
+++
+++	atomic_set(&check_xprt_count, 0);
+++	ret = pm_ping_workqueue_init();
+++	if (ret != 0) {
+++		enfs_log_error("PM_PING Module loading failed.\n");
+++		return ret;
+++	}
+++	ret = pm_ping_start();
+++	if (ret != 0) {
+++		enfs_log_error("PM_PING Module loading failed.\n");
+++		pm_ping_workqueue_fini();
+++		return ret;
+++	}
+++
+++	return ret;
+++}
+++
+++bool pm_ping_is_test_xprt_task(struct rpc_task *task)
+++{
+++	return task->tk_ops == &pm_ping_set_status_ops ? true : false;
+++}
+++
+++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt,
+++		struct rpc_xprt *xprt,
+++		void (*func)(void *data),
+++		void *data)
+++{
+++	int ret;
+++
+++	struct pm_ping_async_callback *callback_data =
+++		kzalloc(sizeof(struct pm_ping_async_callback), GFP_KERNEL);
+++
+++	if (callback_data == NULL) {
+++		enfs_log_error("failed to mzalloc mem\n");
+++		return -ENOMEM;
+++	}
+++
+++	callback_data->data = data;
+++	callback_data->func = func;
+++	atomic_inc(&check_xprt_count);
+++	ret = rpc_clnt_test_xprt(clnt, xprt,
+++			&pm_ping_set_status_ops,
+++			callback_data,
+++			RPC_TASK_ASYNC | RPC_TASK_FIXED);
+++
+++	if (ret < 0) {
+++		enfs_log_debug("ping xprt execute failed ,ret %d", ret);
+++		atomic_dec(&check_xprt_count);
+++	}
+++
+++	return ret;
+++}
++diff --git a/fs/nfs/enfs/pm_ping.h b/fs/nfs/enfs/pm_ping.h
++new file mode 100644
++index 000000000000..6bcb94bfc836
++--- /dev/null
+++++ b/fs/nfs/enfs/pm_ping.h
++@@ -0,0 +1,33 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: nfs configuration
+++ * Author: x00833432
+++ * Create: 2023-07-27
+++ */
+++
+++#ifndef PM_PING_H
+++#define PM_PING_H
+++
+++#include <linux/sunrpc/clnt.h>
+++
+++enum pm_check_state {
+++	PM_CHECK_INIT, // this xprt never been queued
+++	PM_CHECK_WAITING, // this xprt waiting in the queue
+++	PM_CHECK_CHECKING, // this xprt is testing
+++	PM_CHECK_FINISH, // this xprt has been finished
+++	PM_CHECK_UNDEFINE, // undefine multipath struct
+++};
+++
+++int pm_ping_init(void);
+++void pm_ping_fini(void);
+++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt);
+++void pm_ping_set_path_check_state(struct rpc_xprt *xprt,
+++		enum pm_check_state state);
+++bool pm_ping_is_test_xprt_task(struct rpc_task *task);
+++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt,
+++		struct rpc_xprt *xprt,
+++		void (*func)(void *data),
+++		void *data);
+++
+++#endif // PM_PING_H
++diff --git a/fs/nfs/enfs/pm_state.c b/fs/nfs/enfs/pm_state.c
++new file mode 100644
++index 000000000000..220621a207a2
++--- /dev/null
+++++ b/fs/nfs/enfs/pm_state.c
++@@ -0,0 +1,158 @@
+++// SPDX-License-Identifier: GPL-2.0
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: path state file
+++ * Author: y00583252
+++ * Create: 2023-08-12
+++ */
+++#include "pm_state.h"
+++#include <linux/sunrpc/xprt.h>
+++
+++#include "enfs.h"
+++#include "enfs_log.h"
+++
+++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt)
+++{
+++	struct enfs_xprt_context *ctx = NULL;
+++	enum pm_path_state state;
+++
+++	if (xprt == NULL) {
+++		enfs_log_error("The xprt is not valid.\n");
+++		return PM_STATE_UNDEFINED;
+++	}
+++
+++	xprt_get(xprt);
+++
+++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
+++	if (ctx == NULL) {
+++		enfs_log_error("The xprt multipath ctx is not valid.\n");
+++		xprt_put(xprt);
+++		return PM_STATE_UNDEFINED;
+++	}
+++
+++	state = atomic_read(&ctx->path_state);
+++
+++	xprt_put(xprt);
+++
+++	return state;
+++}
+++
+++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state)
+++{
+++	struct enfs_xprt_context *ctx = NULL;
+++	enum pm_path_state cur_state;
+++
+++	if (xprt == NULL) {
+++		enfs_log_error("The xprt is not valid.\n");
+++		return;
+++	}
+++
+++	xprt_get(xprt);
+++
+++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
+++	if (ctx == NULL) {
+++		enfs_log_error("The xprt multipath ctx is not valid.\n");
+++		xprt_put(xprt);
+++		return;
+++	}
+++
+++	cur_state = atomic_read(&ctx->path_state);
+++	if (cur_state == state) {
+++		enfs_log_debug("The xprt is already {%d}.\n", state);
+++		xprt_put(xprt);
+++		return;
+++	}
+++
+++	atomic_set(&ctx->path_state, state);
+++	enfs_log_info("The xprt {%p} path state change from {%d} to {%d}.\n",
+++		xprt, cur_state, state);
+++
+++	xprt_put(xprt);
+++}
+++
+++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len)
+++{
+++	enum pm_path_state state;
+++
+++	if (xprt == NULL) {
+++		enfs_log_error("The xprt is not valid.\n");
+++		return;
+++	}
+++
+++	if ((buf == NULL) || (len <= 0)) {
+++		enfs_log_error("Buffer is not valid, len=%d.\n", len);
+++		return;
+++	}
+++
+++	state = pm_get_path_state(xprt);
+++
+++	switch
(state) {
+++	case PM_STATE_INIT:
+++		(void)snprintf(buf, len, "Init");
+++		break;
+++	case PM_STATE_NORMAL:
+++		(void)snprintf(buf, len, "Normal");
+++		break;
+++	case PM_STATE_FAULT:
+++		(void)snprintf(buf, len, "Fault");
+++		break;
+++	default:
+++		(void)snprintf(buf, len, "Unknown");
+++		break;
+++	}
+++}
+++
+++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len)
+++{
+++	int i;
+++	unsigned long state;
+++	static unsigned long xprt_mask[] = {
+++		XPRT_LOCKED, XPRT_CONNECTED,
+++		XPRT_CONNECTING, XPRT_CLOSE_WAIT,
+++		XPRT_BOUND, XPRT_BINDING, XPRT_CLOSING,
+++		XPRT_CONGESTED};
+++
+++	static const char *const xprt_state_desc[] = {
+++		"LOCKED", "CONNECTED", "CONNECTING",
+++		"CLOSE_WAIT", "BOUND", "BINDING",
+++		"CLOSING", "CONGESTED"};
+++	int pos = 0;
+++	int ret = 0;
+++
+++	if (xprt == NULL) {
+++		enfs_log_error("The xprt is not valid.\n");
+++		return;
+++	}
+++
+++	if ((buf == NULL) || (len <= 0)) {
+++		enfs_log_error(
+++			"Xprt state buffer is not valid, len=%d.\n",
+++			len);
+++		return;
+++	}
+++
+++	xprt_get(xprt);
+++	state = READ_ONCE(xprt->state);
+++	xprt_put(xprt);
+++
+++	for (i = 0; i < ARRAY_SIZE(xprt_mask); ++i) {
+++		if (pos >= len)
+++			break;
+++
+++		if (!test_bit(xprt_mask[i], &state))
+++			continue;
+++
+++		if (pos == 0)
+++			ret = snprintf(buf, len, "%s", xprt_state_desc[i]);
+++		else
+++			ret = snprintf(buf + pos, len - pos, "|%s",
+++					xprt_state_desc[i]);
+++
+++		if (ret < 0) {
+++			enfs_log_error("format state failed, ret %d.\n", ret);
+++			break;
+++		}
+++
+++		pos += ret;
+++	}
+++}
++diff --git a/fs/nfs/enfs/pm_state.h b/fs/nfs/enfs/pm_state.h
++new file mode 100644
++index 000000000000..f5f52e5ab91d
++--- /dev/null
+++++ b/fs/nfs/enfs/pm_state.h
++@@ -0,0 +1,28 @@
+++/* SPDX-License-Identifier: GPL-2.0 */
+++/*
+++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
+++ * Description: path state header file +++ * Author: y00583252 +++ * Create: 2023-08-12 +++ */ +++ +++#ifndef PM_STATE_H +++#define PM_STATE_H +++ +++#include <linux/types.h> +++#include <linux/sunrpc/xprt.h> +++ +++enum pm_path_state { +++ PM_STATE_INIT, +++ PM_STATE_NORMAL, +++ PM_STATE_FAULT, +++ PM_STATE_UNDEFINED // xprt is not multipath xprt +++}; +++ +++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state); +++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt); +++ +++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len); +++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len); +++ +++#endif // PM_STATE_H +diff --git a/0006-add_enfs_compile_option.patch b/0006-add_enfs_compile_option.patch +new file mode 100644 +index 0000000..ff3bc0e +--- /dev/null ++++ b/0006-add_enfs_compile_option.patch +@@ -0,0 +1,70 @@ ++diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig ++index b04256636d4b..ae53510c0627 100644 ++--- a/arch/arm64/configs/openeuler_defconfig +++++ b/arch/arm64/configs/openeuler_defconfig ++@@ -5344,6 +5344,7 @@ CONFIG_LOCKD=m ++ CONFIG_LOCKD_V4=y ++ CONFIG_NFS_ACL_SUPPORT=m ++ CONFIG_NFS_COMMON=y +++# CONFIG_ENFS is not set ++ CONFIG_SUNRPC=m ++ CONFIG_SUNRPC_GSS=m ++ CONFIG_SUNRPC_BACKCHANNEL=y ++diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig ++index 59baeb2973af..ccc317f7fdb2 100644 ++--- a/arch/x86/configs/openeuler_defconfig +++++ b/arch/x86/configs/openeuler_defconfig ++@@ -6825,6 +6825,7 @@ CONFIG_LOCKD=m ++ CONFIG_LOCKD_V4=y ++ CONFIG_NFS_ACL_SUPPORT=m ++ CONFIG_NFS_COMMON=y +++# CONFIG_ENFS is not set ++ CONFIG_SUNRPC=m ++ CONFIG_SUNRPC_GSS=m ++ CONFIG_SUNRPC_BACKCHANNEL=y ++diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig ++index e55f86713948..872c9b7671b1 100644 ++--- a/fs/nfs/Kconfig +++++ b/fs/nfs/Kconfig ++@@ -196,3 +196,14 @@ config NFS_DEBUG ++ depends on NFS_FS && SUNRPC_DEBUG ++ select CRC32 
++ default y +++ +++config ENFS +++ tristate "NFS client support for ENFS" +++ depends on NFS_FS +++ default n +++ help +++ This option enables support multipath of the NFS protocol +++ in the kernel's NFS client. +++ This feature will improve performance and reliability. +++ +++ If sure, say Y. ++diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile ++index c587e3c4c6a6..19d0ac2ba3b8 100644 ++--- a/fs/nfs/Makefile +++++ b/fs/nfs/Makefile ++@@ -12,6 +12,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \ ++ nfs-$(CONFIG_ROOT_NFS) += nfsroot.o ++ nfs-$(CONFIG_SYSCTL) += sysctl.o ++ nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o +++nfs-$(CONFIG_ENFS) += enfs_adapter.o ++ ++ obj-$(CONFIG_NFS_V2) += nfsv2.o ++ nfsv2-y := nfs2super.o proc.o nfs2xdr.o ++@@ -34,3 +35,5 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o ++ obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/ ++ obj-$(CONFIG_PNFS_BLOCK) += blocklayout/ ++ obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/ +++ +++obj-$(CONFIG_ENFS) += enfs/ ++diff --git a/net/sunrpc/Makefile b/net/sunrpc/Makefile ++index 090658c3da12..fe4e3b28c5d1 100644 ++--- a/net/sunrpc/Makefile +++++ b/net/sunrpc/Makefile ++@@ -19,3 +19,4 @@ sunrpc-$(CONFIG_SUNRPC_DEBUG) += debugfs.o ++ sunrpc-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel_rqst.o ++ sunrpc-$(CONFIG_PROC_FS) += stats.o ++ sunrpc-$(CONFIG_SYSCTL) += sysctl.o +++sunrpc-$(CONFIG_ENFS) += sunrpc_enfs_adapter.o +diff --git a/kernel.spec b/kernel.spec +index 3215446..e242c00 100644 +--- a/kernel.spec ++++ b/kernel.spec +@@ -60,6 +60,13 @@ Source9002: series.conf + Source9998: patches.tar.bz2 + %endif + ++Patch0001: 0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch ++Patch0002: 0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch ++Patch0003: 0003-add_enfs_module.patch ++Patch0004: 0004-add_enfs_module_for_sunrpc_multipatch.patch ++Patch0005: 0005-add_enfs_module_for_sunrpc_failover_and_configure.patch ++Patch0006: 
0006-add_enfs_compile_option.patch ++ + #BuildRequires: + BuildRequires: module-init-tools, patch >= 2.5.4, bash >= 2.03, tar + BuildRequires: bzip2, xz, findutils, gzip, m4, perl, make >= 3.78, diffutils, gawk +@@ -256,6 +263,12 @@ Applypatches() + Applypatches series.conf %{_builddir}/kernel-%{version}/linux-%{KernelVer} + %endif + ++%patch0001 -p1 ++%patch0002 -p1 ++%patch0003 -p1 ++%patch0004 -p1 ++%patch0005 -p1 ++%patch0006 -p1 + touch .scmversion + + find . \( -name "*.orig" -o -name "*~" \) -exec rm -f {} \; >/dev/null +-- +2.25.0.windows.1 + diff --git a/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch new file mode 100644 index 0000000..38e57a9 --- /dev/null +++ b/0001-nfs_add_api_to_support_enfs_registe_and_handle_mount_option.patch @@ -0,0 +1,757 @@ +diff --git a/fs/nfs/client.c b/fs/nfs/client.c +index 7d02dc52209d..50820a8a684a 100644 +--- a/fs/nfs/client.c ++++ b/fs/nfs/client.c +@@ -48,7 +48,7 @@ + #include "callback.h" + #include "delegation.h" + #include "iostat.h" +-#include "internal.h" ++#include "enfs_adapter.h" + #include "fscache.h" + #include "pnfs.h" + #include "nfs.h" +@@ -255,6 +255,7 @@ void nfs_free_client(struct nfs_client *clp) + put_nfs_version(clp->cl_nfs_mod); + kfree(clp->cl_hostname); + kfree(clp->cl_acceptor); ++ nfs_free_multi_path_client(clp); + kfree(clp); + } + EXPORT_SYMBOL_GPL(nfs_free_client); +@@ -330,6 +331,9 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat + sap)) + continue; + ++ if (!nfs_multipath_client_match(clp, data)) ++ continue; ++ + refcount_inc(&clp->cl_count); + return clp; + } +@@ -512,6 +516,9 @@ int nfs_create_rpc_client(struct nfs_client *clp, + .program = &nfs_program, + .version = clp->rpc_ops->version, + .authflavor = flavor, ++#if IS_ENABLED(CONFIG_ENFS) ++ .multipath_option = cl_init->enfs_option, ++#endif + }; + + if (test_bit(NFS_CS_DISCRTRY, &clp->cl_flags)) +@@ -634,6
+641,13 @@ struct nfs_client *nfs_init_client(struct nfs_client *clp, + /* the client is already initialised */ + if (clp->cl_cons_state == NFS_CS_READY) + return clp; ++ error = nfs_create_multi_path_client(clp, cl_init); ++ if (error < 0) { ++ dprintk("%s: create failed.%d!\n", __func__, error); ++ nfs_put_client(clp); ++ clp = ERR_PTR(error); ++ return clp; ++ } + + /* + * Create a client RPC handle for doing FSSTAT with UNIX auth only +@@ -666,6 +680,9 @@ static int nfs_init_server(struct nfs_server *server, + .net = data->net, + .timeparms = &timeparms, + .init_flags = (1UL << NFS_CS_REUSEPORT), ++#if IS_ENABLED(CONFIG_ENFS) ++ .enfs_option = data->enfs_option, ++#endif + }; + struct nfs_client *clp; + int error; +diff --git a/fs/nfs/enfs_adapter.c b/fs/nfs/enfs_adapter.c +new file mode 100644 +index 000000000000..7f471f2072c4 +--- /dev/null ++++ b/fs/nfs/enfs_adapter.c +@@ -0,0 +1,230 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/types.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs3.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/sunrpc/sched.h> ++#include <linux/nfs_iostat.h> ++#include "enfs_adapter.h" ++#include "iostat.h" ++ ++struct enfs_adapter_ops __rcu *enfs_adapter; ++ ++int enfs_adapter_register(struct enfs_adapter_ops *ops) ++{ ++ struct enfs_adapter_ops *old; ++ ++ old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, NULL, ops); ++ if (old == NULL || old == ops) ++ return 0; ++ pr_err("regist %s ops %p failed. 
old %p\n", __func__, ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(enfs_adapter_register); ++ ++int enfs_adapter_unregister(struct enfs_adapter_ops *ops) ++{ ++ struct enfs_adapter_ops *old; ++ ++ old = cmpxchg((struct enfs_adapter_ops **)&enfs_adapter, ops, NULL); ++ if (old == ops || old == NULL) ++ return 0; ++ pr_err("unregist %s ops %p failed. old %p\n", __func__, ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(enfs_adapter_unregister); ++ ++struct enfs_adapter_ops *nfs_multipath_router_get(void) ++{ ++ struct enfs_adapter_ops *ops; ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(enfs_adapter); ++ if (ops == NULL) { ++ rcu_read_unlock(); ++ return NULL; ++ } ++ if (!try_module_get(ops->owner)) ++ ops = NULL; ++ rcu_read_unlock(); ++ return ops; ++} ++ ++void nfs_multipath_router_put(struct enfs_adapter_ops *ops) ++{ ++ if (ops) ++ module_put(ops->owner); ++} ++ ++bool is_valid_option(enum nfsmultipathoptions option) ++{ ++ if (option < REMOTEADDR || option >= INVALID_OPTION) { ++ pr_warn("%s: ENFS invalid option %d\n", __func__, option); ++ return false; ++ } ++ ++ return true; ++} ++ ++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str, ++ struct nfs_parsed_mount_data *mnt) ++{ ++ ++ //parseMultiPathOptions(getNfsMultiPathOpt(token), string, mnt); ++ ++ int rc; ++ struct enfs_adapter_ops *ops; ++ ++ ops = nfs_multipath_router_get(); ++ if ((ops == NULL) || (ops->parse_mount_options == NULL) || ++ !is_valid_option(option)) { ++ nfs_multipath_router_put(ops); ++ dfprintk(MOUNT, ++ "NFS: parsing nfs mount option enfs not load[%s]\n" ++ , __func__); ++ return -EOPNOTSUPP; ++ } ++ // nfs_multipath_parse_options ++ dfprintk(MOUNT, "NFS: parsing nfs mount option '%s' type: %d[%s]\n" ++ , str, option, __func__); ++ rc = ops->parse_mount_options(option, str, &mnt->enfs_option, mnt->net); ++ nfs_multipath_router_put(ops); ++ return rc; ++} ++ ++void enfs_free_mount_options(struct nfs_parsed_mount_data *data) ++{ ++ struct 
enfs_adapter_ops *ops; ++ ++ if (data->enfs_option == NULL) ++ return; ++ ++ ops = nfs_multipath_router_get(); ++ if ((ops == NULL) || (ops->free_mount_options == NULL)) { ++ nfs_multipath_router_put(ops); ++ return; ++ } ++ ops->free_mount_options((void *)&data->enfs_option); ++ nfs_multipath_router_put(ops); ++} ++ ++int nfs_create_multi_path_client(struct nfs_client *client, ++ const struct nfs_client_initdata *cl_init) ++{ ++ int ret = 0; ++ struct enfs_adapter_ops *ops; ++ ++ if (cl_init->enfs_option == NULL) ++ return 0; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_init != NULL) ++ ret = ops->client_info_init( ++ (void *)&client->cl_multipath_data, cl_init); ++ nfs_multipath_router_put(ops); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(nfs_create_multi_path_client); ++ ++void nfs_free_multi_path_client(struct nfs_client *clp) ++{ ++ struct enfs_adapter_ops *ops; ++ ++ if (clp->cl_multipath_data == NULL) ++ return; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_free != NULL) ++ ops->client_info_free(clp->cl_multipath_data); ++ nfs_multipath_router_put(ops); ++} ++ ++int nfs_multipath_client_match(struct nfs_client *clp, ++ const struct nfs_client_initdata *sap) ++{ ++ int ret = true; ++ struct enfs_adapter_ops *ops; ++ ++ pr_info("%s src %p dst %p\n.", __func__, ++ clp->cl_multipath_data, sap->enfs_option); ++ ++ if (clp->cl_multipath_data == NULL && sap->enfs_option == NULL) ++ return true; ++ ++ if ((clp->cl_multipath_data == NULL && sap->enfs_option) || ++ (clp->cl_multipath_data && sap->enfs_option == NULL)) { ++ pr_err("not match client src %p dst %p\n.", ++ clp->cl_multipath_data, sap->enfs_option); ++ return false; ++ } ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_match != NULL) ++ ret = ops->client_info_match(clp->cl_multipath_data, ++ sap->enfs_option); ++ nfs_multipath_router_put(ops); ++ ++ return ret; ++} ++ ++int nfs4_multipath_client_match(struct 
nfs_client *src, struct nfs_client *dst) ++{ ++ int ret = true; ++ struct enfs_adapter_ops *ops; ++ ++ if (src->cl_multipath_data == NULL && dst->cl_multipath_data == NULL) ++ return true; ++ ++ if (src->cl_multipath_data == NULL || dst->cl_multipath_data == NULL) ++ return false; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->nfs4_client_info_match != NULL) ++ ret = ops->nfs4_client_info_match(src->cl_multipath_data, ++ dst->cl_multipath_data); ++ nfs_multipath_router_put(ops); ++ ++ return ret; ++} ++EXPORT_SYMBOL_GPL(nfs4_multipath_client_match); ++ ++void nfs_multipath_show_client_info(struct seq_file *mount_option, ++ struct nfs_server *server) ++{ ++ struct enfs_adapter_ops *ops; ++ ++ if (mount_option == NULL || server == NULL || ++ server->client == NULL || ++ server->nfs_client->cl_multipath_data == NULL) ++ return; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->client_info_show != NULL) ++ ops->client_info_show(mount_option, server); ++ nfs_multipath_router_put(ops); ++} ++ ++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option) ++{ ++ int ret = 0; ++ struct enfs_adapter_ops *ops; ++ ++ if (nfs_client == NULL || nfs_client->cl_rpcclient == NULL) ++ return 0; ++ ++ ops = nfs_multipath_router_get(); ++ if (ops != NULL && ops->remount_ip_list != NULL) ++ ret = ops->remount_ip_list(nfs_client, enfs_option); ++ nfs_multipath_router_put(ops); ++ return ret; ++} ++EXPORT_SYMBOL_GPL(nfs_remount_iplist); +diff --git a/fs/nfs/enfs_adapter.h b/fs/nfs/enfs_adapter.h +new file mode 100644 +index 000000000000..752544e18056 +--- /dev/null ++++ b/fs/nfs/enfs_adapter.h +@@ -0,0 +1,101 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter header. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _NFS_MULTIPATH_H_ ++#define _NFS_MULTIPATH_H_ ++ ++#include "internal.h" ++ ++#if IS_ENABLED(CONFIG_ENFS) ++enum nfsmultipathoptions { ++ REMOTEADDR, ++ LOCALADDR, ++ REMOTEDNSNAME, ++ REMOUNTREMOTEADDR, ++ REMOUNTLOCALADDR, ++ INVALID_OPTION ++}; ++ ++ ++struct enfs_adapter_ops { ++ const char *name; ++ struct module *owner; ++ int (*parse_mount_options)(enum nfsmultipathoptions option, ++ char *str, void **enfs_option, struct net *net_ns); ++ ++ void (*free_mount_options)(void **data); ++ ++ int (*client_info_init)(void **data, ++ const struct nfs_client_initdata *cl_init); ++ void (*client_info_free)(void *data); ++ int (*client_info_match)(void *src, void *dst); ++ int (*nfs4_client_info_match)(void *src, void *dst); ++ void (*client_info_show)(struct seq_file *mount_option, void *data); ++ int (*remount_ip_list)(struct nfs_client *nfs_client, ++ void *enfs_option); ++}; ++ ++int enfs_parse_mount_options(enum nfsmultipathoptions option, char *str, ++ struct nfs_parsed_mount_data *mnt); ++void enfs_free_mount_options(struct nfs_parsed_mount_data *data); ++int nfs_create_multi_path_client(struct nfs_client *client, ++ const struct nfs_client_initdata *cl_init); ++void nfs_free_multi_path_client(struct nfs_client *clp); ++int nfs_multipath_client_match(struct nfs_client *clp, ++ const struct nfs_client_initdata *sap); ++int nfs4_multipath_client_match(struct nfs_client *src, struct nfs_client *dst); ++void nfs_multipath_show_client_info(struct seq_file *mount_option, ++ struct nfs_server *server); ++int enfs_adapter_register(struct enfs_adapter_ops *ops); ++int enfs_adapter_unregister(struct enfs_adapter_ops *ops); ++int nfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option); ++int nfs4_create_multi_path(struct nfs_server *server, ++ struct nfs_parsed_mount_data *data, ++ const struct rpc_timeout *timeparms); ++ ++#else ++static inline ++void nfs_free_multi_path_client(struct nfs_client *clp) ++{ ++ ++} ++ ++static inline ++int 
nfs_multipath_client_match(struct nfs_client *clp, ++ const struct nfs_client_initdata *sap) ++{ ++ return 1; ++} ++ ++static inline ++int nfs_create_multi_path_client(struct nfs_client *client, ++ const struct nfs_client_initdata *cl_init) ++{ ++ return 0; ++} ++ ++static inline ++void nfs_multipath_show_client_info(struct seq_file *mount_option, ++ struct nfs_server *server) ++{ ++ ++} ++ ++static inline ++int nfs4_multipath_client_match(struct nfs_client *src, ++ struct nfs_client *dst) ++{ ++ return 1; ++} ++ ++static inline ++void enfs_free_mount_options(struct nfs_parsed_mount_data *data) ++{ ++ ++} ++ ++#endif // CONFIG_ENFS ++#endif // _NFS_MULTIPATH_H_ +diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h +index 0ce5a90640c4..c696693edc7b 100644 +--- a/fs/nfs/internal.h ++++ b/fs/nfs/internal.h +@@ -93,6 +93,9 @@ struct nfs_client_initdata { + u32 minorversion; + struct net *net; + const struct rpc_timeout *timeparms; ++#if IS_ENABLED(CONFIG_ENFS) ++ void *enfs_option; /* struct multipath_mount_options * */ ++#endif + }; + + /* +@@ -135,6 +138,9 @@ struct nfs_parsed_mount_data { + + struct security_mnt_opts lsm_opts; + struct net *net; ++#if IS_ENABLED(CONFIG_ENFS) ++ void *enfs_option; /* struct multipath_mount_options * */ ++#endif + }; + + /* mount_clnt.c */ +diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c +index 1350ea673672..4aa6e1f961f7 100644 +--- a/fs/nfs/nfs4client.c ++++ b/fs/nfs/nfs4client.c +@@ -10,7 +10,7 @@ + #include <linux/sunrpc/xprt.h> + #include <linux/sunrpc/bc_xprt.h> + #include <linux/sunrpc/rpc_pipe_fs.h> +-#include "internal.h" ++#include "enfs_adapter.h" + #include "callback.h" + #include "delegation.h" + #include "nfs4session.h" +@@ -225,6 +225,16 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init) + __set_bit(NFS_CS_DISCRTRY, &clp->cl_flags); + __set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags); + ++#if IS_ENABLED(CONFIG_ENFS) ++ err = nfs_create_multi_path_client(clp, cl_init); ++ if (err < 
0) { ++ dprintk("%s: create failed.%d\n", __func__, err); ++ nfs_put_client(clp); ++ clp = ERR_PTR(err); ++ return clp; ++ } ++#endif ++ + /* + * Set up the connection to the server before we add add to the + * global list. +@@ -529,6 +539,9 @@ static int nfs4_match_client(struct nfs_client *pos, struct nfs_client *new, + if (!nfs4_match_client_owner_id(pos, new)) + return 1; + ++ if (!nfs4_multipath_client_match(pos, new)) ++ return 1; ++ + return 0; + } + +@@ -860,7 +873,7 @@ static int nfs4_set_client(struct nfs_server *server, + const size_t addrlen, + const char *ip_addr, + int proto, const struct rpc_timeout *timeparms, +- u32 minorversion, struct net *net) ++ u32 minorversion, struct net *net, void *enfs_option) + { + struct nfs_client_initdata cl_init = { + .hostname = hostname, +@@ -872,6 +885,9 @@ static int nfs4_set_client(struct nfs_server *server, + .minorversion = minorversion, + .net = net, + .timeparms = timeparms, ++#if IS_ENABLED(CONFIG_ENFS) ++ .enfs_option = enfs_option, ++#endif + }; + struct nfs_client *clp; + +@@ -1042,6 +1058,30 @@ static int nfs4_server_common_setup(struct nfs_server *server, + return error; + } + ++int nfs4_create_multi_path(struct nfs_server *server, ++ struct nfs_parsed_mount_data *data, ++ const struct rpc_timeout *timeparms) ++{ ++ struct nfs_client_initdata cl_init = { ++ .hostname = data->nfs_server.hostname, ++ .addr = (const struct sockaddr *)&data->nfs_server.address, ++ .addrlen = data->nfs_server.addrlen, ++ .ip_addr = data->client_address, ++ .nfs_mod = &nfs_v4, ++ .proto = data->nfs_server.protocol, ++ .minorversion = data->minorversion, ++ .net = data->net, ++ .timeparms = timeparms, ++#if IS_ENABLED(CONFIG_ENFS) ++ .enfs_option = data->enfs_option, ++#endif // CONFIG_ENFS ++ }; ++ ++ return nfs_create_multi_path_client(server->nfs_client, &cl_init); ++ ++} ++EXPORT_SYMBOL_GPL(nfs4_create_multi_path); ++ + /* + * Create a version 4 volume record + */ +@@ -1050,6 +1090,7 @@ static int nfs4_init_server(struct 
nfs_server *server, + { + struct rpc_timeout timeparms; + int error; ++ void *enfs_option = NULL; + + nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, + data->timeo, data->retrans); +@@ -1067,6 +1108,10 @@ static int nfs4_init_server(struct nfs_server *server, + else + data->selected_flavor = RPC_AUTH_UNIX; + ++#if IS_ENABLED(CONFIG_ENFS) ++ enfs_option = data->enfs_option; ++#endif ++ + /* Get a client record */ + error = nfs4_set_client(server, + data->nfs_server.hostname, +@@ -1076,7 +1121,7 @@ static int nfs4_init_server(struct nfs_server *server, + data->nfs_server.protocol, + &timeparms, + data->minorversion, +- data->net); ++ data->net, enfs_option); + if (error < 0) + return error; + +@@ -1161,7 +1206,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, + XPRT_TRANSPORT_RDMA, + parent_server->client->cl_timeout, + parent_client->cl_mvops->minor_version, +- parent_client->cl_net); ++ parent_client->cl_net, NULL); + if (!error) + goto init_server; + #endif /* IS_ENABLED(CONFIG_SUNRPC_XPRT_RDMA) */ +@@ -1174,7 +1219,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, + XPRT_TRANSPORT_TCP, + parent_server->client->cl_timeout, + parent_client->cl_mvops->minor_version, +- parent_client->cl_net); ++ parent_client->cl_net, NULL); + if (error < 0) + goto error; + +@@ -1269,7 +1314,7 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname, + set_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status); + error = nfs4_set_client(server, hostname, sap, salen, buf, + clp->cl_proto, clnt->cl_timeout, +- clp->cl_minorversion, net); ++ clp->cl_minorversion, net, NULL); + clear_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status); + if (error != 0) { + nfs_server_insert_lists(server); +diff --git a/fs/nfs/super.c b/fs/nfs/super.c +index a05e1eb2c3fd..83cd294aca15 100644 +--- a/fs/nfs/super.c ++++ b/fs/nfs/super.c +@@ -61,7 +61,7 @@ + #include "callback.h" + #include "delegation.h" + #include 
"iostat.h" +-#include "internal.h" ++#include "enfs_adapter.h" + #include "fscache.h" + #include "nfs4session.h" + #include "pnfs.h" +@@ -113,6 +113,12 @@ enum { + + /* Special mount options */ + Opt_userspace, Opt_deprecated, Opt_sloppy, ++#if IS_ENABLED(CONFIG_ENFS) ++ Opt_remote_iplist, ++ Opt_local_iplist, ++ Opt_remote_dnslist, ++ Opt_enfs_info, ++#endif + + Opt_err + }; +@@ -183,6 +189,13 @@ static const match_table_t nfs_mount_option_tokens = { + { Opt_fscache_uniq, "fsc=%s" }, + { Opt_local_lock, "local_lock=%s" }, + ++#if IS_ENABLED(CONFIG_ENFS) ++ { Opt_remote_iplist, "remoteaddrs=%s" }, ++ { Opt_local_iplist, "localaddrs=%s" }, ++ { Opt_remote_dnslist, "remotednsname=%s" }, ++ { Opt_enfs_info, "enfs_info=%s" }, ++#endif ++ + /* The following needs to be listed after all other options */ + { Opt_nfsvers, "v%s" }, + +@@ -365,6 +378,21 @@ static struct shrinker acl_shrinker = { + .seeks = DEFAULT_SEEKS, + }; + ++#if IS_ENABLED(CONFIG_ENFS) ++enum nfsmultipathoptions getNfsMultiPathOpt(int token) ++{ ++ switch (token) { ++ case Opt_remote_iplist: ++ return REMOUNTREMOTEADDR; ++ case Opt_local_iplist: ++ return REMOUNTLOCALADDR; ++ case Opt_remote_dnslist: ++ return REMOTEDNSNAME; ++ } ++ return INVALID_OPTION; ++} ++#endif ++ + /* + * Register the NFS filesystems + */ +@@ -758,6 +786,9 @@ int nfs_show_options(struct seq_file *m, struct dentry *root) + seq_printf(m, ",addr=%s", + rpc_peeraddr2str(nfss->nfs_client->cl_rpcclient, + RPC_DISPLAY_ADDR)); ++ ++ nfs_multipath_show_client_info(m, nfss); ++ + rcu_read_unlock(); + + return 0; +@@ -853,6 +884,8 @@ int nfs_show_stats(struct seq_file *m, struct dentry *root) + seq_puts(m, root->d_sb->s_flags & SB_NODIRATIME ? 
",nodiratime" : ""); + nfs_show_mount_options(m, nfss, 1); + ++ nfs_multipath_show_client_info(m, nfss); ++ + seq_printf(m, "\n\tage:\t%lu", (jiffies - nfss->mount_time) / HZ); + + show_implementation_id(m, nfss); +@@ -977,6 +1010,7 @@ static void nfs_free_parsed_mount_data(struct nfs_parsed_mount_data *data) + kfree(data->nfs_server.export_path); + kfree(data->nfs_server.hostname); + kfree(data->fscache_uniq); ++ enfs_free_mount_options(data); + security_free_mnt_opts(&data->lsm_opts); + kfree(data); + } +@@ -1641,7 +1675,34 @@ static int nfs_parse_mount_options(char *raw, + return 0; + }; + break; +- ++#if IS_ENABLED(CONFIG_ENFS) ++ case Opt_remote_iplist: ++ case Opt_local_iplist: ++ case Opt_remote_dnslist: ++ string = match_strdup(args); ++ if (string == NULL) ++ goto out_nomem; ++ rc = enfs_parse_mount_options(getNfsMultiPathOpt(token), ++ string, mnt); ++ kfree(string); ++ switch (rc) { ++ case 0: ++ break; ++ case -ENOMEM: ++ goto out_nomem; ++ case -ENOSPC: ++ goto out_limit; ++ case -EINVAL: ++ goto out_invalid_address; ++ case -ENOTSUPP: ++ goto out_invalid_address; ++ case -EOPNOTSUPP: ++ goto out_invalid_address; ++ } ++ break; ++ case Opt_enfs_info: ++ break; ++#endif + /* + * Special options + */ +@@ -1720,6 +1781,11 @@ static int nfs_parse_mount_options(char *raw, + free_secdata(secdata); + printk(KERN_INFO "NFS: security options invalid: %d\n", rc); + return 0; ++#if IS_ENABLED(CONFIG_ENFS) ++out_limit: ++ dprintk("NFS: param is more than supported limit: %d\n", rc); ++ return 0; ++#endif + } + + /* +@@ -2335,6 +2401,14 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) + if (!nfs_parse_mount_options((char *)options, data)) + goto out; + ++#if IS_ENABLED(CONFIG_ENFS) ++ if (data->enfs_option) { ++ error = nfs_remount_iplist(nfss->nfs_client, data->enfs_option); ++ if (error) ++ goto out; ++ } ++#endif ++ + /* + * noac is a special case. It implies -o sync, but that's not + * necessarily reflected in the mtab options. 
do_remount_sb +@@ -2347,6 +2421,11 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data) + /* compare new mount options with old ones */ + error = nfs_compare_remount_data(nfss, data); + out: ++#if IS_ENABLED(CONFIG_ENFS) ++ /* release remount option member */ ++ if (data->enfs_option) ++ enfs_free_mount_options(data); ++#endif + nfs_free_parsed_mount_data(data); + return error; + } +diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h +index 7023ae64e3d7..2c19678afe8d 100644 +--- a/include/linux/nfs_fs_sb.h ++++ b/include/linux/nfs_fs_sb.h +@@ -123,6 +123,11 @@ struct nfs_client { + + struct net *cl_net; + struct list_head pending_cb_stateids; ++ ++#if IS_ENABLED(CONFIG_ENFS) ++ /* multi path private structure (struct multipath_client_info *) */ ++ void *cl_multipath_data; ++#endif + }; + + /* diff --git a/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch new file mode 100644 index 0000000..540a2ce --- /dev/null +++ b/0002-sunrpc_add_api_to_support_enfs_registe_and_create_multipath_then_dispatch_IO.patch @@ -0,0 +1,805 @@ +diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h +index 8aa865bce4f6..89178f78de8c 100644 +--- a/include/linux/sunrpc/clnt.h ++++ b/include/linux/sunrpc/clnt.h +@@ -70,6 +70,10 @@ struct rpc_clnt { + struct dentry *cl_debugfs; /* debugfs directory */ + #endif + struct rpc_xprt_iter cl_xpi; ++ ++#if IS_ENABLED(CONFIG_ENFS) ++ bool cl_enfs; ++#endif + }; + + /* +@@ -124,6 +128,9 @@ struct rpc_create_args { + unsigned long flags; + char *client_name; + struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */ ++#if IS_ENABLED(CONFIG_ENFS) ++ void *multipath_option; ++#endif + }; + + struct rpc_add_xprt_test { +@@ -221,6 +228,12 @@ bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt, + const struct sockaddr *sap); + void rpc_cleanup_clids(void); + ++#if IS_ENABLED(CONFIG_ENFS) 
++int ++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, ++ const struct rpc_call_ops *ops, void *data, int flags); ++#endif /* CONFIG_ENFS */ ++ + static inline int rpc_reply_expected(struct rpc_task *task) + { + return (task->tk_msg.rpc_proc != NULL) && +diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h +index ad2e243f3f03..124f5a0faf3e 100644 +--- a/include/linux/sunrpc/sched.h ++++ b/include/linux/sunrpc/sched.h +@@ -90,6 +90,9 @@ struct rpc_task { + tk_garb_retry : 2, + tk_cred_retry : 2, + tk_rebind_retry : 2; ++#if IS_ENABLED(CONFIG_ENFS) ++ unsigned long tk_major_timeo; /* major timeout ticks */ ++#endif + }; + + typedef void (*rpc_action)(struct rpc_task *); +@@ -118,6 +121,9 @@ struct rpc_task_setup { + */ + #define RPC_TASK_ASYNC 0x0001 /* is an async task */ + #define RPC_TASK_SWAPPER 0x0002 /* is swapping in/out */ ++#if IS_ENABLED(CONFIG_ENFS) ++#define RPC_TASK_FIXED 0x0004 /* detect xprt status task */ ++#endif + #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ + #define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ + #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ +@@ -257,6 +263,9 @@ void rpc_destroy_mempool(void); + extern struct workqueue_struct *rpciod_workqueue; + extern struct workqueue_struct *xprtiod_workqueue; + void rpc_prepare_task(struct rpc_task *task); ++#if IS_ENABLED(CONFIG_ENFS) ++void rpc_init_task_retry_counters(struct rpc_task *task); ++#endif + + static inline int rpc_wait_for_completion_task(struct rpc_task *task) + { +diff --git a/include/linux/sunrpc/sunrpc_enfs_adapter.h b/include/linux/sunrpc/sunrpc_enfs_adapter.h +new file mode 100644 +index 000000000000..28abedcf5cf6 +--- /dev/null ++++ b/include/linux/sunrpc/sunrpc_enfs_adapter.h +@@ -0,0 +1,128 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* Client-side SUNRPC ENFS adapter header. ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _SUNRPC_ENFS_ADAPTER_H_ ++#define _SUNRPC_ENFS_ADAPTER_H_ ++#include <linux/sunrpc/clnt.h> ++ ++#if IS_ENABLED(CONFIG_ENFS) ++ ++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) ++{ ++ xps->xps_nactive++; ++} ++ ++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) ++{ ++ xps->xps_nactive--; ++} ++ ++struct rpc_xprt *rpc_task_get_xprt ++(struct rpc_clnt *clnt, struct rpc_xprt *xprt); ++ ++struct rpc_multipath_ops { ++ struct module *owner; ++ void (*create_clnt)(struct rpc_create_args *args, ++ struct rpc_clnt *clnt); ++ void (*releas_clnt)(struct rpc_clnt *clnt); ++ void (*create_xprt)(struct rpc_xprt *xprt); ++ void (*destroy_xprt)(struct rpc_xprt *xprt); ++ void (*xprt_iostat)(struct rpc_task *task); ++ void (*failover_handle)(struct rpc_task *task); ++ bool (*task_need_call_start_again)(struct rpc_task *task); ++ void (*adjust_task_timeout)(struct rpc_task *task, void *condition); ++ void (*init_task_req)(struct rpc_task *task, struct rpc_rqst *req); ++ bool (*prepare_transmit)(struct rpc_task *task); ++}; ++ ++extern struct rpc_multipath_ops __rcu *multipath_ops; ++void rpc_init_task_retry_counters(struct rpc_task *task); ++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops); ++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops); ++struct rpc_multipath_ops *rpc_multipath_ops_get(void); ++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops); ++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt); ++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, ++ struct rpc_clnt *clnt); ++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt); ++bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt); ++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt); ++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task); ++void rpc_multipath_ops_failover_handle(struct rpc_task *task); ++bool rpc_multipath_ops_task_need_call_start_again(struct 
rpc_task *task); ++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, ++ void *condition); ++void rpc_multipath_ops_init_task_req(struct rpc_task *task, ++ struct rpc_rqst *req); ++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task); ++ ++#else ++static inline struct rpc_xprt *rpc_task_get_xprt(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt) ++{ ++ return NULL; ++} ++ ++static inline void rpc_task_release_xprt(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt) ++{ ++} ++ ++static inline void rpc_xps_nactive_add_one(struct rpc_xprt_switch *xps) ++{ ++} ++ ++static inline void rpc_xps_nactive_sub_one(struct rpc_xprt_switch *xps) ++{ ++} ++ ++static inline void rpc_multipath_ops_create_clnt ++(struct rpc_create_args *args, struct rpc_clnt *clnt) ++{ ++} ++ ++static inline void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) ++{ ++} ++ ++static inline bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) ++{ ++ return false; ++} ++ ++static inline void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt) ++{ ++} ++ ++static inline void rpc_multipath_ops_xprt_iostat(struct rpc_task *task) ++{ ++} ++ ++static inline void rpc_multipath_ops_failover_handle(struct rpc_task *task) ++{ ++} ++ ++static inline ++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task) ++{ ++ return false; ++} ++ ++static inline void ++rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, void *condition) ++{ ++} ++ ++static inline void ++rpc_multipath_ops_init_task_req(struct rpc_task *task, struct rpc_rqst *req) ++{ ++} ++ ++static inline bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task) ++{ ++ return false; ++} ++ ++#endif ++#endif // _SUNRPC_ENFS_ADAPTER_H_ +diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h +index ccfacca1eba9..2e47b3577947 100644 +--- a/include/linux/sunrpc/xprt.h ++++ b/include/linux/sunrpc/xprt.h +@@ -279,6 +279,10 @@ struct rpc_xprt { + atomic_t inject_disconnect; + #endif + 
struct rcu_head rcu; ++#if IS_ENABLED(CONFIG_ENFS) ++ atomic_long_t queuelen; ++ void *multipath_context; ++#endif + }; + + #if defined(CONFIG_SUNRPC_BACKCHANNEL) +diff --git a/include/linux/sunrpc/xprtmultipath.h b/include/linux/sunrpc/xprtmultipath.h +index af1257c030d2..d54e4dbbbf34 100644 +--- a/include/linux/sunrpc/xprtmultipath.h ++++ b/include/linux/sunrpc/xprtmultipath.h +@@ -22,6 +22,10 @@ struct rpc_xprt_switch { + const struct rpc_xprt_iter_ops *xps_iter_ops; + + struct rcu_head xps_rcu; ++#if IS_ENABLED(CONFIG_ENFS) ++ unsigned int xps_nactive; ++ atomic_long_t xps_queuelen; ++#endif + }; + + struct rpc_xprt_iter { +@@ -69,4 +73,8 @@ extern struct rpc_xprt *xprt_iter_get_next(struct rpc_xprt_iter *xpi); + + extern bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps, + const struct sockaddr *sap); ++#if IS_ENABLED(CONFIG_ENFS) ++extern void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, ++ struct rpc_xprt *xprt); ++#endif + #endif +diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c +index 0fc540b0d183..d7ffee637148 100644 +--- a/net/sunrpc/clnt.c ++++ b/net/sunrpc/clnt.c +@@ -37,6 +37,7 @@ + #include <linux/sunrpc/rpc_pipe_fs.h> + #include <linux/sunrpc/metrics.h> + #include <linux/sunrpc/bc_xprt.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + #include <trace/events/sunrpc.h> + + #include "sunrpc.h" +@@ -490,6 +491,8 @@ static struct rpc_clnt *rpc_create_xprt(struct rpc_create_args *args, + } + } + ++ rpc_multipath_ops_create_clnt(args, clnt); ++ + clnt->cl_softrtry = 1; + if (args->flags & RPC_CLNT_CREATE_HARDRTRY) + clnt->cl_softrtry = 0; +@@ -869,6 +872,8 @@ void rpc_shutdown_client(struct rpc_clnt *clnt) + list_empty(&clnt->cl_tasks), 1*HZ); + } + ++ rpc_multipath_ops_releas_clnt(clnt); ++ + rpc_release_client(clnt); + } + EXPORT_SYMBOL_GPL(rpc_shutdown_client); +@@ -981,7 +986,13 @@ void rpc_task_release_transport(struct rpc_task *task) + + if (xprt) { + task->tk_xprt = NULL; +- xprt_put(xprt); ++#if IS_ENABLED(CONFIG_ENFS) ++ 
if (task->tk_client) { ++ rpc_task_release_xprt(task->tk_client, xprt); ++ return; ++ } ++#endif ++ xprt_put(xprt); + } + } + EXPORT_SYMBOL_GPL(rpc_task_release_transport); +@@ -990,6 +1001,10 @@ void rpc_task_release_client(struct rpc_task *task) + { + struct rpc_clnt *clnt = task->tk_client; + ++#if IS_ENABLED(CONFIG_ENFS) ++ rpc_task_release_transport(task); ++#endif ++ + if (clnt != NULL) { + /* Remove from client task list */ + spin_lock(&clnt->cl_lock); +@@ -999,14 +1014,29 @@ void rpc_task_release_client(struct rpc_task *task) + + rpc_release_client(clnt); + } ++#if IS_ENABLED(CONFIG_ENFS) ++#else + rpc_task_release_transport(task); ++#endif + } + ++#if IS_ENABLED(CONFIG_ENFS) ++static struct rpc_xprt * ++rpc_task_get_next_xprt(struct rpc_clnt *clnt) ++{ ++ return rpc_task_get_xprt(clnt, xprt_iter_get_next(&clnt->cl_xpi)); ++} ++#endif ++ + static + void rpc_task_set_transport(struct rpc_task *task, struct rpc_clnt *clnt) + { + if (!task->tk_xprt) ++#if IS_ENABLED(CONFIG_ENFS) ++ task->tk_xprt = rpc_task_get_next_xprt(clnt); ++#else + task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi); ++#endif + } + + static +@@ -1597,6 +1627,14 @@ call_reserveresult(struct rpc_task *task) + return; + case -EIO: /* probably a shutdown */ + break; ++#if IS_ENABLED(CONFIG_ENFS) ++ case -ETIMEDOUT: /* woken up; restart */ ++ if (rpc_multipath_ops_task_need_call_start_again(task)) { ++ rpc_task_release_transport(task); ++ task->tk_action = call_start; ++ return; ++ } ++#endif + default: + printk(KERN_ERR "%s: unrecognized error %d, exiting\n", + __func__, status); +@@ -1962,6 +2000,10 @@ call_transmit(struct rpc_task *task) + return; + if (!xprt_prepare_transmit(task)) + return; ++ ++ if (rpc_multipath_ops_prepare_transmit(task)) ++ return; ++ + task->tk_action = call_transmit_status; + /* Encode here so that rpcsec_gss can use correct sequence number. 
*/ + if (rpc_task_need_encode(task)) { +@@ -2277,6 +2319,9 @@ call_timeout(struct rpc_task *task) + + retry: + task->tk_action = call_bind; ++#if IS_ENABLED(CONFIG_ENFS) ++ rpc_multipath_ops_failover_handle(task); ++#endif + task->tk_status = 0; + } + +@@ -2961,3 +3006,30 @@ rpc_clnt_swap_deactivate(struct rpc_clnt *clnt) + } + EXPORT_SYMBOL_GPL(rpc_clnt_swap_deactivate); + #endif /* CONFIG_SUNRPC_SWAP */ ++ ++#if IS_ENABLED(CONFIG_ENFS) ++/* rpc_clnt_test_xprt - Test and add a new transport to a rpc_clnt ++ * @clnt: pointer to struct rpc_clnt ++ * @xprt: pointer struct rpc_xprt ++ * @ops: async operation ++ */ ++int ++rpc_clnt_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt, ++ const struct rpc_call_ops *ops, void *data, int flags) ++{ ++ struct rpc_cred *cred; ++ struct rpc_task *task; ++ ++ cred = authnull_ops.lookup_cred(NULL, NULL, 0); ++ task = rpc_call_null_helper(clnt, xprt, cred, ++ RPC_TASK_SOFT | RPC_TASK_SOFTCONN | flags, ++ ops, data); ++ put_rpccred(cred); ++ if (IS_ERR(task)) ++ return PTR_ERR(task); ++ ++ rpc_put_task(task); ++ return 1; ++} ++EXPORT_SYMBOL_GPL(rpc_clnt_test_xprt); ++#endif +diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c +index a873c92a4898..2254fea0e863 100644 +--- a/net/sunrpc/sched.c ++++ b/net/sunrpc/sched.c +@@ -20,7 +20,7 @@ + #include <linux/mutex.h> + #include <linux/freezer.h> + +-#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + + #include "sunrpc.h" + +@@ -962,7 +962,12 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta + /* Initialize workqueue for async tasks */ + task->tk_workqueue = task_setup_data->workqueue; + ++#if IS_ENABLED(CONFIG_ENFS) ++ task->tk_xprt = rpc_task_get_xprt(task_setup_data->rpc_client, ++ xprt_get(task_setup_data->rpc_xprt)); ++#else + task->tk_xprt = xprt_get(task_setup_data->rpc_xprt); ++#endif + + if (task->tk_ops->rpc_call_prepare != NULL) + task->tk_action = rpc_prepare_task; +diff --git 
a/net/sunrpc/sunrpc_enfs_adapter.c b/net/sunrpc/sunrpc_enfs_adapter.c +new file mode 100644 +index 000000000000..c1543545c6de +--- /dev/null ++++ b/net/sunrpc/sunrpc_enfs_adapter.c +@@ -0,0 +1,214 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* Client-side SUNRPC ENFS adapter. ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> ++ ++struct rpc_multipath_ops __rcu *multipath_ops; ++ ++void rpc_init_task_retry_counters(struct rpc_task *task) ++{ ++ /* Initialize retry counters */ ++ task->tk_garb_retry = 2; ++ task->tk_cred_retry = 2; ++ task->tk_rebind_retry = 2; ++} ++EXPORT_SYMBOL_GPL(rpc_init_task_retry_counters); ++ ++struct rpc_xprt * ++rpc_task_get_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) ++{ ++ struct rpc_xprt_switch *xps; ++ ++ if (!xprt) ++ return NULL; ++ rcu_read_lock(); ++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); ++ atomic_long_inc(&xps->xps_queuelen); ++ rcu_read_unlock(); ++ atomic_long_inc(&xprt->queuelen); ++ ++ return xprt; ++} ++ ++int rpc_multipath_ops_register(struct rpc_multipath_ops *ops) ++{ ++ struct rpc_multipath_ops *old; ++ ++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, NULL, ops); ++ if (!old || old == ops) ++ return 0; ++ pr_err("register rpc_multipath ops %p fail. old %p\n", ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_register); ++ ++int rpc_multipath_ops_unregister(struct rpc_multipath_ops *ops) ++{ ++ struct rpc_multipath_ops *old; ++ ++ old = cmpxchg((struct rpc_multipath_ops **)&multipath_ops, ops, NULL); ++ if (!old || old == ops) ++ return 0; ++ pr_err("unregister rpc_multipath ops %p fail. 
old %p\n", ops, old); ++ return -EPERM; ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_unregister); ++ ++struct rpc_multipath_ops *rpc_multipath_ops_get(void) ++{ ++ struct rpc_multipath_ops *ops; ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(multipath_ops); ++ if (!ops) { ++ rcu_read_unlock(); ++ return NULL; ++ } ++ if (!try_module_get(ops->owner)) ++ ops = NULL; ++ rcu_read_unlock(); ++ return ops; ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_get); ++ ++void rpc_multipath_ops_put(struct rpc_multipath_ops *ops) ++{ ++ if (ops) ++ module_put(ops->owner); ++} ++EXPORT_SYMBOL_GPL(rpc_multipath_ops_put); ++ ++void rpc_task_release_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt) ++{ ++ struct rpc_xprt_switch *xps; ++ ++ atomic_long_dec(&xprt->queuelen); ++ rcu_read_lock(); ++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); ++ atomic_long_dec(&xps->xps_queuelen); ++ rcu_read_unlock(); ++ ++ xprt_put(xprt); ++} ++ ++void rpc_multipath_ops_create_clnt(struct rpc_create_args *args, ++ struct rpc_clnt *clnt) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ if (args->multipath_option) { ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->create_clnt) ++ mops->create_clnt(args, clnt); ++ rpc_multipath_ops_put(mops); ++ } ++} ++ ++void rpc_multipath_ops_releas_clnt(struct rpc_clnt *clnt) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->releas_clnt) ++ mops->releas_clnt(clnt); ++ ++ rpc_multipath_ops_put(mops); ++} ++ ++bool rpc_multipath_ops_create_xprt(struct rpc_xprt *xprt) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->create_xprt) { ++ mops->create_xprt(xprt); ++ if (!xprt->multipath_context) { ++ rpc_multipath_ops_put(mops); ++ return true; ++ } ++ } ++ rpc_multipath_ops_put(mops); ++ return false; ++} ++ ++void rpc_multipath_ops_destroy_xprt(struct rpc_xprt *xprt) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ if (xprt->multipath_context) { ++ mops = 
rpc_multipath_ops_get(); ++ if (mops && mops->destroy_xprt) ++ mops->destroy_xprt(xprt); ++ rpc_multipath_ops_put(mops); ++ } ++} ++ ++void rpc_multipath_ops_xprt_iostat(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mops; ++ ++ mops = rpc_multipath_ops_get(); ++ if (task->tk_client && mops && mops->xprt_iostat) ++ mops->xprt_iostat(task); ++ rpc_multipath_ops_put(mops); ++} ++ ++void rpc_multipath_ops_failover_handle(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mpath_ops = NULL; ++ ++ mpath_ops = rpc_multipath_ops_get(); ++ if (mpath_ops && mpath_ops->failover_handle) ++ mpath_ops->failover_handle(task); ++ rpc_multipath_ops_put(mpath_ops); ++} ++ ++bool rpc_multipath_ops_task_need_call_start_again(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mpath_ops = NULL; ++ bool ret = false; ++ ++ mpath_ops = rpc_multipath_ops_get(); ++ if (mpath_ops && mpath_ops->task_need_call_start_again) ++ ret = mpath_ops->task_need_call_start_again(task); ++ rpc_multipath_ops_put(mpath_ops); ++ return ret; ++} ++ ++void rpc_multipath_ops_adjust_task_timeout(struct rpc_task *task, ++ void *condition) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->adjust_task_timeout) ++ mops->adjust_task_timeout(task, condition); ++ rpc_multipath_ops_put(mops); ++} ++ ++void rpc_multipath_ops_init_task_req(struct rpc_task *task, ++ struct rpc_rqst *req) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->init_task_req) ++ mops->init_task_req(task, req); ++ rpc_multipath_ops_put(mops); ++} ++ ++bool rpc_multipath_ops_prepare_transmit(struct rpc_task *task) ++{ ++ struct rpc_multipath_ops *mops = NULL; ++ ++ mops = rpc_multipath_ops_get(); ++ if (mops && mops->prepare_transmit) { ++ if (!(mops->prepare_transmit(task))) { ++ rpc_multipath_ops_put(mops); ++ return true; ++ } ++ } ++ rpc_multipath_ops_put(mops); ++ return false; ++} +diff --git a/net/sunrpc/xprt.c 
b/net/sunrpc/xprt.c +index c912bf20faa2..c2b63b3d5217 100644 +--- a/net/sunrpc/xprt.c ++++ b/net/sunrpc/xprt.c +@@ -48,6 +48,7 @@ + #include <linux/sunrpc/clnt.h> + #include <linux/sunrpc/metrics.h> + #include <linux/sunrpc/bc_xprt.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + #include <linux/rcupdate.h> + + #include <trace/events/sunrpc.h> +@@ -259,6 +260,9 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task) + dprintk("RPC: %5u failed to lock transport %p\n", + task->tk_pid, xprt); + task->tk_timeout = 0; ++ ++ rpc_multipath_ops_adjust_task_timeout(task, NULL); ++ + task->tk_status = -EAGAIN; + if (req == NULL) + priority = RPC_PRIORITY_LOW; +@@ -560,6 +564,9 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action) + struct rpc_xprt *xprt = req->rq_xprt; + + task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; ++ ++ rpc_multipath_ops_adjust_task_timeout(task, NULL); ++ + rpc_sleep_on(&xprt->pending, task, action); + } + EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); +@@ -1347,6 +1354,9 @@ xprt_request_init(struct rpc_task *task) + req->rq_rcv_buf.buflen = 0; + req->rq_release_snd_buf = NULL; + xprt_reset_majortimeo(req); ++ ++ rpc_multipath_ops_init_task_req(task, req); ++ + dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid, + req, ntohl(req->rq_xid)); + } +@@ -1427,6 +1437,9 @@ void xprt_release(struct rpc_task *task) + task->tk_ops->rpc_count_stats(task, task->tk_calldata); + else if (task->tk_client) + rpc_count_iostats(task, task->tk_client->cl_metrics); ++ ++ rpc_multipath_ops_xprt_iostat(task); ++ + spin_lock(&xprt->recv_lock); + if (!list_empty(&req->rq_list)) { + list_del_init(&req->rq_list); +@@ -1455,6 +1468,7 @@ void xprt_release(struct rpc_task *task) + else + xprt_free_bc_request(req); + } ++EXPORT_SYMBOL_GPL(xprt_release); + + static void xprt_init(struct rpc_xprt *xprt, struct net *net) + { +@@ -1528,6 +1542,10 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args) + return 
ERR_PTR(-ENOMEM); + } + ++if (rpc_multipath_ops_create_xprt(xprt)) { ++ xprt_destroy(xprt); ++ return ERR_PTR(-ENOMEM); ++} + rpc_xprt_debugfs_register(xprt); + + dprintk("RPC: created transport %p with %u slots\n", xprt, +@@ -1547,6 +1565,9 @@ static void xprt_destroy_cb(struct work_struct *work) + rpc_destroy_wait_queue(&xprt->sending); + rpc_destroy_wait_queue(&xprt->backlog); + kfree(xprt->servername); ++ ++ rpc_multipath_ops_destroy_xprt(xprt); ++ + /* + * Tear down transport state and free the rpc_xprt + */ +diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c +index 6ebaa58b4eff..6202a0be1327 100644 +--- a/net/sunrpc/xprtmultipath.c ++++ b/net/sunrpc/xprtmultipath.c +@@ -18,6 +18,7 @@ + #include <linux/sunrpc/xprt.h> + #include <linux/sunrpc/addr.h> + #include <linux/sunrpc/xprtmultipath.h> ++#include <linux/sunrpc/sunrpc_enfs_adapter.h> + + typedef struct rpc_xprt *(*xprt_switch_find_xprt_t)(struct list_head *head, + const struct rpc_xprt *cur); +@@ -26,8 +27,8 @@ static const struct rpc_xprt_iter_ops rpc_xprt_iter_singular; + static const struct rpc_xprt_iter_ops rpc_xprt_iter_roundrobin; + static const struct rpc_xprt_iter_ops rpc_xprt_iter_listall; + +-static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, +- struct rpc_xprt *xprt) ++void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, ++ struct rpc_xprt *xprt) + { + if (unlikely(xprt_get(xprt) == NULL)) + return; +@@ -36,7 +37,9 @@ static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, + if (xps->xps_nxprts == 0) + xps->xps_net = xprt->xprt_net; + xps->xps_nxprts++; ++ rpc_xps_nactive_add_one(xps); + } ++EXPORT_SYMBOL(xprt_switch_add_xprt_locked); + + /** + * rpc_xprt_switch_add_xprt - Add a new rpc_xprt to an rpc_xprt_switch +@@ -63,6 +66,7 @@ static void xprt_switch_remove_xprt_locked(struct rpc_xprt_switch *xps, + if (unlikely(xprt == NULL)) + return; + xps->xps_nxprts--; ++ rpc_xps_nactive_sub_one(xps); + if (xps->xps_nxprts == 0) + xps->xps_net = 
NULL; + smp_wmb(); +@@ -84,7 +88,7 @@ void rpc_xprt_switch_remove_xprt(struct rpc_xprt_switch *xps, + spin_unlock(&xps->xps_lock); + xprt_put(xprt); + } +- ++EXPORT_SYMBOL(rpc_xprt_switch_remove_xprt); + /** + * xprt_switch_alloc - Allocate a new struct rpc_xprt_switch + * @xprt: pointer to struct rpc_xprt +@@ -102,7 +106,13 @@ struct rpc_xprt_switch *xprt_switch_alloc(struct rpc_xprt *xprt, + if (xps != NULL) { + spin_lock_init(&xps->xps_lock); + kref_init(&xps->xps_kref); ++#if IS_ENABLED(CONFIG_ENFS) ++ xps->xps_nxprts = 0; ++ xps->xps_nactive = 0; ++ atomic_long_set(&xps->xps_queuelen, 0); ++#else + xps->xps_nxprts = 0; ++#endif + INIT_LIST_HEAD(&xps->xps_xprt_list); + xps->xps_iter_ops = &rpc_xprt_iter_singular; + xprt_switch_add_xprt_locked(xps, xprt); +@@ -148,6 +158,7 @@ struct rpc_xprt_switch *xprt_switch_get(struct rpc_xprt_switch *xps) + return xps; + return NULL; + } ++EXPORT_SYMBOL(xprt_switch_get); + + /** + * xprt_switch_put - Release a reference to a rpc_xprt_switch +@@ -160,6 +171,7 @@ void xprt_switch_put(struct rpc_xprt_switch *xps) + if (xps != NULL) + kref_put(&xps->xps_kref, xprt_switch_free); + } ++EXPORT_SYMBOL(xprt_switch_put); + + /** + * rpc_xprt_switch_set_roundrobin - Set a round-robin policy on rpc_xprt_switch diff --git a/0003-add_enfs_module_for_nfs_mount_option.patch b/0003-add_enfs_module_for_nfs_mount_option.patch new file mode 100644 index 0000000..70753b5 --- /dev/null +++ b/0003-add_enfs_module_for_nfs_mount_option.patch @@ -0,0 +1,1209 @@ +diff --git a/fs/nfs/enfs/Makefile b/fs/nfs/enfs/Makefile +new file mode 100644 +index 000000000000..6e83eb23c668 +--- /dev/null ++++ b/fs/nfs/enfs/Makefile +@@ -0,0 +1,18 @@ ++obj-m += enfs.o ++ ++#EXTRA_CFLAGS += -I$(PWD)/.. 
++ ++enfs-y := enfs_init.o ++enfs-y += enfs_config.o ++enfs-y += mgmt_init.o ++enfs-y += enfs_multipath_client.o ++enfs-y += enfs_multipath_parse.o ++enfs-y += failover_path.o ++enfs-y += failover_time.o ++enfs-y += enfs_roundrobin.o ++enfs-y += enfs_multipath.o ++enfs-y += enfs_path.o ++enfs-y += enfs_proc.o ++enfs-y += enfs_remount.o ++enfs-y += pm_ping.o ++enfs-y += pm_state.o +diff --git a/fs/nfs/enfs/enfs.h b/fs/nfs/enfs/enfs.h +new file mode 100644 +index 000000000000..be3d95220088 +--- /dev/null ++++ b/fs/nfs/enfs/enfs.h +@@ -0,0 +1,62 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS multipath adapt header. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++ ++#ifndef _ENFS_H_ ++#define _ENFS_H_ ++#include <linux/atomic.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs3.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include "../enfs_adapter.h" ++ ++#define IP_ADDRESS_LEN_MAX 64 ++#define MAX_IP_PAIR_PER_MOUNT 8 ++#define MAX_IP_INDEX (MAX_IP_PAIR_PER_MOUNT) ++#define MAX_SUPPORTED_LOCAL_IP_COUNT 8 ++#define MAX_SUPPORTED_REMOTE_IP_COUNT 32 ++#define MAX_DNS_NAME_LEN 512 ++#define MAX_DNS_SUPPORTED 2 ++#define EXTEND_CMD_MAX_BUF_LEN 65356 ++ ++ ++struct nfs_ip_list { ++ int count; ++ struct sockaddr_storage address[MAX_SUPPORTED_REMOTE_IP_COUNT]; ++ size_t addrlen[MAX_SUPPORTED_REMOTE_IP_COUNT]; ++}; ++ ++struct NFS_ROUTE_DNS_S { ++ char dnsname[MAX_DNS_NAME_LEN]; // valid only if dnsExist is true ++}; ++ ++struct NFS_ROUTE_DNS_INFO_S { ++ int dnsNameCount; // Count of DNS name in the list ++ // valid only if dnsExist is true ++ struct NFS_ROUTE_DNS_S routeRemoteDnsList[MAX_DNS_SUPPORTED]; ++}; ++ ++struct rpc_iostats; ++struct enfs_xprt_context { ++ struct sockaddr_storage srcaddr; ++ struct rpc_iostats *stats; ++ bool main; ++ atomic_t path_state; ++ atomic_t path_check_state; ++}; ++ ++static inline bool enfs_is_main_xprt(struct rpc_xprt *xprt) ++{ ++ 
struct enfs_xprt_context *ctx = xprt->multipath_context; ++ ++ if (!ctx) ++ return false; ++ return ctx->main; ++} ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_init.c b/fs/nfs/enfs/enfs_init.c +new file mode 100644 +index 000000000000..4b55608191a7 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_init.c +@@ -0,0 +1,98 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/module.h> ++#include <linux/sunrpc/sched.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs3.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include "enfs.h" ++#include "enfs_multipath_parse.h" ++#include "enfs_multipath_client.h" ++#include "enfs_remount.h" ++#include "init.h" ++#include "enfs_log.h" ++#include "enfs_multipath.h" ++#include "mgmt_init.h" ++ ++struct enfs_adapter_ops enfs_adapter = { ++ .name = "enfs", ++ .owner = THIS_MODULE, ++ .parse_mount_options = nfs_multipath_parse_options, ++ .free_mount_options = nfs_multipath_free_options, ++ .client_info_init = nfs_multipath_client_info_init, ++ .client_info_free = nfs_multipath_client_info_free, ++ .client_info_match = nfs_multipath_client_info_match, ++ .client_info_show = nfs_multipath_client_info_show, ++ .remount_ip_list = enfs_remount_iplist, ++}; ++ ++int32_t enfs_init(void) ++{ ++ int err; ++ ++ err = enfs_multipath_init(); ++ if (err) { ++ enfs_log_error("init multipath failed.\n"); ++ goto out; ++ } ++ ++ err = mgmt_init(); ++ if (err != 0) { ++ enfs_log_error("init mgmt failed.\n"); ++ goto out_tp_exit; ++ } ++ ++ return 0; ++ ++out_tp_exit: ++ enfs_multipath_exit(); ++out: ++ return err; ++} ++ ++void enfs_fini(void) ++{ ++ mgmt_fini(); ++ ++ enfs_multipath_exit(); ++} ++ ++static int __init init_enfs(void) ++{ ++ int ret; ++ ++ ret = enfs_adapter_register(&enfs_adapter); ++ if (ret) { ++ pr_err("regist enfs_adapter fail. 
ret %d\n", ret); ++ return -1; ++ } ++ ++ ret = enfs_init(); ++ if (ret) { ++ enfs_adapter_unregister(&enfs_adapter); ++ return -1; ++ } ++ ++ return 0; ++} ++ ++static void __exit exit_enfs(void) ++{ ++ enfs_fini(); ++ enfs_adapter_unregister(&enfs_adapter); ++} ++ ++MODULE_LICENSE("GPL"); ++MODULE_AUTHOR("Huawei Tech. Co., Ltd."); ++MODULE_DESCRIPTION("Nfs client router"); ++MODULE_VERSION("1.0"); ++ ++module_init(init_enfs); ++module_exit(exit_enfs); +diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c +new file mode 100644 +index 000000000000..63c02898a42c +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.c +@@ -0,0 +1,340 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/types.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/proc_fs.h> ++#include <linux/seq_file.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/addr.h> ++#include "enfs_multipath_client.h" ++#include "enfs_multipath_parse.h" ++ ++int ++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info, ++ const struct nfs_client_initdata *client_init_data) ++{ ++ struct multipath_mount_options *mount_options = ++ (struct multipath_mount_options *)client_init_data->enfs_option; ++ ++ if (mount_options->local_ip_list) { ++ client_info->local_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->local_ip_list) ++ return -ENOMEM; ++ ++ memcpy(client_info->local_ip_list, mount_options->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->remote_ip_list) { ++ ++ client_info->remote_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->remote_ip_list) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ return 
-ENOMEM; ++ } ++ memcpy(client_info->remote_ip_list, ++ mount_options->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->pRemoteDnsInfo) { ++ client_info->pRemoteDnsInfo = ++ kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL); ++ ++ if (!client_info->pRemoteDnsInfo) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ kfree(client_info->remote_ip_list); ++ client_info->remote_ip_list = NULL; ++ return -ENOMEM; ++ } ++ memcpy(client_info->pRemoteDnsInfo, ++ mount_options->pRemoteDnsInfo, ++ sizeof(struct NFS_ROUTE_DNS_INFO_S)); ++ } ++ return 0; ++} ++ ++void nfs_multipath_client_info_free_work(struct work_struct *work) ++{ ++ ++ struct multipath_client_info *clp_info; ++ ++ if (work == NULL) ++ return; ++ ++ clp_info = container_of(work, struct multipath_client_info, work); ++ ++ if (clp_info->local_ip_list != NULL) { ++ kfree(clp_info->local_ip_list); ++ clp_info->local_ip_list = NULL; ++ } ++ if (clp_info->remote_ip_list != NULL) { ++ kfree(clp_info->remote_ip_list); ++ clp_info->remote_ip_list = NULL; ++ } ++ kfree(clp_info); ++} ++ ++void nfs_multipath_client_info_free(void *data) ++{ ++ struct multipath_client_info *clp_info = ++ (struct multipath_client_info *)data; ++ ++ if (clp_info == NULL) ++ return; ++ pr_info("free client info %p.\n", clp_info); ++ INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work); ++ schedule_work(&clp_info->work); ++} ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init) ++{ ++ int rc; ++ struct multipath_client_info *info; ++ struct multipath_client_info **enfs_info; ++ /* no multi path info, no need do multipath init */ ++ if (cl_init->enfs_option == NULL) ++ return 0; ++ enfs_info = (struct multipath_client_info **)data; ++ if (enfs_info == NULL) ++ return -EINVAL; ++ ++ if (*enfs_info == NULL) ++ *enfs_info = kzalloc(sizeof(struct multipath_client_info), ++ GFP_KERNEL); ++ ++ if (*enfs_info == NULL) ++ return 
-ENOMEM; ++ ++ info = (struct multipath_client_info *)*enfs_info; ++ pr_info("init client info %p.\n", info); ++ rc = nfs_multipath_client_mount_info_init(info, cl_init); ++ if (rc) { ++ nfs_multipath_client_info_free((void *)info); ++ return rc; ++ } ++ return rc; ++} ++ ++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src, ++ const struct nfs_ip_list *ip_list_dst) ++{ ++ int i; ++ int j; ++ bool is_find; ++ /* if both are equal or NULL, then return true. */ ++ if (ip_list_src == ip_list_dst) ++ return true; ++ ++ if ((ip_list_src == NULL || ip_list_dst == NULL)) ++ return false; ++ ++ if (ip_list_src->count != ip_list_dst->count) ++ return false; ++ ++ for (i = 0; i < ip_list_src->count; i++) { ++ is_find = false; ++ for (j = 0; j < ip_list_src->count; j++) { ++ if (rpc_cmp_addr_port( ++ (const struct sockaddr *) ++ &ip_list_src->address[i], ++ (const struct sockaddr *) ++ &ip_list_dst->address[j]) ++ ) { ++ is_find = true; ++ break; ++ } ++ } ++ if (is_find == false) ++ return false; ++ } ++ return true; ++} ++ ++int ++nfs_multipath_dns_list_info_match( ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc, ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst) ++{ ++ int i; ++ ++ /* if both are equal or NULL, then return true. 
*/ ++ if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst) ++ return true; ++ ++ if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL)) ++ return false; ++ ++ if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount) ++ return false; ++ ++ for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) { ++ if (strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname, ++ pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname)) ++ return false; ++ } ++ return true; ++} ++ ++int nfs_multipath_client_info_match(void *src, void *dst) ++{ ++ int ret = true; ++ ++ struct multipath_client_info *src_info; ++ struct multipath_mount_options *dst_info; ++ ++ src_info = (struct multipath_client_info *)src; ++ dst_info = (struct multipath_mount_options *)dst; ++ pr_info("try match client.\n"); ++ ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list, ++ dst_info->local_ip_list); ++ if (ret == false) { ++ pr_err("local_ip not match.\n"); ++ return ret; ++ } ++ ++ ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list, ++ dst_info->remote_ip_list); ++ if (ret == false) { ++ pr_err("remote_ip not match.\n"); ++ return ret; ++ } ++ ++ ret = nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo, ++ dst_info->pRemoteDnsInfo); ++ if (ret == false) { ++ pr_err("dns not match.\n"); ++ return ret; ++ } ++ pr_info("try match client ret %d.\n", ret); ++ return ret; ++} ++ ++void nfs_multipath_print_ip_info(struct seq_file *mount_option, ++ struct nfs_ip_list *ip_list, ++ const char *type) ++{ ++ char buf[IP_ADDRESS_LEN_MAX + 1]; ++ int len = 0; ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < ip_list->count; i++) { ++ len = rpc_ntop((struct sockaddr *)&ip_list->address[i], ++ buf, IP_ADDRESS_LEN_MAX); ++ if (len > 0 && len < IP_ADDRESS_LEN_MAX) ++ buf[len] = '\0'; ++ ++ if (i == 0) ++ seq_printf(mount_option, "%s", buf); ++ else ++ seq_printf(mount_option, "~%s", buf); ++ dfprintk(MOUNT, ++ "NFS: show nfs mount option type:%s %s [%s]\n", ++ 
type, buf, __func__); ++ } ++} ++ ++void nfs_multipath_print_dns_info(struct seq_file *mount_option, ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo, ++ const char *type) ++{ ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) { ++ if (i == 0) ++ seq_printf(mount_option, ++ "[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else if (i == pRemoteDnsInfo->dnsNameCount - 1) ++ seq_printf(mount_option, ",%s]", ++ pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else ++ seq_printf(mount_option, ++ ",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ } ++} ++ ++ ++static void multipath_print_sockaddr(struct seq_file *seq, ++ struct sockaddr *addr) ++{ ++ switch (addr->sa_family) { ++ case AF_INET: { ++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; ++ ++ seq_printf(seq, "%pI4", &sin->sin_addr); ++ return; ++ } ++ case AF_INET6: { ++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; ++ ++ seq_printf(seq, "%pI6", &sin6->sin6_addr); ++ return; ++ } ++ default: ++ break; ++ } ++ pr_err("unsupport family:%d\n", addr->sa_family); ++} ++ ++static void multipath_print_enfs_info(struct seq_file *seq, ++ struct nfs_server *server) ++{ ++ struct sockaddr_storage peeraddr; ++ struct rpc_clnt *next = server->client; ++ ++ rpc_peeraddr(server->client, ++ (struct sockaddr *)&peeraddr, sizeof(peeraddr)); ++ seq_puts(seq, ",enfs_info="); ++ multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr); ++ ++ while (next->cl_parent) { ++ if (next == next->cl_parent) ++ break; ++ next = next->cl_parent; ++ } ++ seq_printf(seq, "_%u", next->cl_clid); ++} ++ ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data) ++{ ++ struct nfs_server *server = data; ++ struct multipath_client_info *client_info = ++ server->nfs_client->cl_multipath_data; ++ ++ dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__); ++ if ((client_info->remote_ip_list) && ++ 
(client_info->remote_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->remote_ip_list, ++ "remoteaddrs"); ++ ++ if ((client_info->local_ip_list) && ++ (client_info->local_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->local_ip_list, ++ "localaddrs"); ++ ++ if ((client_info->pRemoteDnsInfo) && ++ (client_info->pRemoteDnsInfo->dnsNameCount > 0)) ++ nfs_multipath_print_dns_info(mount_option, ++ client_info->pRemoteDnsInfo, ++ "remotednsname"); ++ ++ multipath_print_enfs_info(mount_option, server); ++} +diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h +new file mode 100644 +index 000000000000..208f7260690d +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.h +@@ -0,0 +1,26 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#ifndef _ENFS_MULTIPATH_CLIENT_H_ ++#define _ENFS_MULTIPATH_CLIENT_H_ ++ ++#include "enfs.h" ++ ++struct multipath_client_info { ++ struct work_struct work; ++ struct nfs_ip_list *remote_ip_list; ++ struct nfs_ip_list *local_ip_list; ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; ++ s64 client_id; ++}; ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init); ++void nfs_multipath_client_info_free(void *data); ++int nfs_multipath_client_info_match(void *src, void *dst); ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_multipath_parse.c b/fs/nfs/enfs/enfs_multipath_parse.c +new file mode 100644 +index 000000000000..9c4c6c1880b6 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_parse.c +@@ -0,0 +1,601 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#include <linux/types.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/parser.h> ++#include <linux/kern_levels.h> ++#include <linux/sunrpc/addr.h> ++#include "enfs_multipath_parse.h" ++#include "enfs_log.h" ++ ++#define NFSDBG_FACILITY NFSDBG_CLIENT ++ ++void nfs_multipath_parse_ip_ipv6_add(struct sockaddr_in6 *sin6, int add_num) ++{ ++ int i; ++ ++ pr_info("NFS: before %08x%08x%08x%08x add_num: %d[%s]\n", ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]), ++ add_num, __func__); ++ for (i = 0; i < add_num; i++) { ++ sin6->sin6_addr.in6_u.u6_addr32[3] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[3] != 0) ++ continue; ++ ++ sin6->sin6_addr.in6_u.u6_addr32[2] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[2] != 0) ++ continue; ++ ++ sin6->sin6_addr.in6_u.u6_addr32[1] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[1] != 0) ++ continue; ++ ++ sin6->sin6_addr.in6_u.u6_addr32[0] = ++ htonl(ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]) + 1); ++ ++ if (sin6->sin6_addr.in6_u.u6_addr32[0] != 0) ++ continue; ++ } ++ ++ return; ++ ++} ++ ++static int nfs_multipath_parse_ip_range(struct net *net_ns, const char *cursor, ++ struct nfs_ip_list *ip_list, enum nfsmultipathoptions type) ++{ ++ struct sockaddr_storage addr; ++ struct sockaddr_storage tmp_addr; ++ int i; ++ size_t len; ++ int add_num = 1; ++ bool duplicate_flag = false; ++ bool is_complete = false; ++ struct sockaddr_in *sin4; ++ struct sockaddr_in6 *sin6; ++ ++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", ++ cursor, type, __func__); ++ len = rpc_pton(net_ns, cursor, strlen(cursor), ++ (struct sockaddr *)&addr, sizeof(addr)); ++ 
if (!len) ++ return -EINVAL; ++ ++ if (addr.ss_family != ip_list->address[ip_list->count - 1].ss_family) { ++ pr_info("NFS: %s parsing nfs mount option type: %d fail.\n", ++ __func__, type); ++ return -EINVAL; ++ } ++ ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &ip_list->address[ip_list->count - 1], ++ (const struct sockaddr *)&addr)) { ++ ++ pr_info("range ip is same ip.\n"); ++ return 0; ++ ++ } ++ ++ while (true) { ++ ++ tmp_addr = ip_list->address[ip_list->count - 1]; ++ ++ switch (addr.ss_family) { ++ case AF_INET: ++ sin4 = (struct sockaddr_in *)&tmp_addr; ++ ++ sin4->sin_addr.s_addr = ++ htonl(ntohl(sin4->sin_addr.s_addr) + add_num); ++ ++ pr_info("NFS: mount option ip%08x type: %d ipcont %d [%s]\n", ++ ntohl(sin4->sin_addr.s_addr), ++ type, ip_list->count, __func__); ++ break; ++ case AF_INET6: ++ sin6 = (struct sockaddr_in6 *)&tmp_addr; ++ nfs_multipath_parse_ip_ipv6_add(sin6, add_num); ++ pr_info("NFS: mount option ip %08x%08x%08x%08x type: %d ipcont %d [%s]\n", ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[0]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[1]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[2]), ++ ntohl(sin6->sin6_addr.in6_u.u6_addr32[3]), ++ type, ip_list->count, __func__); ++ break; ++ // return -EOPNOTSUPP; ++ default: ++ return -EOPNOTSUPP; ++ } ++ ++ if (rpc_cmp_addr((const struct sockaddr *)&tmp_addr, ++ (const struct sockaddr *)&addr)) { ++ is_complete = true; ++ } ++ // delete duplicate ip, continuosly repeat, skip it ++ for (i = 0; i < ip_list->count; i++) { ++ duplicate_flag = false; ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &ip_list->address[i], ++ (const struct sockaddr *)&tmp_addr)) { ++ add_num++; ++ duplicate_flag = true; ++ break; ++ } ++ } ++ ++ if (duplicate_flag == false) { ++ pr_info("this ip not duplicate;"); ++ add_num = 1; ++ // if not repeat but omit limit return false ++ if ((type == LOCALADDR && ++ ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) || ++ (type == REMOTEADDR && ++ ip_list->count >= 
MAX_SUPPORTED_REMOTE_IP_COUNT)) { ++ ++ pr_info("[MULTIPATH:%s] iplist for type %d reached %d, more than supported limit %d\n", ++ __func__, type, ip_list->count, ++ type == LOCALADDR ? ++ MAX_SUPPORTED_LOCAL_IP_COUNT : ++ MAX_SUPPORTED_REMOTE_IP_COUNT); ++ ip_list->count = 0; ++ return -ENOSPC; ++ } ++ ip_list->address[ip_list->count] = tmp_addr; ++ ++ ip_list->addrlen[ip_list->count] = ++ ip_list->addrlen[ip_list->count - 1]; ++ ++ ip_list->count += 1; ++ } ++ if (is_complete == true) ++ break; ++ } ++ return 0; ++} ++ ++int nfs_multipath_parse_ip_list_inter(struct nfs_ip_list *ip_list, ++ struct net *net_ns, ++ char *cursor, enum nfsmultipathoptions type) ++{ ++ int i = 0; ++ struct sockaddr_storage addr; ++ struct sockaddr_storage swap; ++ int len; ++ ++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", ++ cursor, type, __func__); ++ ++ len = rpc_pton(net_ns, cursor, ++ strlen(cursor), ++ (struct sockaddr *)&addr, sizeof(addr)); ++ if (!len) ++ return -EINVAL; ++ ++ // check repeated ip ++ for (i = 0; i < ip_list->count; i++) { ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &ip_list->address[i], ++ (const struct sockaddr *)&addr)) { ++ ++ pr_info("NFS: mount option '%s' type:%d index %d same as before index %d [%s]\n", ++ cursor, type, ip_list->count, i, __func__); ++ // prevent this ip is beginning ++ // if repeated take it to the end of list ++ swap = ip_list->address[i]; ++ ++ ip_list->address[i] = ++ ip_list->address[ip_list->count-1]; ++ ++ ip_list->address[ip_list->count-1] = swap; ++ return 0; ++ } ++ } ++ // if not repeated, check exceed limit ++ if ((type == LOCALADDR && ++ ip_list->count >= MAX_SUPPORTED_LOCAL_IP_COUNT) || ++ (type == REMOTEADDR && ++ ip_list->count >= MAX_SUPPORTED_REMOTE_IP_COUNT)) { ++ ++ pr_info("[MULTIPATH:%s] iplist for type %d reached %d, more than supported limit %d\n", ++ __func__, type, ip_list->count, ++ type == LOCALADDR ? 
++ MAX_SUPPORTED_LOCAL_IP_COUNT : ++ MAX_SUPPORTED_REMOTE_IP_COUNT); ++ ++ ip_list->count = 0; ++ return -ENOSPC; ++ } ++ ip_list->address[ip_list->count] = addr; ++ ip_list->addrlen[ip_list->count] = len; ++ ip_list->count++; ++ ++ return 0; ++} ++ ++char *nfs_multipath_parse_ip_list_get_cursor(char **buf_to_parse, bool *single) ++{ ++ char *cursor = NULL; ++ const char *single_sep = strchr(*buf_to_parse, '~'); ++ const char *range_sep = strchr(*buf_to_parse, '-'); ++ ++ *single = true; ++ if (range_sep) { ++ if (range_sep > single_sep) { // A-B or A~B-C ++ if (single_sep == NULL) { // A-B ++ cursor = strsep(buf_to_parse, "-"); ++ if (cursor) ++ *single = false; ++ } else// A~B-C ++ cursor = strsep(buf_to_parse, "~"); ++ } else { // A-B~C ++ cursor = strsep(buf_to_parse, "-"); ++ if (cursor) ++ *single = false; ++ } ++ } else { // A~B~C ++ cursor = strsep(buf_to_parse, "~"); ++ } ++ return cursor; ++} ++ ++bool nfs_multipath_parse_param_check(enum nfsmultipathoptions type, ++ struct multipath_mount_options *options) ++{ ++ if (type == REMOUNTREMOTEADDR && options->remote_ip_list->count != 0) { ++ memset(options->remote_ip_list, 0, sizeof(struct nfs_ip_list)); ++ return true; ++ } ++ if (type == REMOUNTLOCALADDR && options->local_ip_list->count != 0) { ++ memset(options->local_ip_list, 0, sizeof(struct nfs_ip_list)); ++ return true; ++ } ++ if ((type == REMOTEADDR || type == REMOTEDNSNAME) && ++ options->pRemoteDnsInfo->dnsNameCount != 0) { ++ ++ pr_info("[MULTIPATH:%s] parse for %d ,already have dns\n", ++ __func__, type); ++ return false; ++ } else if ((type == REMOTEADDR || type == REMOTEDNSNAME) && ++ options->remote_ip_list->count != 0) { ++ ++ pr_info("[MULTIPATH:%s] parse for %d ,already have iplist\n", ++ __func__, type); ++ return false; ++ } ++ return true; ++} ++ ++int nfs_multipath_parse_ip_list(char *buffer, struct net *net_ns, ++ struct multipath_mount_options *options, ++ enum nfsmultipathoptions type) ++{ ++ char *buf_to_parse = NULL; ++ bool 
prev_range = false; ++ int ret = 0; ++ char *cursor = NULL; ++ bool single = true; ++ struct nfs_ip_list *ip_list_tmp = NULL; ++ ++ if (!nfs_multipath_parse_param_check(type, options)) ++ return -ENOTSUPP; ++ ++ if (type == REMOUNTREMOTEADDR) ++ type = REMOTEADDR; ++ ++ if (type == REMOUNTLOCALADDR) ++ type = LOCALADDR; ++ ++ if (type == LOCALADDR) ++ ip_list_tmp = options->local_ip_list; ++ else ++ ip_list_tmp = options->remote_ip_list; ++ ++ pr_info("NFS: parsing nfs mount option '%s' type: %d[%s]\n", ++ buffer, type, __func__); ++ ++ buf_to_parse = buffer; ++ while (buf_to_parse != NULL) { ++ cursor = ++ nfs_multipath_parse_ip_list_get_cursor(&buf_to_parse, &single); ++ if (!cursor) ++ break; ++ ++ if (single == false && prev_range == true) { ++ pr_info("NFS: mount option type: %d fail. Multiple Range.[%s]\n", ++ type, __func__); ++ ++ ret = -EINVAL; ++ goto out; ++ } ++ ++ if (prev_range == false) { ++ ret = nfs_multipath_parse_ip_list_inter(ip_list_tmp, ++ net_ns, cursor, type); ++ if (ret) ++ goto out; ++ if (single == false) ++ prev_range = true; ++ } else { ++ ret = nfs_multipath_parse_ip_range(net_ns, cursor, ++ ip_list_tmp, type); ++ if (ret != 0) ++ goto out; ++ prev_range = false; ++ } ++ } ++ ++out: ++ if (ret) ++ memset(ip_list_tmp, 0, sizeof(struct nfs_ip_list)); ++ ++ return ret; ++} ++ ++int nfs_multipath_parse_dns_list(char *buffer, struct net *net_ns, ++ struct multipath_mount_options *options) ++{ ++ struct NFS_ROUTE_DNS_INFO_S *dns_name_list_tmp = NULL; ++ char *cursor = NULL; ++ char *bufToParse; ++ ++ if (!nfs_multipath_parse_param_check(REMOTEDNSNAME, options)) ++ return -ENOTSUPP; ++ ++ pr_info("[MULTIPATH:%s] buffer %s\n", __func__, buffer); ++ // freed in nfs_free_parsed_mount_data ++ dns_name_list_tmp = kmalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), ++ GFP_KERNEL); ++ if (!dns_name_list_tmp) ++ return -ENOMEM; ++ ++ dns_name_list_tmp->dnsNameCount = 0; ++ bufToParse = buffer; ++ while (bufToParse) { ++ if (dns_name_list_tmp->dnsNameCount 
>= MAX_DNS_SUPPORTED) { ++ pr_err("%s: dnsname for %s reached %d,more than supported limit %d\n", ++ __func__, cursor, ++ dns_name_list_tmp->dnsNameCount, ++ MAX_DNS_SUPPORTED); ++ dns_name_list_tmp->dnsNameCount = 0; ++ return -ENOSPC; ++ } ++ cursor = strsep(&bufToParse, "~"); ++ if (!cursor) ++ break; ++ ++ strcpy(dns_name_list_tmp->routeRemoteDnsList ++ [dns_name_list_tmp->dnsNameCount].dnsname, ++ cursor); ++ dns_name_list_tmp->dnsNameCount++; ++ } ++ if (dns_name_list_tmp->dnsNameCount == 0) ++ return -EINVAL; ++ options->pRemoteDnsInfo = dns_name_list_tmp; ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check_ipv4_valid(struct sockaddr_in *addr) ++{ ++ if (addr->sin_addr.s_addr == 0 || addr->sin_addr.s_addr == 0xffffffff) ++ return -EINVAL; ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check_ipv6_valid(struct sockaddr_in6 *addr) ++{ ++ if (addr->sin6_addr.in6_u.u6_addr32[0] == 0 && ++ addr->sin6_addr.in6_u.u6_addr32[1] == 0 && ++ addr->sin6_addr.in6_u.u6_addr32[2] == 0 && ++ addr->sin6_addr.in6_u.u6_addr32[3] == 0) ++ return -EINVAL; ++ ++ if (addr->sin6_addr.in6_u.u6_addr32[0] == 0xffffffff && ++ addr->sin6_addr.in6_u.u6_addr32[1] == 0xffffffff && ++ addr->sin6_addr.in6_u.u6_addr32[2] == 0xffffffff && ++ addr->sin6_addr.in6_u.u6_addr32[3] == 0xffffffff) ++ return -EINVAL; ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check_ip_valid(struct sockaddr_storage *address) ++{ ++ int rc = 0; ++ ++ if (address->ss_family == AF_INET) ++ rc = nfs_multipath_parse_options_check_ipv4_valid( ++ (struct sockaddr_in *)address); ++ else if (address->ss_family == AF_INET6) ++ rc = nfs_multipath_parse_options_check_ipv6_valid( ++ (struct sockaddr_in6 *)address); ++ else ++ rc = -EINVAL; ++ ++ return rc; ++} ++ ++int nfs_multipath_parse_options_check_valid( ++ struct multipath_mount_options *options) ++{ ++ int rc; ++ int i; ++ ++ if (options == NULL) ++ return 0; ++ ++ for (i = 0; i < options->local_ip_list->count; i++) { ++ rc = 
nfs_multipath_parse_options_check_ip_valid( ++ &options->local_ip_list->address[i]); ++ if (rc != 0) ++ return rc; ++ } ++ ++ for (i = 0; i < options->remote_ip_list->count; i++) { ++ rc = nfs_multipath_parse_options_check_ip_valid( ++ &options->remote_ip_list->address[i]); ++ if (rc != 0) ++ return rc; ++ } ++ ++ return 0; ++} ++int nfs_multipath_parse_options_check_duplicate( ++ struct multipath_mount_options *options) ++{ ++ int i; ++ int j; ++ ++ if (options == NULL || ++ options->local_ip_list->count == 0 || ++ options->remote_ip_list->count == 0) ++ ++ return 0; ++ ++ for (i = 0; i < options->local_ip_list->count; i++) { ++ for (j = 0; j < options->remote_ip_list->count; j++) { ++ if (rpc_cmp_addr((const struct sockaddr *) ++ &options->local_ip_list->address[i], ++ (const struct sockaddr *) ++ &options->remote_ip_list->address[j])) ++ return -ENOTSUPP; ++ } ++ } ++ return 0; ++} ++ ++int nfs_multipath_parse_options_check(struct multipath_mount_options *options) ++{ ++ int rc = 0; ++ ++ rc = nfs_multipath_parse_options_check_valid(options); ++ ++ if (rc != 0) { ++ pr_err("has invalid ip.\n"); ++ return rc; ++ } ++ ++ rc = nfs_multipath_parse_options_check_duplicate(options); ++ if (rc != 0) ++ return rc; ++ return rc; ++} ++ ++int nfs_multipath_alloc_options(void **enfs_option) ++{ ++ struct multipath_mount_options *options = NULL; ++ ++ options = kzalloc(sizeof(struct multipath_mount_options), GFP_KERNEL); ++ ++ if (options == NULL) ++ return -ENOMEM; ++ ++ options->local_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ if (options->local_ip_list == NULL) { ++ kfree(options); ++ return -ENOMEM; ++ } ++ ++ options->remote_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ if (options->remote_ip_list == NULL) { ++ kfree(options->local_ip_list); ++ kfree(options); ++ return -ENOMEM; ++ } ++ ++ options->pRemoteDnsInfo = kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), ++ GFP_KERNEL); ++ if (options->pRemoteDnsInfo == NULL) { ++
kfree(options->remote_ip_list); ++ kfree(options->local_ip_list); ++ kfree(options); ++ return -ENOMEM; ++ } ++ ++ *enfs_option = options; ++ return 0; ++} ++ ++int nfs_multipath_parse_options(enum nfsmultipathoptions type, ++ char *str, void **enfs_option, struct net *net_ns) ++{ ++ int rc; ++ struct multipath_mount_options *options = NULL; ++ ++ if ((str == NULL) || (enfs_option == NULL) || (net_ns == NULL)) ++ return -EINVAL; ++ ++ if (*enfs_option == NULL) { ++ rc = nfs_multipath_alloc_options(enfs_option); ++ if (rc != 0) { ++ enfs_log_error( ++ "alloc enfs_options failed! errno:%d\n", rc); ++ return rc; ++ } ++ } ++ ++ options = (struct multipath_mount_options *)*enfs_option; ++ ++ if (type == LOCALADDR || type == REMOUNTLOCALADDR || ++ type == REMOTEADDR || type == REMOUNTREMOTEADDR) { ++ rc = nfs_multipath_parse_ip_list(str, net_ns, options, type); ++ } else if (type == REMOTEDNSNAME) { ++ /* alloc and release need to modify */ ++ rc = nfs_multipath_parse_dns_list(str, net_ns, options); ++ } else { ++ rc = -EOPNOTSUPP; ++ } ++ ++ // after parsing cmd, need checking local and remote ++ // IP is same. 
if not means illegal cmd ++ if (rc == 0) ++ rc = nfs_multipath_parse_options_check_duplicate(options); ++ ++ if (rc == 0) ++ rc = nfs_multipath_parse_options_check(options); ++ ++ return rc; ++} ++ ++void nfs_multipath_free_options(void **enfs_option) ++{ ++ struct multipath_mount_options *options; ++ ++ if (enfs_option == NULL || *enfs_option == NULL) ++ return; ++ ++ options = (struct multipath_mount_options *)*enfs_option; ++ ++ if (options->remote_ip_list != NULL) { ++ kfree(options->remote_ip_list); ++ options->remote_ip_list = NULL; ++ } ++ ++ if (options->local_ip_list != NULL) { ++ kfree(options->local_ip_list); ++ options->local_ip_list = NULL; ++ } ++ ++ if (options->pRemoteDnsInfo != NULL) { ++ kfree(options->pRemoteDnsInfo); ++ options->pRemoteDnsInfo = NULL; ++ } ++ ++ kfree(options); ++ *enfs_option = NULL; ++} +diff --git a/fs/nfs/enfs/enfs_multipath_parse.h b/fs/nfs/enfs/enfs_multipath_parse.h +new file mode 100644 +index 000000000000..6f3e8703e3e2 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_parse.h +@@ -0,0 +1,22 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _ENFS_MULTIPATH_PARSE_H_ ++#define _ENFS_MULTIPATH_PARSE_H_ ++ ++#include "enfs.h" ++ ++struct multipath_mount_options { ++ struct nfs_ip_list *remote_ip_list; ++ struct nfs_ip_list *local_ip_list; ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; ++}; ++ ++int nfs_multipath_parse_options(enum nfsmultipathoptions type, ++ char *str, void **enfs_option, struct net *net_ns); ++void nfs_multipath_free_options(void **enfs_option); ++ ++#endif diff --git a/0004-add_enfs_module_for_sunrpc_multipatch.patch b/0004-add_enfs_module_for_sunrpc_multipatch.patch new file mode 100644 index 0000000..2c0fcc7 --- /dev/null +++ b/0004-add_enfs_module_for_sunrpc_multipatch.patch @@ -0,0 +1,1581 @@ +diff --git a/fs/nfs/enfs/enfs_multipath.h b/fs/nfs/enfs/enfs_multipath.h +new file mode 100644 +index 000000000000..e064c2929ced +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath.h +@@ -0,0 +1,24 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: enfs multipath ++ * Author: ++ * Create: 2023-07-31 ++ */ ++ ++#ifndef ENFS_MULTIPATH_H ++#define ENFS_MULTIPATH_H ++#include <linux/sunrpc/clnt.h> ++ ++#define MAX_XPRT_NUM_PER_CLIENT 32 ++ ++int enfs_multipath_init(void); ++void enfs_multipath_exit(void); ++void enfs_xprt_ippair_create(struct xprt_create *xprtargs, ++ struct rpc_clnt *clnt, void *data); ++int enfs_config_xprt_create_args(struct xprt_create *xprtargs, ++ struct rpc_create_args *args, ++ char *servername, size_t length); ++void print_enfs_multipath_addr(struct sockaddr *local, struct sockaddr *remote); ++ ++#endif // ENFS_MULTIPATH_H +diff --git a/fs/nfs/enfs/enfs_multipath_client.c b/fs/nfs/enfs/enfs_multipath_client.c +new file mode 100644 +index 000000000000..63c02898a42c +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.c +@@ -0,0 +1,340 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. 
Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#include <linux/types.h> ++#include <linux/nfs.h> ++#include <linux/nfs4.h> ++#include <linux/nfs_fs.h> ++#include <linux/nfs_fs_sb.h> ++#include <linux/proc_fs.h> ++#include <linux/seq_file.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/addr.h> ++#include "enfs_multipath_client.h" ++#include "enfs_multipath_parse.h" ++ ++int ++nfs_multipath_client_mount_info_init(struct multipath_client_info *client_info, ++ const struct nfs_client_initdata *client_init_data) ++{ ++ struct multipath_mount_options *mount_options = ++ (struct multipath_mount_options *)client_init_data->enfs_option; ++ ++ if (mount_options->local_ip_list) { ++ client_info->local_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->local_ip_list) ++ return -ENOMEM; ++ ++ memcpy(client_info->local_ip_list, mount_options->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->remote_ip_list) { ++ ++ client_info->remote_ip_list = ++ kzalloc(sizeof(struct nfs_ip_list), GFP_KERNEL); ++ ++ if (!client_info->remote_ip_list) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ return -ENOMEM; ++ } ++ memcpy(client_info->remote_ip_list, ++ mount_options->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (mount_options->pRemoteDnsInfo) { ++ client_info->pRemoteDnsInfo = ++ kzalloc(sizeof(struct NFS_ROUTE_DNS_INFO_S), GFP_KERNEL); ++ ++ if (!client_info->pRemoteDnsInfo) { ++ kfree(client_info->local_ip_list); ++ client_info->local_ip_list = NULL; ++ kfree(client_info->remote_ip_list); ++ client_info->remote_ip_list = NULL; ++ return -ENOMEM; ++ } ++ memcpy(client_info->pRemoteDnsInfo, ++ mount_options->pRemoteDnsInfo, ++ sizeof(struct NFS_ROUTE_DNS_INFO_S)); ++ } ++ return 0; ++} ++ ++void nfs_multipath_client_info_free_work(struct work_struct *work) ++{ ++ ++ struct multipath_client_info *clp_info; ++ ++ if (work == NULL) ++ return; ++ ++ 
clp_info = container_of(work, struct multipath_client_info, work); ++ ++ if (clp_info->local_ip_list != NULL) { ++ kfree(clp_info->local_ip_list); ++ clp_info->local_ip_list = NULL; ++ } ++ if (clp_info->remote_ip_list != NULL) { ++ kfree(clp_info->remote_ip_list); ++ clp_info->remote_ip_list = NULL; ++ } ++ kfree(clp_info); ++} ++ ++void nfs_multipath_client_info_free(void *data) ++{ ++ struct multipath_client_info *clp_info = ++ (struct multipath_client_info *)data; ++ ++ if (clp_info == NULL) ++ return; ++ pr_info("free client info %p.\n", clp_info); ++ INIT_WORK(&clp_info->work, nfs_multipath_client_info_free_work); ++ schedule_work(&clp_info->work); ++} ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init) ++{ ++ int rc; ++ struct multipath_client_info *info; ++ struct multipath_client_info **enfs_info; ++ /* no multi path info, no need do multipath init */ ++ if (cl_init->enfs_option == NULL) ++ return 0; ++ enfs_info = (struct multipath_client_info **)data; ++ if (enfs_info == NULL) ++ return -EINVAL; ++ ++ if (*enfs_info == NULL) ++ *enfs_info = kzalloc(sizeof(struct multipath_client_info), ++ GFP_KERNEL); ++ ++ if (*enfs_info == NULL) ++ return -ENOMEM; ++ ++ info = (struct multipath_client_info *)*enfs_info; ++ pr_info("init client info %p.\n", info); ++ rc = nfs_multipath_client_mount_info_init(info, cl_init); ++ if (rc) { ++ nfs_multipath_client_info_free((void *)info); ++ return rc; ++ } ++ return rc; ++} ++ ++bool nfs_multipath_ip_list_info_match(const struct nfs_ip_list *ip_list_src, ++ const struct nfs_ip_list *ip_list_dst) ++{ ++ int i; ++ int j; ++ bool is_find; ++ /* if both are equal or NULL, then return true. 
*/ ++ if (ip_list_src == ip_list_dst) ++ return true; ++ ++ if ((ip_list_src == NULL || ip_list_dst == NULL)) ++ return false; ++ ++ if (ip_list_src->count != ip_list_dst->count) ++ return false; ++ ++ for (i = 0; i < ip_list_src->count; i++) { ++ is_find = false; ++ for (j = 0; j < ip_list_src->count; j++) { ++ if (rpc_cmp_addr_port( ++ (const struct sockaddr *) ++ &ip_list_src->address[i], ++ (const struct sockaddr *) ++ &ip_list_dst->address[j]) ++ ) { ++ is_find = true; ++ break; ++ } ++ } ++ if (is_find == false) ++ return false; ++ } ++ return true; ++} ++ ++int ++nfs_multipath_dns_list_info_match( ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoSrc, ++ const struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfoDst) ++{ ++ int i; ++ ++ /* if both are equal or NULL, then return true. */ ++ if (pRemoteDnsInfoSrc == pRemoteDnsInfoDst) ++ return true; ++ ++ if ((pRemoteDnsInfoSrc == NULL || pRemoteDnsInfoDst == NULL)) ++ return false; ++ ++ if (pRemoteDnsInfoSrc->dnsNameCount != pRemoteDnsInfoDst->dnsNameCount) ++ return false; ++ ++ for (i = 0; i < pRemoteDnsInfoSrc->dnsNameCount; i++) { ++ if (strcmp(pRemoteDnsInfoSrc->routeRemoteDnsList[i].dnsname, ++ pRemoteDnsInfoDst->routeRemoteDnsList[i].dnsname) != 0) ++ return false; ++ } ++ return true; ++} ++ ++int nfs_multipath_client_info_match(void *src, void *dst) ++{ ++ int ret = true; ++ ++ struct multipath_client_info *src_info; ++ struct multipath_mount_options *dst_info; ++ ++ src_info = (struct multipath_client_info *)src; ++ dst_info = (struct multipath_mount_options *)dst; ++ pr_info("try match client.\n"); ++ ret = nfs_multipath_ip_list_info_match(src_info->local_ip_list, ++ dst_info->local_ip_list); ++ if (ret == false) { ++ pr_err("local_ip does not match.\n"); ++ return ret; ++ } ++ ++ ret = nfs_multipath_ip_list_info_match(src_info->remote_ip_list, ++ dst_info->remote_ip_list); ++ if (ret == false) { ++ pr_err("remote_ip does not match.\n"); ++ return ret; ++ } ++ ++ ret =
nfs_multipath_dns_list_info_match(src_info->pRemoteDnsInfo, ++ dst_info->pRemoteDnsInfo); ++ if (ret == false) { ++ pr_err("dns does not match.\n"); ++ return ret; ++ } ++ pr_info("try match client ret %d.\n", ret); ++ return ret; ++} ++ ++void nfs_multipath_print_ip_info(struct seq_file *mount_option, ++ struct nfs_ip_list *ip_list, ++ const char *type) ++{ ++ char buf[IP_ADDRESS_LEN_MAX + 1]; ++ int len = 0; ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < ip_list->count; i++) { ++ len = rpc_ntop((struct sockaddr *)&ip_list->address[i], ++ buf, IP_ADDRESS_LEN_MAX); ++ if (len > 0 && len < IP_ADDRESS_LEN_MAX) ++ buf[len] = '\0'; ++ ++ if (i == 0) ++ seq_printf(mount_option, "%s", buf); ++ else ++ seq_printf(mount_option, "~%s", buf); ++ dfprintk(MOUNT, ++ "NFS: show nfs mount option type:%s %s [%s]\n", ++ type, buf, __func__); ++ } ++} ++ ++void nfs_multipath_print_dns_info(struct seq_file *mount_option, ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo, ++ const char *type) ++{ ++ int i = 0; ++ ++ seq_printf(mount_option, ",%s=", type); ++ for (i = 0; i < pRemoteDnsInfo->dnsNameCount; i++) { ++ if (i == 0) ++ seq_printf(mount_option, ++ "[%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else if (i == pRemoteDnsInfo->dnsNameCount - 1) ++ seq_printf(mount_option, ",%s]", ++ pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ else ++ seq_printf(mount_option, ++ ",%s", pRemoteDnsInfo->routeRemoteDnsList[i].dnsname); ++ } ++} ++ ++ ++static void multipath_print_sockaddr(struct seq_file *seq, ++ struct sockaddr *addr) ++{ ++ switch (addr->sa_family) { ++ case AF_INET: { ++ struct sockaddr_in *sin = (struct sockaddr_in *)addr; ++ ++ seq_printf(seq, "%pI4", &sin->sin_addr); ++ return; ++ } ++ case AF_INET6: { ++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; ++ ++ seq_printf(seq, "%pI6", &sin6->sin6_addr); ++ return; ++ } ++ default: ++ break; ++ } ++ pr_err("unsupported family:%d\n", addr->sa_family); ++} ++ ++static void
multipath_print_enfs_info(struct seq_file *seq, ++ struct nfs_server *server) ++{ ++ struct sockaddr_storage peeraddr; ++ struct rpc_clnt *next = server->client; ++ ++ rpc_peeraddr(server->client, ++ (struct sockaddr *)&peeraddr, sizeof(peeraddr)); ++ seq_puts(seq, ",enfs_info="); ++ multipath_print_sockaddr(seq, (struct sockaddr *)&peeraddr); ++ ++ while (next->cl_parent) { ++ if (next == next->cl_parent) ++ break; ++ next = next->cl_parent; ++ } ++ seq_printf(seq, "_%u", next->cl_clid); ++} ++ ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data) ++{ ++ struct nfs_server *server = data; ++ struct multipath_client_info *client_info = ++ server->nfs_client->cl_multipath_data; ++ ++ dfprintk(MOUNT, "NFS: show nfs mount option[%s]\n", __func__); ++ if ((client_info->remote_ip_list) && ++ (client_info->remote_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->remote_ip_list, ++ "remoteaddrs"); ++ ++ if ((client_info->local_ip_list) && ++ (client_info->local_ip_list->count > 0)) ++ nfs_multipath_print_ip_info(mount_option, ++ client_info->local_ip_list, ++ "localaddrs"); ++ ++ if ((client_info->pRemoteDnsInfo) && ++ (client_info->pRemoteDnsInfo->dnsNameCount > 0)) ++ nfs_multipath_print_dns_info(mount_option, ++ client_info->pRemoteDnsInfo, ++ "remotednsname"); ++ ++ multipath_print_enfs_info(mount_option, server); ++} +diff --git a/fs/nfs/enfs/enfs_multipath_client.h b/fs/nfs/enfs/enfs_multipath_client.h +new file mode 100644 +index 000000000000..208f7260690d +--- /dev/null ++++ b/fs/nfs/enfs/enfs_multipath_client.h +@@ -0,0 +1,26 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS adapter. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. 
++ */ ++#ifndef _ENFS_MULTIPATH_CLIENT_H_ ++#define _ENFS_MULTIPATH_CLIENT_H_ ++ ++#include "enfs.h" ++ ++struct multipath_client_info { ++ struct work_struct work; ++ struct nfs_ip_list *remote_ip_list; ++ struct nfs_ip_list *local_ip_list; ++ struct NFS_ROUTE_DNS_INFO_S *pRemoteDnsInfo; ++ s64 client_id; ++}; ++ ++int nfs_multipath_client_info_init(void **data, ++ const struct nfs_client_initdata *cl_init); ++void nfs_multipath_client_info_free(void *data); ++int nfs_multipath_client_info_match(void *src, void *dst); ++void nfs_multipath_client_info_show(struct seq_file *mount_option, void *data); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_path.c b/fs/nfs/enfs/enfs_path.c +new file mode 100644 +index 000000000000..7355f8c2f672 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_path.c +@@ -0,0 +1,47 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++ ++#include <linux/sunrpc/metrics.h> ++#include <linux/sunrpc/xprt.h> ++ ++#include "enfs.h" ++#include "enfs_log.h" ++#include "enfs_path.h" ++ ++// only create ctx in this function ++// alloc iostat memory in create_clnt ++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt) ++{ ++ struct enfs_xprt_context *ctx; ++ ++ if (!xprt) { ++ enfs_log_error("invalid xprt pointer.\n"); ++ return -EINVAL; ++ } ++ ++ ctx = kzalloc(sizeof(struct enfs_xprt_context), GFP_KERNEL); ++ if (!ctx) { ++ enfs_log_error("add xprt test failed.\n"); ++ return -ENOMEM; ++ } ++ ++ xprt->multipath_context = (void *)ctx; ++ return 0; ++} ++ ++// free multi_context and iostat memory ++void enfs_free_xprt_ctx(struct rpc_xprt *xprt) ++{ ++ struct enfs_xprt_context *ctx = xprt->multipath_context; ++ ++ if (ctx) { ++ if (ctx->stats) { ++ rpc_free_iostats(ctx->stats); ++ ctx->stats = NULL; ++ } ++ kfree(xprt->multipath_context); ++ xprt->multipath_context = NULL; ++ } ++} +diff --git a/fs/nfs/enfs/enfs_path.h b/fs/nfs/enfs/enfs_path.h +new file mode 100644 +index 
000000000000..97b1ef3730b8 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_path.h +@@ -0,0 +1,12 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++ ++#ifndef ENFS_PATH_H ++#define ENFS_PATH_H ++ ++int enfs_alloc_xprt_ctx(struct rpc_xprt *xprt); ++void enfs_free_xprt_ctx(struct rpc_xprt *xprt); ++ ++#endif // ENFS_PATH_H +diff --git a/fs/nfs/enfs/enfs_proc.c b/fs/nfs/enfs/enfs_proc.c +new file mode 100644 +index 000000000000..53fa1a07642f +--- /dev/null ++++ b/fs/nfs/enfs/enfs_proc.c +@@ -0,0 +1,545 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++#include <linux/module.h> ++#include <linux/proc_fs.h> ++#include <linux/seq_file.h> ++#include <linux/spinlock.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/metrics.h> ++#include <linux/sunrpc/xprtsock.h> ++#include <net/netns/generic.h> ++ ++#include "../../../net/sunrpc/netns.h" ++ ++#include "enfs.h" ++#include "enfs_log.h" ++#include "enfs_proc.h" ++#include "enfs_multipath.h" ++#include "pm_state.h" ++ ++#define ENFS_PROC_DIR "enfs" ++#define ENFS_PROC_PATH_STATUS_LEN 256 ++ ++static struct proc_dir_entry *enfs_proc_parent; ++ ++void ++enfs_iterate_each_rpc_clnt(int (*fn)(struct rpc_clnt *clnt, void *data), ++ void *data) ++{ ++ struct net *net; ++ struct sunrpc_net *sn; ++ struct rpc_clnt *clnt; ++ ++ rcu_read_lock(); ++ for_each_net_rcu(net) { ++ sn = net_generic(net, sunrpc_net_id); ++ if (sn == NULL) ++ continue; ++ spin_lock(&sn->rpc_client_lock); ++ list_for_each_entry(clnt, &sn->all_clients, cl_clients) { ++ fn(clnt, data); ++ } ++ spin_unlock(&sn->rpc_client_lock); ++ } ++ rcu_read_unlock(); ++} ++ ++struct proc_dir_entry *enfs_get_proc_parent(void) ++{ ++ return enfs_proc_parent; ++} ++ ++static int sockaddr_ip_to_str(struct sockaddr *addr, char *buf, int len) ++{ ++ switch (addr->sa_family) { ++ case AF_INET: { ++ struct 
sockaddr_in *sin = (struct sockaddr_in *)addr; ++ ++ snprintf(buf, len, "%pI4", &sin->sin_addr); ++ return 0; ++ } ++ case AF_INET6: { ++ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; ++ ++ snprintf(buf, len, "%pI6", &sin6->sin6_addr); ++ return 0; ++ } ++ default: ++ break; ++ } ++ return 1; ++} ++ ++static bool should_print(const char *name) ++{ ++ int i; ++ static const char * const proc_names[] = { ++ "READ", ++ "WRITE", ++ }; ++ ++ if (name == NULL) ++ return false; ++ ++ for (i = 0; i < ARRAY_SIZE(proc_names); i++) { ++ if (strcmp(name, proc_names[i]) == 0) ++ return true; ++ } ++ return false; ++} ++ ++struct enfs_xprt_iter { ++ unsigned int id; ++ struct seq_file *seq; ++ unsigned int max_addrs_length; ++}; ++ ++static int debug_show_xprt(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ ++ if (xprt->multipath_context) ++ ctx = xprt->multipath_context; ++ ++ pr_info(" xprt:%p ctx:%p main:%d queue_len:%lu.\n", xprt, ++ xprt->multipath_context, ++ ctx ? 
ctx->main : false, ++ atomic_long_read(&xprt->queuelen)); ++ return 0; ++} ++ ++static int debug_show_clnt(struct rpc_clnt *clnt, void *data) ++{ ++ pr_info(" clnt %d addr:%p enfs:%d\n", ++ clnt->cl_clid, clnt, ++ clnt->cl_enfs); ++ rpc_clnt_iterate_for_each_xprt(clnt, debug_show_xprt, NULL); ++ return 0; ++} ++ ++static void debug_print_all_xprt(void) ++{ ++ enfs_iterate_each_rpc_clnt(debug_show_clnt, NULL); ++} ++ ++static ++void enfs_proc_format_xprt_addr_display(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ char *local_name_buf, ++ int local_name_buf_len, ++ char *remote_name_buf, ++ int remote_name_buf_len) ++{ ++ int err; ++ struct sockaddr_storage srcaddr; ++ struct enfs_xprt_context *ctx; ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ sockaddr_ip_to_str((struct sockaddr *)&xprt->addr, ++ remote_name_buf, remote_name_buf_len); ++ ++ // get local address depend one main or not ++ if (enfs_is_main_xprt(xprt)) { ++ err = rpc_localaddr(clnt, (struct sockaddr *)&srcaddr, ++ sizeof(srcaddr)); ++ if (err != 0) ++ (void)snprintf(local_name_buf, ++ local_name_buf_len, "Unknown"); ++ else ++ sockaddr_ip_to_str((struct sockaddr *)&srcaddr, ++ local_name_buf, ++ local_name_buf_len); ++ } else { ++ sockaddr_ip_to_str((struct sockaddr *)&ctx->srcaddr, ++ local_name_buf, ++ local_name_buf_len); ++ } ++} ++ ++static int enfs_show_xprt_stats(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ unsigned int op; ++ unsigned int maxproc = clnt->cl_maxproc; ++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; ++ struct enfs_xprt_context *ctx; ++ char local_name[INET6_ADDRSTRLEN]; ++ char remote_name[INET6_ADDRSTRLEN]; ++ ++ if (!xprt->multipath_context) ++ return 0; ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ enfs_proc_format_xprt_addr_display(clnt, xprt, local_name, ++ sizeof(local_name), ++ remote_name, sizeof(remote_name)); ++ ++ seq_printf(iter->seq, "%-6u%-*s%-*s", iter->id, ++ 
iter->max_addrs_length + 4, ++ local_name, ++ iter->max_addrs_length + 4, ++ remote_name); ++ ++ iter->id++; ++ ++ for (op = 0; op < maxproc; op++) { ++ if (!should_print(clnt->cl_procinfo[op].p_name)) ++ continue; ++ ++ seq_printf(iter->seq, "%-22lu%-22Lu%-22Lu", ++ ctx->stats[op].om_ops, ++ ctx->stats[op].om_ops == 0 ? 0 : ++ ktime_to_ms(ctx->stats[op].om_rtt) / ++ ctx->stats[op].om_ops, ++ ctx->stats[op].om_ops == 0 ? 0 : ++ ktime_to_ms(ctx->stats[op].om_execute) / ++ ctx->stats[op].om_ops); ++ } ++ seq_puts(iter->seq, "\n"); ++ return 0; ++} ++ ++static int rpc_proc_show_path_status(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; ++ struct enfs_xprt_context *ctx = NULL; ++ char local_name[INET6_ADDRSTRLEN] = {0}; ++ char remote_name[INET6_ADDRSTRLEN] = {0}; ++ char multiapth_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; ++ char xprt_status[ENFS_PROC_PATH_STATUS_LEN] = {0}; ++ ++ if (!xprt->multipath_context) { ++ enfs_log_debug("multipath_context is null.\n"); ++ return 0; ++ } ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ enfs_proc_format_xprt_addr_display(clnt, xprt, ++ local_name, ++ sizeof(local_name), ++ remote_name, sizeof(remote_name)); ++ ++ pm_get_path_state_desc(xprt, ++ multiapth_status, ++ ENFS_PROC_PATH_STATUS_LEN); ++ ++ pm_get_xprt_state_desc(xprt, ++ xprt_status, ++ ENFS_PROC_PATH_STATUS_LEN); ++ ++ seq_printf(iter->seq, "%-6u%-*s%-*s%-12s%-12s\n", ++ iter->id, iter->max_addrs_length + 4, ++ local_name, iter->max_addrs_length + 4, ++ remote_name, multiapth_status, ++ xprt_status); ++ iter->id++; ++ return 0; ++} ++ ++static int enfs_get_max_addrs_length(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ struct enfs_xprt_iter *iter = (struct enfs_xprt_iter *)data; ++ char local_name[INET6_ADDRSTRLEN]; ++ char remote_name[INET6_ADDRSTRLEN]; ++ ++ enfs_proc_format_xprt_addr_display(clnt, xprt, ++ local_name, sizeof(local_name), 
++ remote_name, sizeof(remote_name)); ++ ++ if (iter->max_addrs_length < strlen(local_name)) ++ iter->max_addrs_length = strlen(local_name); ++ ++ if (iter->max_addrs_length < strlen(remote_name)) ++ iter->max_addrs_length = strlen(remote_name); ++ ++ return 0; ++} ++ ++static int rpc_proc_clnt_showpath(struct seq_file *seq, void *v) ++{ ++ struct rpc_clnt *clnt = seq->private; ++ struct enfs_xprt_iter iter; ++ ++ iter.seq = seq; ++ iter.id = 0; ++ iter.max_addrs_length = 0; ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ enfs_get_max_addrs_length, ++ (void *)&iter); ++ ++ seq_printf(seq, "%-6s%-*s%-*s%-12s%-12s\n", "id", ++ iter.max_addrs_length + 4, ++ "local_addr", ++ iter.max_addrs_length + 4, ++ "remote_addr", ++ "path_state", ++ "xprt_state"); ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ rpc_proc_show_path_status, ++ (void *)&iter); ++ return 0; ++} ++ ++static int enfs_rpc_proc_show(struct seq_file *seq, void *v) ++{ ++ struct rpc_clnt *clnt = seq->private; ++ struct enfs_xprt_iter iter; ++ ++ iter.seq = seq; ++ iter.id = 0; ++ iter.max_addrs_length = 0; ++ ++ debug_print_all_xprt(); ++ pr_info("enfs proc clnt:%p\n", clnt); ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ enfs_get_max_addrs_length, ++ (void *)&iter); ++ ++ seq_printf(seq, "%-6s%-*s%-*s%-22s%-22s%-22s%-22s%-22s%-22s\n", "id", ++ iter.max_addrs_length + 4, "local_addr", ++ iter.max_addrs_length + 4, ++ "remote_addr", "r_count", ++ "r_rtt", "r_exec", "w_count", "w_rtt", "w_exec"); ++ ++ // rpc_clnt_show_stats(seq, clnt); ++ rpc_clnt_iterate_for_each_xprt(clnt, ++ enfs_show_xprt_stats, ++ (void *)&iter); ++ return 0; ++} ++ ++static int rpc_proc_open(struct inode *inode, struct file *file) ++{ ++ struct rpc_clnt *clnt = PDE_DATA(inode); ++ ++ pr_info("%s %p\n", __func__, clnt); ++ return single_open(file, enfs_rpc_proc_show, clnt); ++} ++ ++static int enfs_reset_xprt_stats(struct rpc_clnt *clnt, ++ struct rpc_xprt *xprt, ++ void *data) ++{ ++ unsigned int op; ++ struct enfs_xprt_context *ctx; ++ 
unsigned int maxproc = clnt->cl_maxproc; ++ struct rpc_iostats stats = {0}; ++ ++ if (!xprt->multipath_context) ++ return 0; ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ ++ for (op = 0; op < maxproc; op++) { ++ spin_lock(&ctx->stats[op].om_lock); ++ ctx->stats[op] = stats; ++ spin_unlock(&ctx->stats[op].om_lock); ++ } ++ return 0; ++} ++ ++static void trim_newline_ch(char *str, int len) ++{ ++ int i; ++ ++ for (i = 0; i < len && str[i] != '\0'; i++) { ++ if (str[i] == '\n') ++ str[i] = '\0'; ++ } ++} ++ ++static ssize_t enfs_proc_write(struct file *file, ++ const char __user *user_buf, ++ size_t len, ++ loff_t *offset) ++{ ++ char buffer[128]; ++ struct rpc_clnt *clnt = ++ ((struct seq_file *)file->private_data)->private; ++ ++ if (len >= sizeof(buffer)) ++ return -E2BIG; ++ ++ if (copy_from_user(buffer, user_buf, len) != 0) ++ return -EFAULT; ++ ++ buffer[len] = '\0'; ++ trim_newline_ch(buffer, len); ++ if (strcmp(buffer, "reset") != 0) ++ return -EINVAL; ++ ++ rpc_clnt_iterate_for_each_xprt(clnt, enfs_reset_xprt_stats, NULL); ++ return len; ++} ++ ++static int rpc_proc_show_path(struct inode *inode, struct file *file) ++{ ++ struct rpc_clnt *clnt = PDE_DATA(inode); ++ ++ return single_open(file, rpc_proc_clnt_showpath, clnt); ++} ++ ++static const struct file_operations rpc_proc_fops = { ++ .owner = THIS_MODULE, ++ .open = rpc_proc_open, ++ .read = seq_read, ++ .llseek = seq_lseek, ++ .release = single_release, ++ .write = enfs_proc_write, ++}; ++ ++static const struct file_operations rpc_show_path_fops = { ++ .owner = THIS_MODULE, ++ .open = rpc_proc_show_path, ++ .read = seq_read, ++ .llseek = seq_lseek, ++ .release = single_release, ++}; ++ ++static int clnt_proc_name(struct rpc_clnt *clnt, char *buf, int len) ++{ ++ int ret; ++ ++ ret = snprintf(buf, len, "%s_%u", ++ rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR), ++ clnt->cl_clid); ++ if (ret >= len) ++ return -E2BIG; ++ return 0; ++} ++ ++static int enfs_proc_create_file(struct rpc_clnt *clnt)
++{ ++ int err; ++ char buf[128]; ++ ++ struct proc_dir_entry *clnt_entry; ++ struct proc_dir_entry *stat_entry; ++ ++ err = clnt_proc_name(clnt, buf, sizeof(buf)); ++ if (err) ++ return err; ++ ++ clnt_entry = proc_mkdir(buf, enfs_proc_parent); ++ if (clnt_entry == NULL) ++ return -EINVAL; ++ ++ stat_entry = proc_create_data("stat", ++ 0, clnt_entry, ++ &rpc_proc_fops, clnt); ++ ++ if (stat_entry == NULL) ++ return -EINVAL; ++ ++ stat_entry = proc_create_data("path", ++ 0, clnt_entry, ++ &rpc_show_path_fops, clnt); ++ ++ if (stat_entry == NULL) ++ return -EINVAL; ++ ++ return 0; ++} ++ ++void enfs_count_iostat(struct rpc_task *task) ++{ ++ struct enfs_xprt_context *ctx = task->tk_xprt->multipath_context; ++ ++ if (!ctx || !ctx->stats) ++ return; ++ rpc_count_iostats(task, ctx->stats); ++} ++ ++static void enfs_proc_delete_file(struct rpc_clnt *clnt) ++{ ++ int err; ++ char buf[128]; ++ ++ err = clnt_proc_name(clnt, buf, sizeof(buf)); ++ if (err) { ++ pr_err("gen clnt name failed.\n"); ++ return; ++ } ++ remove_proc_subtree(buf, enfs_proc_parent); ++} ++ ++// create proc file "/proc/enfs/[mount_ip]_[id]/stat" ++int enfs_proc_create_clnt(struct rpc_clnt *clnt) ++{ ++ int err; ++ ++ err = enfs_proc_create_file(clnt); ++ if (err) { ++ pr_err("create client %d\n", err); ++ return err; ++ } ++ ++ return 0; ++} ++ ++void enfs_proc_delete_clnt(struct rpc_clnt *clnt) ++{ ++ if (clnt->cl_enfs) ++ enfs_proc_delete_file(clnt); ++} ++ ++static int enfs_proc_create_parent(void) ++{ ++ enfs_proc_parent = proc_mkdir(ENFS_PROC_DIR, NULL); ++ ++ if (enfs_proc_parent == NULL) { ++ pr_err("Enfs create proc dir err\n"); ++ return -ENOMEM; ++ } ++ return 0; ++} ++ ++static void enfs_proc_delete_parent(void) ++{ ++ remove_proc_entry(ENFS_PROC_DIR, NULL); ++} ++ ++static int enfs_proc_init_create_clnt(struct rpc_clnt *clnt, void *data) ++{ ++ if (clnt->cl_enfs) ++ enfs_proc_create_file(clnt); ++ return 0; ++} ++ ++static int enfs_proc_destroy_clnt(struct rpc_clnt *clnt, void *data) ++{
++ if (clnt->cl_enfs) ++ enfs_proc_delete_file(clnt); ++ return 0; ++} ++ ++int enfs_proc_init(void) ++{ ++ int err; ++ ++ err = enfs_proc_create_parent(); ++ if (err) ++ return err; ++ ++ enfs_iterate_each_rpc_clnt(enfs_proc_init_create_clnt, NULL); ++ return 0; ++} ++ ++void enfs_proc_exit(void) ++{ ++ enfs_iterate_each_rpc_clnt(enfs_proc_destroy_clnt, NULL); ++ enfs_proc_delete_parent(); ++} +diff --git a/fs/nfs/enfs/enfs_proc.h b/fs/nfs/enfs/enfs_proc.h +new file mode 100644 +index 000000000000..321951031c2e +--- /dev/null ++++ b/fs/nfs/enfs/enfs_proc.h +@@ -0,0 +1,21 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Client-side ENFS PROC. ++ * ++ * Copyright (c) 2023. Huawei Technologies Co., Ltd. All rights reserved. ++ */ ++#ifndef ENFS_PROC_H ++#define ENFS_PROC_H ++ ++struct rpc_clnt; ++struct rpc_task; ++struct proc_dir_entry; ++ ++int enfs_proc_init(void); ++void enfs_proc_exit(void); ++struct proc_dir_entry *enfs_get_proc_parent(void); ++int enfs_proc_create_clnt(struct rpc_clnt *clnt); ++void enfs_proc_delete_clnt(struct rpc_clnt *clnt); ++void enfs_count_iostat(struct rpc_task *task); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_remount.c b/fs/nfs/enfs/enfs_remount.c +new file mode 100644 +index 000000000000..2c3fe125c735 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_remount.c +@@ -0,0 +1,221 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ * Description: remount ip source file ++ * Author: y00583252 ++ * Create: 2023-08-12 ++ */ ++#include "enfs_remount.h" ++ ++#include <linux/string.h> ++#include <linux/in.h> ++#include <linux/in6.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/spinlock.h> ++#include <linux/sunrpc/addr.h> ++#include <linux/sunrpc/metrics.h> ++#include <linux/sunrpc/xprtmultipath.h> ++#include <linux/sunrpc/xprtsock.h> ++#include <linux/sunrpc/xprt.h> ++#include <linux/smp.h> ++#include <linux/delay.h> ++ ++#include "enfs.h" ++#include "enfs_log.h" ++#include "enfs_multipath.h" ++#include "enfs_multipath_parse.h" ++#include "enfs_path.h" ++#include "enfs_proc.h" ++#include "enfs_multipath_client.h" ++ ++static bool enfs_rpc_xprt_switch_need_delete_addr( ++ struct multipath_mount_options *enfs_option, ++ struct sockaddr *dstaddr, struct sockaddr *srcaddr) ++{ ++ int i; ++ bool find_same_ip = false; ++ int32_t local_total; ++ int32_t remote_total; ++ ++ local_total = enfs_option->local_ip_list->count; ++ remote_total = enfs_option->remote_ip_list->count; ++ if (local_total == 0 || remote_total == 0) { ++ pr_err("no ip list is present.\n"); ++ return false; ++ } ++ ++ for (i = 0; i < local_total; i++) { ++ find_same_ip = ++ rpc_cmp_addr((struct sockaddr *) ++ &enfs_option->local_ip_list->address[i], ++ srcaddr); ++ if (find_same_ip) ++ break; ++ } ++ ++ if (find_same_ip == false) ++ return true; ++ ++ find_same_ip = false; ++ for (i = 0; i < remote_total; i++) { ++ find_same_ip = ++ rpc_cmp_addr((struct sockaddr *) ++ &enfs_option->remote_ip_list->address[i], ++ dstaddr); ++ if (find_same_ip) ++ break; ++ } ++ ++ if (find_same_ip == false) ++ return true; ++ ++ return false; ++} ++ ++// Used in rcu_lock ++static bool enfs_delete_xprt_from_switch(struct rpc_xprt *xprt, ++ void *enfs_option, ++ struct rpc_xprt_switch *xps) ++{ ++ struct enfs_xprt_context *ctx = NULL; ++ struct multipath_mount_options *mopt = ++ (struct multipath_mount_options *)enfs_option; ++ ++ if 
(enfs_is_main_xprt(xprt)) ++ return true; ++ ++ ctx = (struct enfs_xprt_context *)xprt->multipath_context; ++ if (enfs_rpc_xprt_switch_need_delete_addr(mopt, ++ (struct sockaddr *)&xprt->addr, ++ (struct sockaddr *)&ctx->srcaddr)) { ++ ++ print_enfs_multipath_addr((struct sockaddr *)&ctx->srcaddr, ++ (struct sockaddr *)&xprt->addr); ++ rpc_xprt_switch_remove_xprt(xps, xprt); ++ return true; ++ } ++ ++ return false; ++} ++ ++void enfs_clnt_delete_obsolete_xprts(struct nfs_client *nfs_client, ++ void *enfs_option) ++{ ++ int xprt_count = 0; ++ struct rpc_xprt *pos = NULL; ++ struct rpc_xprt_switch *xps = NULL; ++ ++ rcu_read_lock(); ++ xps = xprt_switch_get( ++ rcu_dereference( ++ nfs_client->cl_rpcclient->cl_xpi.xpi_xpswitch)); ++ if (xps == NULL) { ++ rcu_read_unlock(); ++ xprt_switch_put(xps); ++ return; ++ } ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (xprt_count < MAX_XPRT_NUM_PER_CLIENT) { ++ if (enfs_delete_xprt_from_switch( ++ pos, enfs_option, xps) == false) ++ xprt_count++; ++ } else ++ rpc_xprt_switch_remove_xprt(xps, pos); ++ } ++ rcu_read_unlock(); ++ xprt_switch_put(xps); ++} ++ ++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option) ++{ ++ int errno = 0; ++ char servername[48]; ++ struct multipath_mount_options *remount_lists = ++ (struct multipath_mount_options *)enfs_option; ++ struct multipath_client_info *client_info = ++ (struct multipath_client_info *)nfs_client->cl_multipath_data; ++ struct xprt_create xprtargs; ++ struct rpc_create_args args = { ++ .protocol = nfs_client->cl_proto, ++ .net = nfs_client->cl_net, ++ .addrsize = nfs_client->cl_addrlen, ++ .servername = nfs_client->cl_hostname, ++ }; ++ ++ memset(&xprtargs, 0, sizeof(struct xprt_create)); ++ ++ //mount is not use multipath ++ if (client_info == NULL || enfs_option == NULL) { ++ enfs_log_error( ++ "mount information or remount information is empty.\n"); ++ return -EINVAL; ++ } ++ ++ //remount : localaddrs and remoteaddrs are empty ++ 
if (remount_lists->local_ip_list->count == 0 && ++ remount_lists->remote_ip_list->count == 0) { ++ enfs_log_info("remount local_ip_list and remote_ip_list are NULL\n"); ++ return 0; ++ } ++ ++ errno = enfs_config_xprt_create_args(&xprtargs, ++ &args, servername, sizeof(servername)); ++ ++ if (errno) { ++ enfs_log_error("config_xprt_create failed! errno:%d\n", errno); ++ return errno; ++ } ++ ++ if (remount_lists->local_ip_list->count == 0) { ++ if (client_info->local_ip_list->count == 0) { ++ errno = rpc_localaddr(nfs_client->cl_rpcclient, ++ (struct sockaddr *) ++ &remount_lists->local_ip_list->address[0], ++ sizeof(struct sockaddr_storage)); ++ if (errno) { ++ enfs_log_error("get clnt srcaddr errno:%d\n", ++ errno); ++ return errno; ++ } ++ remount_lists->local_ip_list->count = 1; ++ } else ++ memcpy(remount_lists->local_ip_list, ++ client_info->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ if (remount_lists->remote_ip_list->count == 0) { ++ if (client_info->remote_ip_list->count == 0) { ++ errno = rpc_peeraddr(nfs_client->cl_rpcclient, ++ (struct sockaddr *) ++ &remount_lists->remote_ip_list->address[0], ++ sizeof(struct sockaddr_storage)); ++ if (errno == 0) { ++ enfs_log_error("get clnt dstaddr errno:%d\n", ++ errno); ++ return errno; ++ } ++ remount_lists->remote_ip_list->count = 1; ++ } else ++ memcpy(remount_lists->remote_ip_list, ++ client_info->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ } ++ ++ enfs_log_info("Remount creating new links...\n"); ++ enfs_xprt_ippair_create(&xprtargs, ++ nfs_client->cl_rpcclient, ++ remount_lists); ++ ++ enfs_log_info("Remount deleting obsolete links...\n"); ++ enfs_clnt_delete_obsolete_xprts(nfs_client, remount_lists); ++ ++ memcpy(client_info->local_ip_list, ++ remount_lists->local_ip_list, ++ sizeof(struct nfs_ip_list)); ++ memcpy(client_info->remote_ip_list, ++ remount_lists->remote_ip_list, ++ sizeof(struct nfs_ip_list)); ++ ++ return 0; ++} +diff --git a/fs/nfs/enfs/enfs_remount.h 
b/fs/nfs/enfs/enfs_remount.h +new file mode 100644 +index 000000000000..a663ed257004 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_remount.h +@@ -0,0 +1,15 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ * Description: remount ip header file ++ * Author: y00583252 ++ * Create: 2023-08-12 ++ */ ++#ifndef _ENFS_REMOUNT_ ++#define _ENFS_REMOUNT_ ++#include <linux/string.h> ++#include "enfs.h" ++ ++int enfs_remount_iplist(struct nfs_client *nfs_client, void *enfs_option); ++ ++#endif +diff --git a/fs/nfs/enfs/enfs_roundrobin.c b/fs/nfs/enfs/enfs_roundrobin.c +new file mode 100644 +index 000000000000..4e4eda784a3e +--- /dev/null ++++ b/fs/nfs/enfs/enfs_roundrobin.c +@@ -0,0 +1,255 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++#include <linux/spinlock.h> ++#include <linux/module.h> ++#include <linux/printk.h> ++#include <linux/kref.h> ++#include <linux/rculist.h> ++#include <linux/types.h> ++#include <linux/sunrpc/xprt.h> ++#include <linux/sunrpc/clnt.h> ++#include <linux/sunrpc/xprtmultipath.h> ++#include "enfs_roundrobin.h" ++ ++#include "enfs.h" ++#include "enfs_config.h" ++#include "pm_state.h" ++ ++typedef struct rpc_xprt *(*enfs_xprt_switch_find_xprt_t)( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur); ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin; ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular; ++ ++static bool enfs_xprt_is_active(struct rpc_xprt *xprt) ++{ ++ enum pm_path_state state; ++ ++ if (kref_read(&xprt->kref) <= 0) ++ return false; ++ ++ state = pm_get_path_state(xprt); ++ if (state == PM_STATE_NORMAL) ++ return true; ++ ++ return false; ++} ++ ++static struct rpc_xprt *enfs_lb_set_cursor_xprt( ++ struct rpc_xprt_switch *xps, struct rpc_xprt **cursor, ++ enfs_xprt_switch_find_xprt_t find_next) ++{ ++ struct rpc_xprt *pos; ++ struct rpc_xprt 
*old; ++ ++ old = smp_load_acquire(cursor); /* read latest cursor */ ++ pos = find_next(xps, old); ++ smp_store_release(cursor, pos); /* let cursor point to pos */ ++ return pos; ++} ++ ++static ++struct rpc_xprt *enfs_lb_find_next_entry_roundrobin( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *pos; ++ struct rpc_xprt *prev = NULL; ++ bool found = false; ++ struct rpc_xprt *min_queuelen_xprt = NULL; ++ unsigned long pos_xprt_queuelen; ++ unsigned long min_xprt_queuelen = 0; ++ ++ unsigned long xps_queuelen = atomic_long_read(&xps->xps_queuelen); ++ // delete origin xprt ++ unsigned int multipath_nactive = READ_ONCE(xps->xps_nactive) - 1; ++ ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (enfs_is_main_xprt(pos) || !enfs_xprt_is_active(pos)) { ++ prev = pos; ++ continue; ++ } ++ ++ pos_xprt_queuelen = atomic_long_read(&pos->queuelen); ++ if (min_queuelen_xprt == NULL || ++ pos_xprt_queuelen < min_xprt_queuelen) { ++ ++ min_queuelen_xprt = pos; ++ min_xprt_queuelen = pos_xprt_queuelen; ++ } ++ ++ if (cur == prev) ++ found = true; ++ ++ if (found && pos_xprt_queuelen * ++ multipath_nactive <= xps_queuelen) ++ return pos; ++ prev = pos; ++ }; ++ ++ return min_queuelen_xprt; ++} ++ ++struct rpc_xprt *enfs_lb_switch_find_first_active_xprt( ++ struct rpc_xprt_switch *xps) ++{ ++ struct rpc_xprt *pos; ++ ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (enfs_xprt_is_active(pos)) ++ return pos; ++ }; ++ return NULL; ++} ++ ++struct rpc_xprt *enfs_lb_switch_get_main_xprt(struct rpc_xprt_switch *xps) ++{ ++ return list_first_or_null_rcu(&xps->xps_xprt_list, ++ struct rpc_xprt, xprt_switch); ++} ++ ++static struct rpc_xprt *enfs_lb_switch_get_next_xprt_roundrobin( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *xprt; ++ ++ // disable multipath ++ if (enfs_get_config_multipath_state()) ++ return enfs_lb_switch_get_main_xprt(xps); ++ ++ xprt = 
enfs_lb_find_next_entry_roundrobin(xps, cur); ++ if (xprt != NULL) ++ return xprt; ++ ++ return enfs_lb_switch_get_main_xprt(xps); ++} ++ ++static ++struct rpc_xprt *enfs_lb_iter_next_entry_roundrobin(struct rpc_xprt_iter *xpi) ++{ ++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); ++ ++ if (xps == NULL) ++ return NULL; ++ ++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, ++ enfs_lb_switch_get_next_xprt_roundrobin); ++} ++ ++static ++struct rpc_xprt *enfs_lb_switch_find_singular_entry( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *pos; ++ bool found = false; ++ ++ list_for_each_entry_rcu(pos, &xps->xps_xprt_list, xprt_switch) { ++ if (cur == pos) ++ found = true; ++ ++ if (found && enfs_xprt_is_active(pos)) ++ return pos; ++ } ++ return NULL; ++} ++ ++struct rpc_xprt *enfs_lb_get_singular_xprt( ++ struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *xprt; ++ ++ if (xps == NULL) ++ return NULL; ++ ++ // disable multipath ++ if (enfs_get_config_multipath_state()) ++ return enfs_lb_switch_get_main_xprt(xps); ++ ++ if (cur == NULL || xps->xps_nxprts < 2) ++ return enfs_lb_switch_find_first_active_xprt(xps); ++ ++ xprt = enfs_lb_switch_find_singular_entry(xps, cur); ++ if (!xprt) ++ return enfs_lb_switch_get_main_xprt(xps); ++ ++ return xprt; ++} ++ ++static ++struct rpc_xprt *enfs_lb_iter_next_entry_sigular(struct rpc_xprt_iter *xpi) ++{ ++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); ++ ++ if (xps == NULL) ++ return NULL; ++ ++ return enfs_lb_set_cursor_xprt(xps, &xpi->xpi_cursor, ++ enfs_lb_get_singular_xprt); ++} ++ ++static void enfs_lb_iter_default_rewind(struct rpc_xprt_iter *xpi) ++{ ++ WRITE_ONCE(xpi->xpi_cursor, NULL); ++} ++ ++static void enfs_lb_switch_set_roundrobin(struct rpc_clnt *clnt) ++{ ++ struct rpc_xprt_switch *xps; ++ ++ rcu_read_lock(); ++ xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch); ++ rcu_read_unlock(); ++ if (clnt->cl_vers == 3) 
{ ++ ++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_roundrobin) ++ WRITE_ONCE(xps->xps_iter_ops, ++ &enfs_xprt_iter_roundrobin); ++ ++ return; ++ } ++ if (READ_ONCE(xps->xps_iter_ops) != &enfs_xprt_iter_singular) ++ WRITE_ONCE(xps->xps_iter_ops, &enfs_xprt_iter_singular); ++} ++ ++static ++struct rpc_xprt *enfs_lb_switch_find_current(struct list_head *head, ++ const struct rpc_xprt *cur) ++{ ++ struct rpc_xprt *pos; ++ ++ list_for_each_entry_rcu(pos, head, xprt_switch) { ++ if (cur == pos) ++ return pos; ++ } ++ return NULL; ++} ++ ++static struct rpc_xprt *enfs_lb_iter_current_entry(struct rpc_xprt_iter *xpi) ++{ ++ struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); ++ struct list_head *head; ++ ++ if (xps == NULL) ++ return NULL; ++ head = &xps->xps_xprt_list; ++ if (xpi->xpi_cursor == NULL || xps->xps_nxprts < 2) ++ return enfs_lb_switch_get_main_xprt(xps); ++ return enfs_lb_switch_find_current(head, xpi->xpi_cursor); ++} ++ ++void enfs_lb_set_policy(struct rpc_clnt *clnt) ++{ ++ enfs_lb_switch_set_roundrobin(clnt); ++} ++ ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_roundrobin = { ++ .xpi_rewind = enfs_lb_iter_default_rewind, ++ .xpi_xprt = enfs_lb_iter_current_entry, ++ .xpi_next = enfs_lb_iter_next_entry_roundrobin, ++}; ++ ++static const struct rpc_xprt_iter_ops enfs_xprt_iter_singular = { ++ .xpi_rewind = enfs_lb_iter_default_rewind, ++ .xpi_xprt = enfs_lb_iter_current_entry, ++ .xpi_next = enfs_lb_iter_next_entry_sigular, ++}; +diff --git a/fs/nfs/enfs/enfs_roundrobin.h b/fs/nfs/enfs/enfs_roundrobin.h +new file mode 100644 +index 000000000000..b72b088a6258 +--- /dev/null ++++ b/fs/nfs/enfs/enfs_roundrobin.h +@@ -0,0 +1,9 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. 
++ */ ++#ifndef ENFS_ROUNDROBIN_H ++#define ENFS_ROUNDROBIN_H ++ ++void enfs_lb_set_policy(struct rpc_clnt *clnt); ++#endif diff --git a/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch new file mode 100644 index 0000000..cc6b677 --- /dev/null +++ b/0005-add_enfs_module_for_sunrpc_failover_and_configure.patch @@ -0,0 +1,1607 @@ +diff --git a/fs/nfs/enfs/enfs_config.c b/fs/nfs/enfs/enfs_config.c +new file mode 100644 +index 000000000000..11aa7a00385b +--- /dev/null ++++ b/fs/nfs/enfs/enfs_config.c +@@ -0,0 +1,378 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved. ++ */ ++#include <linux/cdev.h> ++#include <linux/errno.h> ++#include <linux/fcntl.h> ++#include <linux/fs.h> ++#include <linux/kernel.h> ++#include <linux/kthread.h> ++#include <linux/slab.h> ++#include <linux/string.h> ++#include <linux/uaccess.h> ++#include <linux/delay.h> ++ ++#include "enfs_errcode.h" ++#include "enfs_log.h" ++#include "enfs_config.h" ++ ++#define MAX_FILE_SIZE 8192 ++#define STRING_BUF_SIZE 128 ++#define CONFIG_FILE_PATH "/etc/enfs/config.ini" ++#define ENFS_NOTIFY_FILE_PERIOD 1000UL ++ ++#define MAX_PATH_DETECT_INTERVAL 300 ++#define MIN_PATH_DETECT_INTERVAL 5 ++#define MAX_PATH_DETECT_TIMEOUT 60 ++#define MIN_PATH_DETECT_TIMEOUT 1 ++#define MAX_MULTIPATH_TIMEOUT 60 ++#define MIN_MULTIPATH_TIMEOUT 0 ++#define MAX_MULTIPATH_STATE ENFS_MULTIPATH_DISABLE ++#define MIN_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE ++ ++#define DEFAULT_PATH_DETECT_INTERVAL 10 ++#define DEFAULT_PATH_DETECT_TIMEOUT 5 ++#define DEFAULT_MULTIPATH_TIMEOUT 0 ++#define DEFAULT_MULTIPATH_STATE ENFS_MULTIPATH_ENABLE ++#define DEFAULT_LOADBALANCE_MODE ENFS_LOADBALANCE_RR ++ ++typedef int (*check_and_assign_func)(char *, char *, int, int); ++ ++struct enfs_config_info { ++ int32_t path_detect_interval; ++ int32_t path_detect_timeout; ++ int32_t multipath_timeout; ++ int32_t 
loadbalance_mode; ++ int32_t multipath_state; ++}; ++ ++struct check_and_assign_value { ++ char *field_name; ++ check_and_assign_func func; ++ int min_value; ++ int max_value; ++}; ++ ++static struct enfs_config_info g_enfs_config_info; ++static struct timespec64 modify_time; ++static struct task_struct *thread; ++ ++static int enfs_check_config_value(char *value, int min_value, int max_value) ++{ ++ long num_value; ++ int ret; ++ ++ ret = kstrtol(value, 10, &num_value); ++ if (ret != 0) { ++ enfs_log_error("Failed to convert string to int\n"); ++ return -EINVAL; ++ } ++ ++ if (num_value < min_value || num_value > max_value) ++ return -EINVAL; ++ ++ return num_value; ++} ++ ++static int32_t enfs_check_and_assign_int_value(char *field_name, char *value, ++ int min_value, int max_value) ++{ ++ int int_value = enfs_check_config_value(value, min_value, max_value); ++ ++ if (int_value < 0) ++ return -EINVAL; ++ ++ if (strcmp(field_name, "path_detect_interval") == 0) { ++ g_enfs_config_info.path_detect_interval = int_value; ++ return ENFS_RET_OK; ++ } ++ if (strcmp(field_name, "path_detect_timeout") == 0) { ++ g_enfs_config_info.path_detect_timeout = int_value; ++ return ENFS_RET_OK; ++ } ++ if (strcmp(field_name, "multipath_timeout") == 0) { ++ g_enfs_config_info.multipath_timeout = int_value; ++ return ENFS_RET_OK; ++ } ++ if (strcmp(field_name, "multipath_disable") == 0) { ++ g_enfs_config_info.multipath_state = int_value; ++ return ENFS_RET_OK; ++ } ++ return -EINVAL; ++} ++ ++static int32_t enfs_check_and_assign_loadbalance_mode(char *field_name, ++ char *value, ++ int min_value, ++ int max_value) ++{ ++ if (value == NULL) ++ return -EINVAL; ++ ++ if (strcmp(field_name, "multipath_select_policy") == 0) { ++ if (strcmp(value, "roundrobin") == 0) { ++ g_enfs_config_info.loadbalance_mode ++ = ENFS_LOADBALANCE_RR; ++ return ENFS_RET_OK; ++ } ++ } ++ return -EINVAL; ++} ++ ++static const struct check_and_assign_value g_check_and_assign_value[] = { ++
{"path_detect_interval", enfs_check_and_assign_int_value, ++ MIN_PATH_DETECT_INTERVAL, MAX_PATH_DETECT_INTERVAL}, ++ {"path_detect_timeout", enfs_check_and_assign_int_value, ++ MIN_PATH_DETECT_TIMEOUT, MAX_PATH_DETECT_TIMEOUT}, ++ {"multipath_timeout", enfs_check_and_assign_int_value, ++ MIN_MULTIPATH_TIMEOUT, MAX_MULTIPATH_TIMEOUT}, ++ {"multipath_disable", enfs_check_and_assign_int_value, ++ MIN_MULTIPATH_STATE, MAX_MULTIPATH_STATE}, ++ {"multipath_select_policy", enfs_check_and_assign_loadbalance_mode, ++ 0, 0}, ++}; ++ ++static int32_t enfs_read_config_file(char *buffer, char *file_path) ++{ ++ int ret; ++ struct file *filp = NULL; ++ loff_t f_pos = 0; ++ mm_segment_t fs; ++ ++ ++ filp = filp_open(file_path, O_RDONLY, 0); ++ ++ if (IS_ERR(filp)) { ++ enfs_log_error("Failed to open file %s\n", CONFIG_FILE_PATH); ++ ret = -ENOENT; ++ return ret; ++ } ++ ++ fs = get_fs(); ++ set_fs(get_ds()); ++ kernel_read(filp, buffer, MAX_FILE_SIZE, &f_pos); ++ set_fs(fs); ++ ++ ret = filp_close(filp, NULL); ++ if (ret) { ++ enfs_log_error("Close File:%s failed:%d.\n", ++ CONFIG_FILE_PATH, ret); ++ return -EINVAL; ++ } ++ return ENFS_RET_OK; ++} ++ ++static int32_t enfs_deal_with_comment_line(char *buffer) ++{ ++ int ret; ++ char *pos = strchr(buffer, '\n'); ++ ++ if (pos != NULL) ++ ret = strlen(buffer) - strlen(pos); ++ else ++ ret = strlen(buffer); ++ ++ return ret; ++} ++ ++static int32_t enfs_parse_key_value_from_config(char *buffer, char *key, ++ char *value, int keyLen, ++ int valueLen) ++{ ++ char *line; ++ char *tokenPtr; ++ int len; ++ char *tem; ++ char *pos = strchr(buffer, '\n'); ++ ++ if (pos != NULL) ++ len = strlen(buffer) - strlen(pos); ++ else ++ len = strlen(buffer); ++ ++ line = kmalloc(len + 1, GFP_KERNEL); ++ if (!line) { ++ enfs_log_error("Failed to allocate memory.\n"); ++ return -ENOMEM; ++ } ++ line[len] = '\0'; ++ strncpy(line, buffer, len); ++ ++ tem = line; ++ tokenPtr = strsep(&tem, "="); ++ if (tokenPtr == NULL || tem == NULL) { ++ kfree(line); ++ 
return len; ++ } ++ strncpy(key, strim(tokenPtr), keyLen); ++ strncpy(value, strim(tem), valueLen); ++ ++ kfree(line); ++ return len; ++} ++ ++static int32_t enfs_get_value_from_config_file(char *buffer, char *field_name, ++ char *value, int valueLen) ++{ ++ int ret; ++ char key[STRING_BUF_SIZE + 1] = {0}; ++ char val[STRING_BUF_SIZE + 1] = {0}; ++ ++ while (buffer[0] != '\0') { ++ if (buffer[0] == '\n') { ++ buffer++; ++ } else if (buffer[0] == '#') { ++ ret = enfs_deal_with_comment_line(buffer); ++ if (ret > 0) ++ buffer += ret; ++ } else { ++ ret = enfs_parse_key_value_from_config(buffer, key, val, ++ STRING_BUF_SIZE, ++ STRING_BUF_SIZE); ++ if (ret < 0) { ++ enfs_log_error("failed parse key value, %d\n" ++ , ret); ++ return ret; ++ } ++ key[STRING_BUF_SIZE] = '\0'; ++ val[STRING_BUF_SIZE] = '\0'; ++ ++ buffer += ret; ++ ++ if (strcmp(field_name, key) == 0) { ++ strncpy(value, val, valueLen); ++ return ENFS_RET_OK; ++ } ++ } ++ } ++ enfs_log_error("can not find value which matched field_name: %s.\n", ++ field_name); ++ return -EINVAL; ++} ++ ++int32_t enfs_config_load(void) ++{ ++ char value[STRING_BUF_SIZE + 1]; ++ int ret; ++ int table_len; ++ int min; ++ int max; ++ int i; ++ char *buffer; ++ ++ buffer = kmalloc(MAX_FILE_SIZE, GFP_KERNEL); ++ if (!buffer) { ++ enfs_log_error("Failed to allocate memory.\n"); ++ return -ENOMEM; ++ } ++ memset(buffer, 0, MAX_FILE_SIZE); ++ ++ g_enfs_config_info.path_detect_interval = DEFAULT_PATH_DETECT_INTERVAL; ++ g_enfs_config_info.path_detect_timeout = DEFAULT_PATH_DETECT_TIMEOUT; ++ g_enfs_config_info.multipath_timeout = DEFAULT_MULTIPATH_TIMEOUT; ++ g_enfs_config_info.multipath_state = DEFAULT_MULTIPATH_STATE; ++ g_enfs_config_info.loadbalance_mode = DEFAULT_LOADBALANCE_MODE; ++ ++ table_len = sizeof(g_check_and_assign_value) / ++ sizeof(g_check_and_assign_value[0]); ++ ++ ret = enfs_read_config_file(buffer, CONFIG_FILE_PATH); ++ if (ret != 0) { ++ kfree(buffer); ++ return ret; ++ } ++ ++ for (i = 0; i < table_len; i++) { 
++		ret = enfs_get_value_from_config_file(buffer,
++			g_check_and_assign_value[i].field_name,
++			value, STRING_BUF_SIZE);
++		if (ret < 0)
++			continue;
++
++		value[STRING_BUF_SIZE] = '\0';
++		min = g_check_and_assign_value[i].min_value;
++		max = g_check_and_assign_value[i].max_value;
++		if (g_check_and_assign_value[i].func != NULL)
++			(*g_check_and_assign_value[i].func)(
++				g_check_and_assign_value[i].field_name,
++				value, min, max);
++	}
++
++	kfree(buffer);
++	return ENFS_RET_OK;
++}
++
++int32_t enfs_get_config_path_detect_interval(void)
++{
++	return g_enfs_config_info.path_detect_interval;
++}
++
++int32_t enfs_get_config_path_detect_timeout(void)
++{
++	return g_enfs_config_info.path_detect_timeout;
++}
++
++int32_t enfs_get_config_multipath_timeout(void)
++{
++	return g_enfs_config_info.multipath_timeout;
++}
++
++int32_t enfs_get_config_multipath_state(void)
++{
++	return g_enfs_config_info.multipath_state;
++}
++
++int32_t enfs_get_config_loadbalance_mode(void)
++{
++	return g_enfs_config_info.loadbalance_mode;
++}
++
++static bool enfs_file_changed(const char *filename)
++{
++	int err;
++	struct kstat file_stat;
++
++	err = vfs_stat(filename, &file_stat);
++	if (err) {
++		pr_err("failed to open file:%s err:%d\n", filename, err);
++		return false;
++	}
++
++	if (timespec64_compare(&modify_time, &file_stat.mtime) < 0) {
++		pr_info("file change: %lld %lld\n", modify_time.tv_sec,
++			file_stat.mtime.tv_sec);
++		modify_time = file_stat.mtime;
++		return true;
++	}
++
++	return false;
++}
++
++static int enfs_thread_func(void *data)
++{
++	while (!kthread_should_stop()) {
++		if (enfs_file_changed(CONFIG_FILE_PATH))
++			enfs_config_load();
++
++		msleep(ENFS_NOTIFY_FILE_PERIOD);
++	}
++	return 0;
++}
++
++int enfs_config_timer_init(void)
++{
++	thread = kthread_run(enfs_thread_func, NULL, "enfs_notiy_file_thread");
++	if (IS_ERR(thread)) {
++		pr_err("Failed to create kernel thread\n");
++		return PTR_ERR(thread);
++	}
++	return 0;
++}
++
++void
enfs_config_timer_exit(void)
++{
++	pr_info("enfs_notify_file_exit\n");
++	if (thread)
++		kthread_stop(thread);
++}
+diff --git a/fs/nfs/enfs/enfs_config.h b/fs/nfs/enfs/enfs_config.h
+new file mode 100644
+index 000000000000..752710129170
+--- /dev/null
++++ b/fs/nfs/enfs/enfs_config.h
+@@ -0,0 +1,32 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: nfs configuration
++ * Author: y00583252
++ * Create: 2023-07-27
++ */
++
++#ifndef ENFS_CONFIG_H
++#define ENFS_CONFIG_H
++
++#include <linux/types.h>
++
++enum enfs_multipath_state {
++	ENFS_MULTIPATH_ENABLE = 0,
++	ENFS_MULTIPATH_DISABLE = 1,
++};
++
++enum enfs_loadbalance_mode {
++	ENFS_LOADBALANCE_RR,
++};
++
++
++int32_t enfs_get_config_path_detect_interval(void);
++int32_t enfs_get_config_path_detect_timeout(void);
++int32_t enfs_get_config_multipath_timeout(void);
++int32_t enfs_get_config_multipath_state(void);
++int32_t enfs_get_config_loadbalance_mode(void);
++int32_t enfs_config_load(void);
++int32_t enfs_config_timer_init(void);
++void enfs_config_timer_exit(void);
++#endif // ENFS_CONFIG_H
+diff --git a/fs/nfs/enfs/enfs_errcode.h b/fs/nfs/enfs/enfs_errcode.h
+new file mode 100644
+index 000000000000..cca47ab9a191
+--- /dev/null
++++ b/fs/nfs/enfs/enfs_errcode.h
+@@ -0,0 +1,17 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: nfs error code
++ * Author: y00583252
++ * Create: 2023-07-31
++ */
++
++#ifndef ENFS_ERRCODE_H
++#define ENFS_ERRCODE_H
++
++enum {
++	ENFS_RET_OK = 0,
++	ENFS_RET_FAIL
++};
++
++#endif // ENFS_ERRCODE_H
+diff --git a/fs/nfs/enfs/enfs_log.h b/fs/nfs/enfs/enfs_log.h
+new file mode 100644
+index 000000000000..177b404f05df
+--- /dev/null
++++ b/fs/nfs/enfs/enfs_log.h
+@@ -0,0 +1,25 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023.
All rights reserved.
++ * Description: enfs log
++ * Author: y00583252
++ * Create: 2023-07-31
++ */
++#ifndef ENFS_LOG_H
++#define ENFS_LOG_H
++
++#include <linux/printk.h>
++
++#define enfs_log_info(fmt, ...) \
++	pr_info("enfs:[%s]" pr_fmt(fmt), \
++	__func__, ##__VA_ARGS__)
++
++#define enfs_log_error(fmt, ...) \
++	pr_err("enfs:[%s]" pr_fmt(fmt), \
++	__func__, ##__VA_ARGS__)
++
++#define enfs_log_debug(fmt, ...) \
++	pr_debug("enfs:[%s]" pr_fmt(fmt), \
++	__func__, ##__VA_ARGS__)
++
++#endif // ENFS_LOG_H
+diff --git a/fs/nfs/enfs/failover_com.h b/fs/nfs/enfs/failover_com.h
+new file mode 100644
+index 000000000000..c52940da232e
+--- /dev/null
++++ b/fs/nfs/enfs/failover_com.h
+@@ -0,0 +1,23 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: failover time common header file
++ * Create: 2023-08-02
++ */
++#ifndef FAILOVER_COMMON_H
++#define FAILOVER_COMMON_H
++
++static inline bool failover_is_enfs_clnt(struct rpc_clnt *clnt)
++{
++	struct rpc_clnt *next = clnt->cl_parent;
++
++	while (next) {
++		if (next == next->cl_parent)
++			break;
++		next = next->cl_parent;
++	}
++
++	return next != NULL ? next->cl_enfs : clnt->cl_enfs;
++}
++
++#endif // FAILOVER_COMMON_H
+diff --git a/fs/nfs/enfs/failover_path.c b/fs/nfs/enfs/failover_path.c
+new file mode 100644
+index 000000000000..93b454de29d1
+--- /dev/null
++++ b/fs/nfs/enfs/failover_path.c
+@@ -0,0 +1,207 @@
++// SPDX-License-Identifier: GPL-2.0
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: nfs path failover file
++ * Author: y00583252
++ * Create: 2023-08-02
++ */
++
++#include "failover_path.h"
++#include <linux/nfs.h>
++#include <linux/nfs3.h>
++#include <linux/nfs4.h>
++#include <linux/sunrpc/clnt.h>
++#include <linux/sunrpc/sched.h>
++#include <linux/sunrpc/xprt.h>
++#include "enfs_config.h"
++#include "enfs_log.h"
++#include "failover_com.h"
++#include "pm_state.h"
++#include "pm_ping.h"
++
++enum failover_policy_t {
++	FAILOVER_NOACTION = 1,
++	FAILOVER_RETRY,
++	FAILOVER_RETRY_DELAY,
++};
++
++static void failover_retry_path(struct rpc_task *task)
++{
++	xprt_release(task);
++	rpc_init_task_retry_counters(task);
++	rpc_task_release_transport(task);
++	rpc_restart_call(task);
++}
++
++static void failover_retry_path_delay(struct rpc_task *task, int32_t delay)
++{
++	failover_retry_path(task);
++	rpc_delay(task, delay);
++}
++
++static void failover_retry_path_by_policy(struct rpc_task *task,
++		enum failover_policy_t policy)
++{
++	if (policy == FAILOVER_RETRY)
++		failover_retry_path(task);
++	else if (policy == FAILOVER_RETRY_DELAY)
++		failover_retry_path_delay(task, 3 * HZ); // delay 3s
++}
++
++static
++enum failover_policy_t failover_get_nfs3_retry_policy(struct rpc_task *task)
++{
++	enum failover_policy_t policy = FAILOVER_NOACTION;
++	const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc;
++	u32 proc;
++
++	if (unlikely(procinfo == NULL)) {
++		enfs_log_error("the task contains no valid proc.\n");
++		return FAILOVER_NOACTION;
++	}
++
++	proc = procinfo->p_proc;
++
++	switch (proc) {
++	case NFS3PROC_CREATE:
++	case NFS3PROC_MKDIR:
++	case NFS3PROC_REMOVE:
++	case NFS3PROC_RMDIR:
++	case NFS3PROC_SYMLINK:
++	case NFS3PROC_LINK:
++	case NFS3PROC_SETATTR:
++	case NFS3PROC_WRITE:
++		return FAILOVER_RETRY_DELAY; // do not fall through to default
++	default:
++		policy = FAILOVER_RETRY;
++	}
++	return policy;
++}
++
++static
++enum failover_policy_t failover_get_nfs4_retry_policy(struct rpc_task *task)
++{
++	enum failover_policy_t policy =
FAILOVER_NOACTION;
++	const struct rpc_procinfo *procinfo = task->tk_msg.rpc_proc;
++	u32 proc_idx;
++
++	if (unlikely(procinfo == NULL)) {
++		enfs_log_error("the task contains no valid proc.\n");
++		return FAILOVER_NOACTION;
++	}
++
++	proc_idx = procinfo->p_statidx;
++
++	switch (proc_idx) {
++	case NFSPROC4_CLNT_CREATE:
++	case NFSPROC4_CLNT_REMOVE:
++	case NFSPROC4_CLNT_LINK:
++	case NFSPROC4_CLNT_SYMLINK:
++	case NFSPROC4_CLNT_SETATTR:
++	case NFSPROC4_CLNT_WRITE:
++	case NFSPROC4_CLNT_RENAME:
++	case NFSPROC4_CLNT_SETACL:
++		return FAILOVER_RETRY_DELAY; // do not fall through to default
++	default:
++		policy = FAILOVER_RETRY;
++	}
++	return policy;
++}
++
++static enum failover_policy_t failover_get_retry_policy(struct rpc_task *task)
++{
++	struct rpc_clnt *clnt = task->tk_client;
++	u32 version = clnt->cl_vers;
++	enum failover_policy_t policy = FAILOVER_NOACTION;
++
++	// 1. if the task is meant to be sent to a certain xprt, take no action
++	if (task->tk_flags & RPC_TASK_FIXED)
++		return FAILOVER_NOACTION;
++
++	// 2. get policy by the version of the nfs protocol
++	if (version == 3) // nfs v3
++		policy = failover_get_nfs3_retry_policy(task);
++	else if (version == 4) // nfs v4
++		policy = failover_get_nfs4_retry_policy(task);
++	else
++		return FAILOVER_NOACTION;
++
++	// 3.
if the task has not been sent to the target, retry immediately
++	if (!RPC_WAS_SENT(task))
++		policy = FAILOVER_RETRY;
++
++	return policy;
++}
++
++static int failover_check_task(struct rpc_task *task)
++{
++	struct rpc_clnt *clnt = NULL;
++	int disable_mpath = enfs_get_config_multipath_state();
++
++	if (disable_mpath != ENFS_MULTIPATH_ENABLE) {
++		enfs_log_debug("Multipath is not enabled.\n");
++		return -EINVAL;
++	}
++
++	if (unlikely((task == NULL) || (task->tk_client == NULL))) {
++		enfs_log_error("The task is not valid.\n");
++		return -EINVAL;
++	}
++
++	clnt = task->tk_client;
++
++	if (clnt->cl_prog != NFS_PROGRAM) {
++		enfs_log_debug("The clnt is not prog{%u} type.\n",
++			clnt->cl_prog);
++		return -EINVAL;
++	}
++
++	if (!failover_is_enfs_clnt(clnt)) {
++		enfs_log_debug("The clnt is not a enfs-managed type.\n");
++		return -EINVAL;
++	}
++	return 0;
++}
++
++void failover_handle(struct rpc_task *task)
++{
++	enum failover_policy_t policy;
++	int ret;
++
++	ret = failover_check_task(task);
++	if (ret != 0)
++		return;
++
++	pm_set_path_state(task->tk_xprt, PM_STATE_FAULT);
++
++	policy = failover_get_retry_policy(task);
++
++	failover_retry_path_by_policy(task, policy);
++}
++
++bool failover_task_need_call_start_again(struct rpc_task *task)
++{
++	int ret;
++
++	ret = failover_check_task(task);
++	if (ret != 0)
++		return false;
++
++	return true;
++}
++
++bool failover_prepare_transmit(struct rpc_task *task)
++{
++	if (task->tk_flags & RPC_TASK_FIXED)
++		return true;
++
++	if (pm_ping_is_test_xprt_task(task))
++		return true;
++
++	if (pm_get_path_state(task->tk_xprt) == PM_STATE_FAULT) {
++		task->tk_status = -ETIMEDOUT;
++		return false;
++	}
++
++	return true;
++}
+diff --git a/fs/nfs/enfs/failover_path.h b/fs/nfs/enfs/failover_path.h
+new file mode 100644
+index 000000000000..6f1294829a6e
+--- /dev/null
++++ b/fs/nfs/enfs/failover_path.h
+@@ -0,0 +1,17 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023.
All rights reserved.
++ * Description: nfs path failover header file
++ * Author: y00583252
++ * Create: 2023-08-02
++ */
++
++#ifndef FAILOVER_PATH_H
++#define FAILOVER_PATH_H
++
++#include <linux/sunrpc/sched.h>
++
++void failover_handle(struct rpc_task *task);
++bool failover_prepare_transmit(struct rpc_task *task);
++
++#endif // FAILOVER_PATH_H
+diff --git a/fs/nfs/enfs/failover_time.c b/fs/nfs/enfs/failover_time.c
+new file mode 100644
+index 000000000000..866ea82d13fc
+--- /dev/null
++++ b/fs/nfs/enfs/failover_time.c
+@@ -0,0 +1,99 @@
++// SPDX-License-Identifier: GPL-2.0
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: failover time file
++ * Create: 2023-08-02
++ */
++
++#include "failover_time.h"
++#include <linux/jiffies.h>
++#include <linux/sunrpc/clnt.h>
++#include "enfs_config.h"
++#include "enfs_log.h"
++#include "failover_com.h"
++#include "pm_ping.h"
++
++static unsigned long failover_get_mulitipath_timeout(struct rpc_clnt *clnt)
++{
++	unsigned long config_tmo = enfs_get_config_multipath_timeout() * HZ;
++	unsigned long clnt_tmo = clnt->cl_timeout->to_initval;
++
++	if (config_tmo == 0)
++		return clnt_tmo;
++
++	return config_tmo > clnt_tmo ?
clnt_tmo : config_tmo;
++}
++
++void failover_adjust_task_timeout(struct rpc_task *task, void *condition)
++{
++	struct rpc_clnt *clnt = NULL;
++	unsigned long tmo;
++	int disable_mpath = enfs_get_config_multipath_state();
++
++	if (disable_mpath != ENFS_MULTIPATH_ENABLE) {
++		enfs_log_debug("Multipath is not enabled.\n");
++		return;
++	}
++
++	clnt = task->tk_client;
++	if (unlikely(clnt == NULL)) {
++		enfs_log_error("task associate client is NULL.\n");
++		return;
++	}
++
++	if (!failover_is_enfs_clnt(clnt)) {
++		enfs_log_debug("The clnt is not a enfs-managed type.\n");
++		return;
++	}
++
++	tmo = failover_get_mulitipath_timeout(clnt);
++	if (tmo == 0) {
++		enfs_log_debug("Multipath is not enabled.\n");
++		return;
++	}
++
++	if (task->tk_timeout != 0)
++		task->tk_timeout =
++			task->tk_timeout < tmo ? task->tk_timeout : tmo;
++	else
++		task->tk_timeout = tmo;
++}
++
++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req)
++{
++	struct rpc_clnt *clnt = NULL;
++	int disable_mpath = enfs_get_config_multipath_state();
++
++	if (disable_mpath != ENFS_MULTIPATH_ENABLE) {
++		enfs_log_debug("Multipath is not enabled.\n");
++		return;
++	}
++
++	clnt = task->tk_client;
++	if (unlikely(clnt == NULL)) {
++		enfs_log_error("task associate client is NULL.\n");
++		return;
++	}
++
++	if (!failover_is_enfs_clnt(clnt)) {
++		enfs_log_debug("The clnt is not a enfs-managed type.\n");
++		return;
++	}
++
++	if (!pm_ping_is_test_xprt_task(task))
++		req->rq_timeout = failover_get_mulitipath_timeout(clnt);
++	else {
++		req->rq_timeout = enfs_get_config_path_detect_timeout() * HZ;
++		req->rq_majortimeo = req->rq_timeout + jiffies;
++	}
++
++	/*
++	 * when task is retried, the req is new, we lost major-timeout times,
++	 * so we have to restore req major
++	 * timeouts from the task, if it is stored.
++	 */
++	if (task->tk_major_timeo != 0)
++		req->rq_majortimeo = task->tk_major_timeo;
++	else
++		task->tk_major_timeo = req->rq_majortimeo;
++}
+diff --git a/fs/nfs/enfs/failover_time.h b/fs/nfs/enfs/failover_time.h
+new file mode 100644
+index 000000000000..ede25b577a2a
+--- /dev/null
++++ b/fs/nfs/enfs/failover_time.h
+@@ -0,0 +1,16 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: failover time header file
++ * Create: 2023-08-02
++ */
++
++#ifndef FAILOVER_TIME_H
++#define FAILOVER_TIME_H
++
++#include <linux/sunrpc/sched.h>
++
++void failover_adjust_task_timeout(struct rpc_task *task, void *condition);
++void failover_init_task_req(struct rpc_task *task, struct rpc_rqst *req);
++
++#endif // FAILOVER_TIME_H
+diff --git a/fs/nfs/enfs/init.h b/fs/nfs/enfs/init.h
+new file mode 100644
+index 000000000000..fdabb9084e19
+--- /dev/null
++++ b/fs/nfs/enfs/init.h
+@@ -0,0 +1,17 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: nfs client init
++ * Author: y00583252
++ * Create: 2023-07-31
++ */
++
++#ifndef ENFS_INIT_H
++#define ENFS_INIT_H
++
++#include <linux/types.h>
++
++int32_t enfs_init(void);
++void enfs_fini(void);
++
++#endif
+diff --git a/fs/nfs/enfs/mgmt_init.c b/fs/nfs/enfs/mgmt_init.c
+new file mode 100644
+index 000000000000..75a40c5e0f6c
+--- /dev/null
++++ b/fs/nfs/enfs/mgmt_init.c
+@@ -0,0 +1,22 @@
++// SPDX-License-Identifier: GPL-2.0
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: mgmt component init
++ * Author: y00583252
++ * Create: 2023-07-31
++ */
++
++#include "mgmt_init.h"
++#include <linux/printk.h>
++#include "enfs_errcode.h"
++#include "enfs_config.h"
++
++int32_t mgmt_init(void)
++{
++	return enfs_config_timer_init();
++}
++
++void mgmt_fini(void)
++{
++	enfs_config_timer_exit();
++}
+diff --git a/fs/nfs/enfs/mgmt_init.h b/fs/nfs/enfs/mgmt_init.h
+new file mode 100644
+index 000000000000..aa78303b9f01
+--- /dev/null
++++ b/fs/nfs/enfs/mgmt_init.h
+@@ -0,0 +1,18 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: mgmt component init
++ * Author: y00583252
++ * Create: 2023-07-31
++ */
++
++#ifndef MGMT_INIT_H
++#define MGMT_INIT_H
++
++#include <linux/types.h>
++
++int32_t mgmt_init(void);
++void mgmt_fini(void);
++
++
++#endif // MGMT_INIT_H
+diff --git a/fs/nfs/enfs/pm_ping.c b/fs/nfs/enfs/pm_ping.c
+new file mode 100644
+index 000000000000..24153cd4c7f3
+--- /dev/null
++++ b/fs/nfs/enfs/pm_ping.c
+@@ -0,0 +1,421 @@
++// SPDX-License-Identifier: GPL-2.0
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: path state header file
++ * Author: x00833432
++ * Create: 2023-08-21
++ */
++
++#include "pm_ping.h"
++#include <linux/err.h>
++#include <linux/spinlock.h>
++#include <linux/slab.h>
++#include <linux/module.h>
++#include <linux/printk.h>
++#include <linux/kthread.h>
++#include <linux/nfs.h>
++#include <linux/errno.h>
++#include <linux/rcupdate.h>
++#include <linux/workqueue.h>
++#include <net/netns/generic.h>
++#include <linux/atomic.h>
++#include <linux/sunrpc/clnt.h>
++
++#include "../../../net/sunrpc/netns.h"
++#include "pm_state.h"
++#include "enfs.h"
++#include "enfs_log.h"
++#include "enfs_config.h"
++
++#define SLEEP_INTERVAL 2
++extern unsigned int sunrpc_net_id;
++
++static struct task_struct *pm_ping_timer_thread;
++// protect ping_execute_workq
++static spinlock_t ping_execute_workq_lock;
++// timer for test xprt workqueue
++static struct workqueue_struct *ping_execute_workq;
++// count the ping xprt work in flight
++static atomic_t check_xprt_count;
++
++struct ping_xprt_work {
++	struct rpc_xprt *xprt; // use this specific xprt
++	struct rpc_clnt *clnt; // use this specific rpc_client
++	struct work_struct ping_work;
++};
++
++struct pm_ping_async_callback {
++	void *data;
++	void (*func)(void *data);
++};
++
++// set xprt's enum pm_check_state
++void pm_ping_set_path_check_state(struct rpc_xprt *xprt,
++		enum pm_check_state state)
++{
++	struct enfs_xprt_context *ctx = NULL;
++
++	if (IS_ERR(xprt)) {
++		enfs_log_error("The xprt ptr is not exist.\n");
++		return;
++	}
++
++	if (xprt == NULL) {
++		enfs_log_error("The xprt is not valid.\n");
++		return;
++	}
++
++	xprt_get(xprt);
++
++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
++	if (ctx == NULL) {
++		enfs_log_error("The xprt multipath ctx is not valid.\n");
++		xprt_put(xprt);
++		return;
++	}
++
++	atomic_set(&ctx->path_check_state, state);
++	xprt_put(xprt);
++}
++
++// get xprt's enum pm_check_state
++static enum pm_check_state pm_ping_get_path_check_state(struct
rpc_xprt *xprt)
++{
++	struct enfs_xprt_context *ctx = NULL;
++	enum pm_check_state state;
++
++	if (xprt == NULL) {
++		enfs_log_error("The xprt is not valid.\n");
++		return PM_CHECK_UNDEFINE;
++	}
++
++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
++	if (ctx == NULL) {
++		enfs_log_error("The xprt multipath ctx is not valid.\n");
++		return PM_CHECK_UNDEFINE;
++	}
++
++	state = atomic_read(&ctx->path_check_state);
++
++	return state;
++}
++
++static void pm_ping_call_done_callback(void *data)
++{
++	struct pm_ping_async_callback *callback_data =
++		(struct pm_ping_async_callback *)data;
++
++	if (callback_data == NULL)
++		return;
++
++	callback_data->func(callback_data->data);
++
++	kfree(callback_data);
++}
++
++// Default callback for async RPC calls
++static void pm_ping_call_done(struct rpc_task *task, void *data)
++{
++	struct rpc_xprt *xprt = task->tk_xprt;
++
++	atomic_dec(&check_xprt_count);
++	if (task->tk_status >= 0)
++		pm_set_path_state(xprt, PM_STATE_NORMAL);
++	else
++		pm_set_path_state(xprt, PM_STATE_FAULT);
++
++	pm_ping_set_path_check_state(xprt, PM_CHECK_FINISH);
++
++	pm_ping_call_done_callback(data);
++}
++
++// register func to rpc_call_done
++static const struct rpc_call_ops pm_ping_set_status_ops = {
++	.rpc_call_done = pm_ping_call_done,
++};
++
++// execute work which in work_queue
++static void pm_ping_execute_work(struct work_struct *work)
++{
++	int ret = 0;
++
++	// get the work information
++	struct ping_xprt_work *work_info =
++		container_of(work, struct ping_xprt_work, ping_work);
++
++	// if check state is pending
++	if (pm_ping_get_path_check_state(work_info->xprt) == PM_CHECK_WAITING) {
++
++		pm_ping_set_path_check_state(work_info->xprt,
++			PM_CHECK_CHECKING);
++
++		ret = rpc_clnt_test_xprt(work_info->clnt,
++			work_info->xprt,
++			&pm_ping_set_status_ops,
++			NULL,
++			RPC_TASK_ASYNC | RPC_TASK_FIXED);
++
++		if (ret < 0) {
++			enfs_log_debug("ping xprt execute failed ,ret %d", ret);
++
++
pm_ping_set_path_check_state(work_info->xprt,
++				PM_CHECK_FINISH);
++
++		} else
++			atomic_inc(&check_xprt_count);
++
++	}
++
++	atomic_dec(&work_info->clnt->cl_count);
++	xprt_put(work_info->xprt);
++	kfree(work_info);
++	work_info = NULL;
++}
++
++static bool pm_ping_workqueue_queue_work(struct work_struct *work)
++{
++	bool ret = false;
++
++	spin_lock(&ping_execute_workq_lock);
++
++	if (ping_execute_workq != NULL)
++		ret = queue_work(ping_execute_workq, work);
++
++	spin_unlock(&ping_execute_workq_lock);
++	return ret;
++}
++
++// init test work and add this work to workqueue
++static int pm_ping_add_work(struct rpc_clnt *clnt,
++		struct rpc_xprt *xprt, void *data)
++{
++	struct ping_xprt_work *work_info;
++	bool ret = false;
++
++	if (IS_ERR(xprt) || xprt == NULL) {
++		enfs_log_error("The xprt ptr is not exist.\n");
++		return -EINVAL;
++	}
++
++	if (IS_ERR(clnt) || clnt == NULL) {
++		enfs_log_error("The clnt ptr is not exist.\n");
++		return -EINVAL;
++	}
++
++	if (!xprt->multipath_context) {
++		enfs_log_error("multipath_context is null.\n");
++		return -EINVAL;
++	}
++
++	// check the xprt check state: FINISH (or INIT)
++	// means this xprt can be inserted into the work queue
++	if (pm_ping_get_path_check_state(xprt) ==
++		PM_CHECK_FINISH ||
++		pm_ping_get_path_check_state(xprt) ==
++		PM_CHECK_INIT) {
++
++		enfs_log_debug("find xprt pointer.
%p\n", xprt);
++		work_info = kzalloc(sizeof(struct ping_xprt_work), GFP_ATOMIC);
++		if (work_info == NULL)
++			return -ENOMEM;
++		work_info->clnt = clnt;
++		atomic_inc(&clnt->cl_count);
++		work_info->xprt = xprt;
++		xprt_get(xprt);
++		INIT_WORK(&work_info->ping_work, pm_ping_execute_work);
++		pm_ping_set_path_check_state(xprt, PM_CHECK_WAITING);
++
++		ret = pm_ping_workqueue_queue_work(&work_info->ping_work);
++		if (!ret) {
++			atomic_dec(&work_info->clnt->cl_count);
++			xprt_put(work_info->xprt);
++			kfree(work_info);
++			return -EINVAL;
++		}
++	}
++	return 0;
++}
++
++// encapsulate pm_ping_add_work()
++static int pm_ping_execute_xprt_test(struct rpc_clnt *clnt,
++		struct rpc_xprt *xprt, void *data)
++{
++	pm_ping_add_work(clnt, xprt, NULL);
++	// return 0 for rpc_clnt_iterate_for_each_xprt();
++	// because negative value will stop iterate all xprt
++	// and we need return negative value for debug
++	// Therefore, we need this function to iterate all xprt
++	return 0;
++}
++
++// export to other module add ping work to workqueue
++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt)
++{
++	int ret;
++
++	ret = pm_ping_add_work(clnt, xprt, NULL);
++	return ret;
++}
++
++// iterate xprt in the client
++static void pm_ping_loop_rpclnt(struct sunrpc_net *sn)
++{
++	struct rpc_clnt *clnt;
++
++	spin_lock(&sn->rpc_client_lock);
++	list_for_each_entry_rcu(clnt, &sn->all_clients, cl_clients) {
++		if (clnt->cl_enfs) {
++			enfs_log_debug("find rpc_clnt.
%p\n", clnt);
++			rpc_clnt_iterate_for_each_xprt(clnt,
++				pm_ping_execute_xprt_test, NULL);
++		}
++	}
++	spin_unlock(&sn->rpc_client_lock);
++}
++
++// iterate each clnt in the sunrpc_net
++static void pm_ping_loop_sunrpc_net(void)
++{
++	struct net *net;
++	struct sunrpc_net *sn;
++
++	rcu_read_lock();
++	for_each_net_rcu(net) {
++		sn = net_generic(net, sunrpc_net_id);
++		if (sn == NULL)
++			continue;
++		pm_ping_loop_rpclnt(sn);
++	}
++	rcu_read_unlock();
++}
++
++static int pm_ping_routine(void *data)
++{
++	while (!kthread_should_stop()) {
++		// a state equal to 0 means multipath is enabled
++		if (enfs_get_config_multipath_state() ==
++			ENFS_MULTIPATH_ENABLE)
++			pm_ping_loop_sunrpc_net();
++
++		msleep((unsigned int)
++			enfs_get_config_path_detect_interval() * 1000);
++	}
++	return 0;
++}
++
++// start the thread that pings periodically
++static int pm_ping_start(void)
++{
++	pm_ping_timer_thread =
++		kthread_run(pm_ping_routine, NULL, "pm_ping_routine");
++	if (IS_ERR(pm_ping_timer_thread)) {
++		enfs_log_error("Failed to create kernel thread\n");
++		return PTR_ERR(pm_ping_timer_thread);
++	}
++	return 0;
++}
++
++// initialize workqueue
++static int pm_ping_workqueue_init(void)
++{
++	struct workqueue_struct *queue = NULL;
++
++	queue = create_workqueue("pm_ping_workqueue");
++
++	if (queue == NULL) {
++		enfs_log_error("create workqueue failed.\n");
++		return -ENOMEM;
++	}
++
++	spin_lock(&ping_execute_workq_lock);
++	ping_execute_workq = queue;
++	spin_unlock(&ping_execute_workq_lock);
++	enfs_log_info("create workqueue succeeded.\n");
++	return 0;
++}
++
++static void pm_ping_workqueue_fini(void)
++{
++	struct workqueue_struct *queue = NULL;
++
++	spin_lock(&ping_execute_workq_lock);
++	queue = ping_execute_workq;
++	ping_execute_workq = NULL;
++	spin_unlock(&ping_execute_workq_lock);
++
++	enfs_log_info("delete work queue\n");
++
++	if (queue != NULL) {
++		flush_workqueue(queue);
++		destroy_workqueue(queue);
++	}
++}
++
++// module exit func
++void pm_ping_fini(void)
++{
++	if
(pm_ping_timer_thread)
++		kthread_stop(pm_ping_timer_thread);
++
++	pm_ping_workqueue_fini();
++
++	while (atomic_read(&check_xprt_count) != 0)
++		msleep(SLEEP_INTERVAL);
++}
++
++// module init func
++int pm_ping_init(void)
++{
++	int ret;
++
++	atomic_set(&check_xprt_count, 0);
++	ret = pm_ping_workqueue_init();
++	if (ret != 0) {
++		enfs_log_error("PM_PING Module loading failed.\n");
++		return ret;
++	}
++	ret = pm_ping_start();
++	if (ret != 0) {
++		enfs_log_error("PM_PING Module loading failed.\n");
++		pm_ping_workqueue_fini();
++		return ret;
++	}
++
++	return ret;
++}
++
++bool pm_ping_is_test_xprt_task(struct rpc_task *task)
++{
++	return task->tk_ops == &pm_ping_set_status_ops ? true : false;
++}
++
++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt,
++		struct rpc_xprt *xprt,
++		void (*func)(void *data),
++		void *data)
++{
++	int ret;
++
++	struct pm_ping_async_callback *callback_data =
++		kzalloc(sizeof(struct pm_ping_async_callback), GFP_KERNEL);
++
++	if (callback_data == NULL) {
++		enfs_log_error("failed to mzalloc mem\n");
++		return -ENOMEM;
++	}
++
++	callback_data->data = data;
++	callback_data->func = func;
++	atomic_inc(&check_xprt_count);
++	ret = rpc_clnt_test_xprt(clnt, xprt,
++		&pm_ping_set_status_ops,
++		callback_data,
++		RPC_TASK_ASYNC | RPC_TASK_FIXED);
++
++	if (ret < 0) {
++		enfs_log_debug("ping xprt execute failed ,ret %d", ret);
++		atomic_dec(&check_xprt_count);
++	}
++
++	return ret;
++}
+diff --git a/fs/nfs/enfs/pm_ping.h b/fs/nfs/enfs/pm_ping.h
+new file mode 100644
+index 000000000000..6bcb94bfc836
+--- /dev/null
++++ b/fs/nfs/enfs/pm_ping.h
+@@ -0,0 +1,33 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: nfs configuration
++ * Author: x00833432
++ * Create: 2023-07-27
++ */
++
++#ifndef PM_PING_H
++#define PM_PING_H
++
++#include <linux/sunrpc/clnt.h>
++
++enum pm_check_state {
++	PM_CHECK_INIT, // this xprt never been queued
++	PM_CHECK_WAITING, // this xprt waiting in the queue
++	PM_CHECK_CHECKING, // this xprt is testing
++	PM_CHECK_FINISH, // this xprt has been finished
++	PM_CHECK_UNDEFINE, // undefine multipath struct
++};
++
++int pm_ping_init(void);
++void pm_ping_fini(void);
++int pm_ping_rpc_test_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt);
++void pm_ping_set_path_check_state(struct rpc_xprt *xprt,
++		enum pm_check_state state);
++bool pm_ping_is_test_xprt_task(struct rpc_task *task);
++int pm_ping_rpc_test_xprt_with_callback(struct rpc_clnt *clnt,
++		struct rpc_xprt *xprt,
++		void (*func)(void *data),
++		void *data);
++
++#endif // PM_PING_H
+diff --git a/fs/nfs/enfs/pm_state.c b/fs/nfs/enfs/pm_state.c
+new file mode 100644
+index 000000000000..220621a207a2
+--- /dev/null
++++ b/fs/nfs/enfs/pm_state.c
+@@ -0,0 +1,158 @@
++// SPDX-License-Identifier: GPL-2.0
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: path state file
++ * Author: y00583252
++ * Create: 2023-08-12
++ */
++#include "pm_state.h"
++#include <linux/sunrpc/xprt.h>
++
++#include "enfs.h"
++#include "enfs_log.h"
++
++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt)
++{
++	struct enfs_xprt_context *ctx = NULL;
++	enum pm_path_state state;
++
++	if (xprt == NULL) {
++		enfs_log_error("The xprt is not valid.\n");
++		return PM_STATE_UNDEFINED;
++	}
++
++	xprt_get(xprt);
++
++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
++	if (ctx == NULL) {
++		enfs_log_error("The xprt multipath ctx is not valid.\n");
++		xprt_put(xprt);
++		return PM_STATE_UNDEFINED;
++	}
++
++	state = atomic_read(&ctx->path_state);
++
++	xprt_put(xprt);
++
++	return state;
++}
++
++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state)
++{
++	struct enfs_xprt_context *ctx = NULL;
++	enum pm_path_state cur_state;
++
++	if (xprt == NULL) {
++		enfs_log_error("The xprt is not valid.\n");
++		return;
++	}
++
++	xprt_get(xprt);
++
++	ctx = (struct enfs_xprt_context *)xprt->multipath_context;
++	if (ctx == NULL) {
++		enfs_log_error("The xprt multipath ctx is not valid.\n");
++		xprt_put(xprt);
++		return;
++	}
++
++	cur_state = atomic_read(&ctx->path_state);
++	if (cur_state == state) {
++		enfs_log_debug("The xprt is already {%d}.\n", state);
++		xprt_put(xprt);
++		return;
++	}
++
++	atomic_set(&ctx->path_state, state);
++	enfs_log_info("The xprt {%p} path state change from {%d} to {%d}.\n",
++		xprt, cur_state, state);
++
++	xprt_put(xprt);
++}
++
++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len)
++{
++	enum pm_path_state state;
++
++	if (xprt == NULL) {
++		enfs_log_error("The xprt is not valid.\n");
++		return;
++	}
++
++	if ((buf == NULL) || (len <= 0)) {
++		enfs_log_error("Buffer is not valid, len=%d.\n", len);
++		return;
++	}
++
++	state = pm_get_path_state(xprt);
++
++	switch (state) {
++	case PM_STATE_INIT:
++		(void)snprintf(buf, len, "Init");
++		break;
++	case
PM_STATE_NORMAL:
++		(void)snprintf(buf, len, "Normal");
++		break;
++	case PM_STATE_FAULT:
++		(void)snprintf(buf, len, "Fault");
++		break;
++	default:
++		(void)snprintf(buf, len, "Unknown");
++		break;
++	}
++}
++
++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len)
++{
++	int i;
++	unsigned long state;
++	static unsigned long xprt_mask[] = {
++		XPRT_LOCKED, XPRT_CONNECTED,
++		XPRT_CONNECTING, XPRT_CLOSE_WAIT,
++		XPRT_BOUND, XPRT_BINDING, XPRT_CLOSING,
++		XPRT_CONGESTED};
++
++	static const char *const xprt_state_desc[] = {
++		"LOCKED", "CONNECTED", "CONNECTING",
++		"CLOSE_WAIT", "BOUND", "BINDING",
++		"CLOSING", "CONGESTED"};
++	int pos = 0;
++	int ret = 0;
++
++	if (xprt == NULL) {
++		enfs_log_error("The xprt is not valid.\n");
++		return;
++	}
++
++	if ((buf == NULL) || (len <= 0)) {
++		enfs_log_error(
++			"Xprt state buffer is not valid, len=%d.\n",
++			len);
++		return;
++	}
++
++	xprt_get(xprt);
++	state = READ_ONCE(xprt->state);
++	xprt_put(xprt);
++
++	for (i = 0; i < ARRAY_SIZE(xprt_mask); ++i) {
++		if (pos >= len)
++			break;
++
++		if (!test_bit(xprt_mask[i], &state))
++			continue;
++
++		if (pos == 0)
++			ret = snprintf(buf, len, "%s", xprt_state_desc[i]);
++		else
++			ret = snprintf(buf + pos, len - pos, "|%s",
++				xprt_state_desc[i]);
++
++		if (ret < 0) {
++			enfs_log_error("format state failed, ret %d.\n", ret);
++			break;
++		}
++
++		pos += ret;
++	}
++}
+diff --git a/fs/nfs/enfs/pm_state.h b/fs/nfs/enfs/pm_state.h
+new file mode 100644
+index 000000000000..f5f52e5ab91d
+--- /dev/null
++++ b/fs/nfs/enfs/pm_state.h
+@@ -0,0 +1,28 @@
++/* SPDX-License-Identifier: GPL-2.0 */
++/*
++ * Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
++ * Description: path state header file ++ * Author: y00583252 ++ * Create: 2023-08-12 ++ */ ++ ++#ifndef PM_STATE_H ++#define PM_STATE_H ++ ++#include <linux/types.h> ++#include <linux/sunrpc/xprt.h> ++ ++enum pm_path_state { ++ PM_STATE_INIT, ++ PM_STATE_NORMAL, ++ PM_STATE_FAULT, ++ PM_STATE_UNDEFINED // xprt is not multipath xprt ++}; ++ ++void pm_set_path_state(struct rpc_xprt *xprt, enum pm_path_state state); ++enum pm_path_state pm_get_path_state(struct rpc_xprt *xprt); ++ ++void pm_get_path_state_desc(struct rpc_xprt *xprt, char *buf, int len); ++void pm_get_xprt_state_desc(struct rpc_xprt *xprt, char *buf, int len); ++ ++#endif // PM_STATE_H diff --git a/0006-add_enfs_compile_option.patch b/0006-add_enfs_compile_option.patch new file mode 100644 index 0000000..ff3bc0e --- /dev/null +++ b/0006-add_enfs_compile_option.patch @@ -0,0 +1,70 @@ +diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig +index b04256636d4b..ae53510c0627 100644 +--- a/arch/arm64/configs/openeuler_defconfig ++++ b/arch/arm64/configs/openeuler_defconfig +@@ -5344,6 +5344,7 @@ CONFIG_LOCKD=m + CONFIG_LOCKD_V4=y + CONFIG_NFS_ACL_SUPPORT=m + CONFIG_NFS_COMMON=y ++# CONFIG_ENFS is not set + CONFIG_SUNRPC=m + CONFIG_SUNRPC_GSS=m + CONFIG_SUNRPC_BACKCHANNEL=y +diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig +index 59baeb2973af..ccc317f7fdb2 100644 +--- a/arch/x86/configs/openeuler_defconfig ++++ b/arch/x86/configs/openeuler_defconfig +@@ -6825,6 +6825,7 @@ CONFIG_LOCKD=m + CONFIG_LOCKD_V4=y + CONFIG_NFS_ACL_SUPPORT=m + CONFIG_NFS_COMMON=y ++CONFIG_ENFS=y + CONFIG_SUNRPC=m + CONFIG_SUNRPC_GSS=m + CONFIG_SUNRPC_BACKCHANNEL=y +diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig +index e55f86713948..872c9b7671b1 100644 +--- a/fs/nfs/Kconfig ++++ b/fs/nfs/Kconfig +@@ -196,3 +196,14 @@ config NFS_DEBUG + depends on NFS_FS && SUNRPC_DEBUG + select CRC32 + default y ++ ++config ENFS ++ tristate "NFS client support for ENFS" ++ 
depends on NFS_FS ++ default n ++ help ++ This option enables support multipath of the NFS protocol ++ in the kernel's NFS client. ++ This feature will improve performance and reliability. ++ ++ If sure, say Y. +diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile +index c587e3c4c6a6..19d0ac2ba3b8 100644 +--- a/fs/nfs/Makefile ++++ b/fs/nfs/Makefile +@@ -12,6 +12,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \ + nfs-$(CONFIG_ROOT_NFS) += nfsroot.o + nfs-$(CONFIG_SYSCTL) += sysctl.o + nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o ++nfs-$(CONFIG_ENFS) += enfs_adapter.o + + obj-$(CONFIG_NFS_V2) += nfsv2.o + nfsv2-y := nfs2super.o proc.o nfs2xdr.o +@@ -34,3 +35,5 @@ nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o + obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/ + obj-$(CONFIG_PNFS_BLOCK) += blocklayout/ + obj-$(CONFIG_PNFS_FLEXFILE_LAYOUT) += flexfilelayout/ ++ ++obj-$(CONFIG_ENFS) += enfs/ +diff --git a/net/sunrpc/Makefile b/net/sunrpc/Makefile +index 090658c3da12..fe4e3b28c5d1 100644 +--- a/net/sunrpc/Makefile ++++ b/net/sunrpc/Makefile +@@ -19,3 +19,4 @@ sunrpc-$(CONFIG_SUNRPC_DEBUG) += debugfs.o + sunrpc-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel_rqst.o + sunrpc-$(CONFIG_PROC_FS) += stats.o + sunrpc-$(CONFIG_SYSCTL) += sysctl.o ++sunrpc-$(CONFIG_ENFS) += sunrpc_enfs_adapter.o -- 2.25.0.windows.1

[PATCH openEuler-1.0-LTS] netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
by Lu Wei 25 Sep '23

From: Kyle Zeng <zengyhkyle(a)gmail.com>

mainline inclusion
from mainline-v4.20-rc2
commit 886503f34d63e681662057448819edb5b1057a97
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I83QCZ
CVE: CVE-2023-42753

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

---------------------------

The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can lead to the use of wrong `CIDR_POS(c)` for calculating array offsets, which can lead to integer underflow. As a result, it leads to slab out-of-bound access.

This patch adds back the IP_SET_HASH_WITH_NET0 macro to ip_set_hash_netportnet to address the issue.

Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle(a)gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Lu Wei <luwei32(a)huawei.com>
---
 net/netfilter/ipset/ip_set_hash_netportnet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
index 613e18e720a4..9290a4d7b862 100644
--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
@@ -39,6 +39,7 @@ MODULE_ALIAS("ip_set_hash:net,port,net");
 #define IP_SET_HASH_WITH_PROTO
 #define IP_SET_HASH_WITH_NETS
 #define IPSET_NET_COUNT 2
+#define IP_SET_HASH_WITH_NET0

 /* IPv4 variant */

--
2.34.1
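
The failure mode described in this commit message can be illustrated with a small userspace sketch. This is not the ipset code: it only assumes that a position helper subtracts a fixed offset from a prefix length, which is safe only when the build-time configuration and the values actually passed in agree on whether 0 is allowed. The names `NSLOTS`, `pos_no_zero`, `pos_with_zero`, `slot_in_bounds` and `read_slot` are hypothetical; the real `CIDR_POS()` definitions live in the ipset headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 33 /* hypothetical: one slot per IPv4 prefix length 0..32 */

/* Hypothetical variant that assumes prefix length 0 never occurs. */
static size_t pos_no_zero(uint8_t cidr)
{
	return (size_t)cidr - 1; /* cidr == 0 wraps around to SIZE_MAX */
}

/* Hypothetical variant that accounts for prefix length 0. */
static size_t pos_with_zero(uint8_t cidr)
{
	return (size_t)cidr;
}

/* True when pos is a valid index into a table of NSLOTS entries. */
static bool slot_in_bounds(size_t pos)
{
	return pos < NSLOTS;
}

/* Defensive lookup: refuse to index with an out-of-range position. */
static int read_slot(const int table[NSLOTS], size_t pos)
{
	assert(slot_in_bounds(pos));
	return table[pos];
}
```

With prefix length 0, the first helper produces an index far outside the table, which is the userspace analogue of the out-of-bounds slab access mentioned above; the second helper maps 0 to a valid slot.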

[PATCH openEuler-23.09] gpio: loongson: Add 3A/3B/3C/7A gpio driver support
by Ming Wang 25 Sep '23

LoongArch inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6BWFP -------------------------------- Signed-off-by: Juxin Gao <gaojuxin(a)loongson.cn> Signed-off-by: Ming Wang <wangming01(a)loongson.cn> --- drivers/gpio/Kconfig | 3 +- drivers/gpio/gpio-loongson.c | 414 ++++++++++++++++++++++++++++------- 2 files changed, 341 insertions(+), 76 deletions(-) diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index f45c6a36551c..be0cf9c87cd6 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -377,7 +377,8 @@ config GPIO_LOGICVC config GPIO_LOONGSON bool "Loongson-2/3 GPIO support" - depends on CPU_LOONGSON2EF || CPU_LOONGSON64 + depends on CPU_LOONGSON2EF || CPU_LOONGSON64 || LOONGARCH + default m help Driver for GPIO functionality on Loongson-2F/3A/3B processors. diff --git a/drivers/gpio/gpio-loongson.c b/drivers/gpio/gpio-loongson.c index a42145873cc9..a3a3d647a043 100644 --- a/drivers/gpio/gpio-loongson.c +++ b/drivers/gpio/gpio-loongson.c @@ -1,13 +1,13 @@ -// SPDX-License-Identifier: GPL-2.0-or-later /* - * Loongson-2F/3A/3B GPIO Support + * Loongson-3A/3B/3C/7A GPIO Support * - * Copyright (c) 2008 Richard Liu, STMicroelectronics <richard.liu(a)st.com> - * Copyright (c) 2008-2010 Arnaud Patard <apatard(a)mandriva.com> - * Copyright (c) 2013 Hongbing Hu <huhb(a)lemote.com> - * Copyright (c) 2014 Huacai Chen <chenhc(a)lemote.com> + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
*/ +#include <linux/acpi.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/module.h> @@ -16,120 +16,384 @@ #include <linux/gpio/driver.h> #include <linux/platform_device.h> #include <linux/bitops.h> +#include <linux/property.h> #include <asm/types.h> -#include <loongson.h> -#define STLS2F_N_GPIO 4 -#define STLS3A_N_GPIO 16 +/* ============== Data structrues =============== */ -#ifdef CONFIG_CPU_LOONGSON64 -#define LOONGSON_N_GPIO STLS3A_N_GPIO -#else -#define LOONGSON_N_GPIO STLS2F_N_GPIO -#endif +/* gpio data */ +struct platform_gpio_data { + u32 gpio_conf; + u32 gpio_out; + u32 gpio_in; + u32 in_start_bit; + u32 support_irq; + char *label; + int gpio_base; + int ngpio; +}; + +#define GPIO_IO_CONF(x) (x->base + x->conf_offset) +#define GPIO_OUT(x) (x->base + x->out_offset) +#define GPIO_IN(x) (x->base + x->in_offset) + +#define LS7A_GPIO_OEN_BYTE(x, gpio) (x->base + x->conf_offset + gpio) +#define LS7A_GPIO_OUT_BYTE(x, gpio) (x->base + x->out_offset + gpio) +#define LS7A_GPIO_IN_BYTE(x, gpio) (x->base + x->in_offset + gpio) + +struct loongson_gpio_chip { + struct gpio_chip chip; + spinlock_t lock; + void __iomem *base; + int conf_offset; + int out_offset; + int in_offset; + int in_start_bit; + u16 *gsi_idx_map; + u16 mapsize; + bool support_irq; +}; /* - * Offset into the register where we read lines, we write them from offset 0. - * This offset is the only thing that stand between us and using - * GPIO_GENERIC. + * GPIO primitives. 
*/ -#define LOONGSON_GPIO_IN_OFFSET 16 +static int loongson_gpio_request(struct gpio_chip *chip, unsigned int pin) +{ + if (pin >= chip->ngpio) + return -EINVAL; + else + return 0; +} + +static inline void +__set_direction(struct loongson_gpio_chip *lgpio, unsigned int pin, int input) +{ + u64 temp; + u8 value; -static DEFINE_SPINLOCK(gpio_lock); + if (!strcmp(lgpio->chip.label, "loongson,loongson3-gpio") || + !strncmp(lgpio->chip.label, "LOON0007", 8)) { + temp = readq(GPIO_IO_CONF(lgpio)); + if (input) + temp |= 1ULL << pin; + else + temp &= ~(1ULL << pin); + writeq(temp, GPIO_IO_CONF(lgpio)); + return; + } + if (!strcmp(lgpio->chip.label, "loongson,ls7a-gpio") || + !strncmp(lgpio->chip.label, "LOON0002", 8)) { + if (input) + value = 1; + else + value = 0; + writeb(value, LS7A_GPIO_OEN_BYTE(lgpio, pin)); + return; + } +} -static int loongson_gpio_get_value(struct gpio_chip *chip, unsigned gpio) +static void __set_level(struct loongson_gpio_chip *lgpio, unsigned int pin, int high) { - u32 val; + u64 temp; + u8 value; - spin_lock(&gpio_lock); - val = LOONGSON_GPIODATA; - spin_unlock(&gpio_lock); + /* If GPIO controller is on 3A,then... 
*/ + if (!strcmp(lgpio->chip.label, "loongson,loongson3-gpio") || + !strncmp(lgpio->chip.label, "LOON0007", 8)) { + temp = readq(GPIO_OUT(lgpio)); + if (high) + temp |= 1ULL << pin; + else + temp &= ~(1ULL << pin); + writeq(temp, GPIO_OUT(lgpio)); + return; + } - return !!(val & BIT(gpio + LOONGSON_GPIO_IN_OFFSET)); + if (!strcmp(lgpio->chip.label, "loongson,ls7a-gpio") || + !strncmp(lgpio->chip.label, "LOON0002", 8)) { + if (high) + value = 1; + else + value = 0; + writeb(value, LS7A_GPIO_OUT_BYTE(lgpio, pin)); + return; + } } -static void loongson_gpio_set_value(struct gpio_chip *chip, - unsigned gpio, int value) +static int loongson_gpio_direction_input(struct gpio_chip *chip, unsigned int pin) { - u32 val; + unsigned long flags; + struct loongson_gpio_chip *lgpio = + container_of(chip, struct loongson_gpio_chip, chip); - spin_lock(&gpio_lock); - val = LOONGSON_GPIODATA; - if (value) - val |= BIT(gpio); - else - val &= ~BIT(gpio); - LOONGSON_GPIODATA = val; - spin_unlock(&gpio_lock); + spin_lock_irqsave(&lgpio->lock, flags); + __set_direction(lgpio, pin, 1); + spin_unlock_irqrestore(&lgpio->lock, flags); + + return 0; } -static int loongson_gpio_direction_input(struct gpio_chip *chip, unsigned gpio) +static int loongson_gpio_direction_output(struct gpio_chip *chip, + unsigned int pin, int value) { - u32 temp; + struct loongson_gpio_chip *lgpio = + container_of(chip, struct loongson_gpio_chip, chip); + unsigned long flags; - spin_lock(&gpio_lock); - temp = LOONGSON_GPIOIE; - temp |= BIT(gpio); - LOONGSON_GPIOIE = temp; - spin_unlock(&gpio_lock); + spin_lock_irqsave(&lgpio->lock, flags); + __set_level(lgpio, pin, value); + __set_direction(lgpio, pin, 0); + spin_unlock_irqrestore(&lgpio->lock, flags); return 0; } -static int loongson_gpio_direction_output(struct gpio_chip *chip, - unsigned gpio, int level) +static int loongson_gpio_get(struct gpio_chip *chip, unsigned int pin) { - u32 temp; + struct loongson_gpio_chip *lgpio = + container_of(chip, struct 
loongson_gpio_chip, chip); + u64 temp; + u8 value; + + /* GPIO controller in 3A is different for 7A */ + if (!strcmp(lgpio->chip.label, "loongson,loongson3-gpio") || + !strncmp(lgpio->chip.label, "LOON0007", 8)) { + temp = readq(GPIO_IN(lgpio)); + return ((temp & (1ULL << (pin + lgpio->in_start_bit))) != 0); + } + + if (!strcmp(lgpio->chip.label, "loongson,ls7a-gpio") || + !strncmp(lgpio->chip.label, "LOON0002", 8)) { + value = readb(LS7A_GPIO_IN_BYTE(lgpio, pin)); + return (value & 1); + } + + return -ENXIO; +} + +static void loongson_gpio_set(struct gpio_chip *chip, unsigned int pin, int value) +{ + struct loongson_gpio_chip *lgpio = + container_of(chip, struct loongson_gpio_chip, chip); + unsigned long flags; + + spin_lock_irqsave(&lgpio->lock, flags); + __set_level(lgpio, pin, value); + spin_unlock_irqrestore(&lgpio->lock, flags); +} + +static int loongson_gpio_to_irq(struct gpio_chip *chip, unsigned int offset) +{ + struct platform_device *pdev = + container_of(chip->parent, struct platform_device, dev); + struct loongson_gpio_chip *lgpio = + container_of(chip, struct loongson_gpio_chip, chip); + + if (offset >= chip->ngpio) + return -EINVAL; + + if ((lgpio->gsi_idx_map != NULL) && (offset < lgpio->mapsize)) + offset = lgpio->gsi_idx_map[offset]; + + return platform_get_irq(pdev, offset); +} + +static int loongson_gpio_init(struct device *dev, struct loongson_gpio_chip *lgpio, + struct device_node *np, + void __iomem *base) +{ + lgpio->chip.request = loongson_gpio_request; + lgpio->chip.direction_input = loongson_gpio_direction_input; + lgpio->chip.get = loongson_gpio_get; + lgpio->chip.direction_output = loongson_gpio_direction_output; + lgpio->chip.set = loongson_gpio_set; + lgpio->chip.can_sleep = 0; + lgpio->chip.fwnode = dev_fwnode(dev); + lgpio->chip.parent = dev; + spin_lock_init(&lgpio->lock); + lgpio->base = (void __iomem *)base; + + if (!strcmp(lgpio->chip.label, "loongson,ls7a-gpio") || + !strncmp(lgpio->chip.label, "LOON0002", 8) || + 
!strcmp(lgpio->chip.label, "loongson,loongson3-gpio") || + !strncmp(lgpio->chip.label, "LOON0007", 8)) { - loongson_gpio_set_value(chip, gpio, level); - spin_lock(&gpio_lock); - temp = LOONGSON_GPIOIE; - temp &= ~BIT(gpio); - LOONGSON_GPIOIE = temp; - spin_unlock(&gpio_lock); + lgpio->chip.to_irq = loongson_gpio_to_irq; + } + gpiochip_add(&lgpio->chip); return 0; } + +static void of_loongson_gpio_get_props(struct device_node *np, + struct loongson_gpio_chip *lgpio) +{ + const char *name; + + of_property_read_u32(np, "ngpios", (u32 *)&lgpio->chip.ngpio); + of_property_read_u32(np, "gpio_base", (u32 *)&lgpio->chip.base); + of_property_read_u32(np, "conf_offset", (u32 *)&lgpio->conf_offset); + of_property_read_u32(np, "out_offset", (u32 *)&lgpio->out_offset); + of_property_read_u32(np, "in_offset", (u32 *)&lgpio->in_offset); + of_property_read_string(np, "compatible", &name); + if (!strcmp(name, "loongson,loongson3-gpio")) { + of_property_read_u32(np, "in_start_bit", + (u32 *)&lgpio->in_start_bit); + if (of_property_read_bool(np, "support_irq")) + lgpio->support_irq = true; + } + lgpio->chip.label = kstrdup(name, GFP_KERNEL); +} + +static void acpi_loongson_gpio_get_props(struct platform_device *pdev, + struct loongson_gpio_chip *lgpio) +{ + + struct device *dev = &pdev->dev; + int rval; + + device_property_read_u32(dev, "ngpios", (u32 *)&lgpio->chip.ngpio); + device_property_read_u32(dev, "gpio_base", (u32 *)&lgpio->chip.base); + device_property_read_u32(dev, "conf_offset", (u32 *)&lgpio->conf_offset); + device_property_read_u32(dev, "out_offset", (u32 *)&lgpio->out_offset); + device_property_read_u32(dev, "in_offset", (u32 *)&lgpio->in_offset); + rval = device_property_read_u16_array(dev, "gsi_idx_map", NULL, 0); + if (rval > 0) { + lgpio->gsi_idx_map = + kmalloc_array(rval, sizeof(*lgpio->gsi_idx_map), + GFP_KERNEL); + if (unlikely(!lgpio->gsi_idx_map)) { + dev_err(dev, "Alloc gsi_idx_map fail!\n"); + } else { + lgpio->mapsize = rval; + 
device_property_read_u16_array(dev, "gsi_idx_map", + lgpio->gsi_idx_map, lgpio->mapsize); + } + } + if (!strcmp(pdev->name, "LOON0007")) { + device_property_read_u32(dev, "in_start_bit", + (u32 *)&lgpio->in_start_bit); + if (device_property_read_bool(dev, "support_irq")) + lgpio->support_irq = true; + } + lgpio->chip.label = kstrdup(pdev->name, GFP_KERNEL); +} + +static void platform_loongson_gpio_get_props(struct platform_device *pdev, + struct loongson_gpio_chip *lgpio) +{ + struct platform_gpio_data *gpio_data = + (struct platform_gpio_data *)pdev->dev.platform_data; + + lgpio->chip.ngpio = gpio_data->ngpio; + lgpio->chip.base = gpio_data->gpio_base; + lgpio->conf_offset = gpio_data->gpio_conf; + lgpio->out_offset = gpio_data->gpio_out; + lgpio->in_offset = gpio_data->gpio_in; + if (!strcmp(gpio_data->label, "loongson,loongson3-gpio")) { + lgpio->in_start_bit = gpio_data->in_start_bit; + lgpio->support_irq = gpio_data->support_irq; + } + lgpio->chip.label = kstrdup(gpio_data->label, GFP_KERNEL); +} + static int loongson_gpio_probe(struct platform_device *pdev) { - struct gpio_chip *gc; + struct resource *iores; + void __iomem *base; + struct loongson_gpio_chip *lgpio; + struct device_node *np = pdev->dev.of_node; struct device *dev = &pdev->dev; + int ret = 0; - gc = devm_kzalloc(dev, sizeof(*gc), GFP_KERNEL); - if (!gc) + lgpio = kzalloc(sizeof(struct loongson_gpio_chip), GFP_KERNEL); + if (!lgpio) return -ENOMEM; - gc->label = "loongson-gpio-chip"; - gc->base = 0; - gc->ngpio = LOONGSON_N_GPIO; - gc->get = loongson_gpio_get_value; - gc->set = loongson_gpio_set_value; - gc->direction_input = loongson_gpio_direction_input; - gc->direction_output = loongson_gpio_direction_output; + if (np) + of_loongson_gpio_get_props(np, lgpio); + else if (ACPI_COMPANION(&pdev->dev)) + acpi_loongson_gpio_get_props(pdev, lgpio); + else + platform_loongson_gpio_get_props(pdev, lgpio); + + iores = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!iores) { + ret = -ENODEV; + 
goto out; + } + if (!request_mem_region(iores->start, resource_size(iores), + pdev->name)) { + ret = -EBUSY; + goto out; + } + base = ioremap(iores->start, resource_size(iores)); + if (!base) { + ret = -ENOMEM; + goto out; + } + platform_set_drvdata(pdev, lgpio); + loongson_gpio_init(dev, lgpio, np, base); - return gpiochip_add_data(gc, NULL); + return 0; +out: + pr_err("%s: %s: missing mandatory property\n", __func__, np->name); + return ret; } -static struct platform_driver loongson_gpio_driver = { +static int loongson_gpio_remove(struct platform_device *pdev) +{ + struct loongson_gpio_chip *lgpio = platform_get_drvdata(pdev); + struct resource *mem; + + platform_set_drvdata(pdev, NULL); + gpiochip_remove(&lgpio->chip); + iounmap(lgpio->base); + kfree(lgpio->gsi_idx_map); + kfree(lgpio); + mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); + release_mem_region(mem->start, resource_size(mem)); + return 0; +} + +static const struct of_device_id loongson_gpio_dt_ids[] = { + { .compatible = "loongson,loongson3-gpio"}, + { .compatible = "loongson,ls7a-gpio"}, + {} +}; +MODULE_DEVICE_TABLE(of, loongson_gpio_dt_ids); + +static const struct acpi_device_id loongson_gpio_acpi_match[] = { + {"LOON0002"}, + {"LOON0007"}, + {} +}; +MODULE_DEVICE_TABLE(acpi, loongson_gpio_acpi_match); + +static struct platform_driver ls_gpio_driver = { .driver = { .name = "loongson-gpio", + .owner = THIS_MODULE, + .of_match_table = loongson_gpio_dt_ids, + .acpi_match_table = ACPI_PTR(loongson_gpio_acpi_match), }, .probe = loongson_gpio_probe, + .remove = loongson_gpio_remove, }; static int __init loongson_gpio_setup(void) { - struct platform_device *pdev; - int ret; - - ret = platform_driver_register(&loongson_gpio_driver); - if (ret) { - pr_err("error registering loongson GPIO driver\n"); - return ret; - } + return platform_driver_register(&ls_gpio_driver); +} +subsys_initcall(loongson_gpio_setup); - pdev = platform_device_register_simple("loongson-gpio", -1, NULL, 0); - return 
PTR_ERR_OR_ZERO(pdev); +static void __exit loongson_gpio_driver(void) +{ + platform_driver_unregister(&ls_gpio_driver); } -postcore_initcall(loongson_gpio_setup); +module_exit(loongson_gpio_driver); +MODULE_AUTHOR("Loongson Technology Corporation Limited"); +MODULE_DESCRIPTION("LOONGSON GPIO"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:loongson_gpio"); -- 2.39.2

[PATCH OLK-5.10] scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
by Yong Hu 25 Sep '23

From: Shuchang Li <lishuchang(a)hust.edu.cn>

stable inclusion
from stable-v5.10.180
commit bab8dc38b1a0a12bc064fc064269033bdcf5b88e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZCDZ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

--------------------------------

[ Upstream commit 91a0c0c1413239d0548b5aac4c82f38f6d53a91e ]

When if_type equals zero and pci_resource_start(pdev, PCI_64BIT_BAR4) returns false, drbl_regs_memmap_p is not remapped. This passes a NULL pointer to iounmap(), which can trigger a WARN() on certain arches.

When if_type equals six and pci_resource_start(pdev, PCI_64BIT_BAR4) returns true, drbl_regs_memmap_p may has been remapped and ctrl_regs_memmap_p is not remapped. This is a resource leak and passes a NULL pointer to iounmap().

To fix these issues, we need to add null checks before iounmap(), and change some goto labels.

Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Shuchang Li <lishuchang(a)hust.edu.cn>
Link: https://lore.kernel.org/r/20230404072133.1022-1-lishuchang@hust.edu.cn
Reviewed-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yong Hu <yong.hu(a)windriver.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 17200b453cbb..1bb3c96a04bd 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -10477,7 +10477,7 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 				goto out_iounmap_all;
 		} else {
 			error = -ENOMEM;
-			goto out_iounmap_all;
+			goto out_iounmap_ctrl;
 		}
 	}

@@ -10495,7 +10495,7 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 			dev_err(&pdev->dev,
 			   "ioremap failed for SLI4 HBA dpp registers.\n");
 			error = -ENOMEM;
-			goto out_iounmap_ctrl;
+			goto out_iounmap_all;
 		}
 		phba->pci_bar4_memmap_p = phba->sli4_hba.dpp_regs_memmap_p;
 	}
@@ -10520,9 +10520,11 @@ lpfc_sli4_pci_mem_setup(struct lpfc_hba *phba)
 	return 0;

 out_iounmap_all:
-	iounmap(phba->sli4_hba.drbl_regs_memmap_p);
+	if (phba->sli4_hba.drbl_regs_memmap_p)
+		iounmap(phba->sli4_hba.drbl_regs_memmap_p);
 out_iounmap_ctrl:
-	iounmap(phba->sli4_hba.ctrl_regs_memmap_p);
+	if (phba->sli4_hba.ctrl_regs_memmap_p)
+		iounmap(phba->sli4_hba.ctrl_regs_memmap_p);
 out_iounmap_conf:
 	iounmap(phba->sli4_hba.conf_regs_memmap_p);

--
2.34.1
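
The cleanup pattern this patch relies on (stacked goto labels plus NULL guards for mappings that may never have been created) can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the lpfc code: `fake_map()` and `fake_unmap()` are hypothetical stand-ins for ioremap()/iounmap(), and `struct regs` and `setup()` are invented for illustration. The counters only exist so the behaviour can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ioremap()/iounmap(); not the kernel API. */
static int live_mappings; /* mappings currently held */
static int null_unmaps;   /* unmap calls that were handed NULL */
static char fake_regs[3];

static void *fake_map(int idx)
{
	assert(idx >= 0 && idx < 3);
	live_mappings++;
	return &fake_regs[idx];
}

static void fake_unmap(void *p)
{
	if (!p) {
		null_unmaps++; /* the situation the patch wants to avoid */
		return;
	}
	live_mappings--;
}

struct regs {
	void *conf;
	void *ctrl;
	void *drbl;
};

/*
 * Map conf always, ctrl and drbl only when requested, then optionally
 * fail late. Returns 0 on success, -1 after releasing what was mapped.
 */
static int setup(struct regs *r, bool have_ctrl, bool have_drbl, bool fail_late)
{
	r->conf = r->ctrl = r->drbl = NULL;

	r->conf = fake_map(0);
	if (!r->conf)
		return -1;

	if (have_ctrl) {
		r->ctrl = fake_map(1);
		if (!r->ctrl)
			goto out_unmap_conf;
	}

	if (have_drbl) {
		r->drbl = fake_map(2);
		if (!r->drbl)
			goto out_unmap_ctrl;
	}

	if (fail_late) /* e.g. a later mapping failed */
		goto out_unmap_all;

	return 0;

out_unmap_all:
	if (r->drbl) /* may be NULL when drbl was never mapped */
		fake_unmap(r->drbl);
out_unmap_ctrl:
	if (r->ctrl) /* may be NULL when ctrl was never mapped */
		fake_unmap(r->ctrl);
out_unmap_conf:
	fake_unmap(r->conf);
	return -1;
}
```

Because either optional mapping can be absent when a late failure jumps to the outermost label, the guards keep the unmap helper from ever seeing NULL while still releasing everything that was mapped.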

[PATCH OLK-5.10] scsi: lpfc: Prevent lpfc_debugfs_lockstat_write() buffer overflow
by Yong Hu 25 Sep '23

From: Justin Tee <justin.tee(a)broadcom.com>

stable inclusion
from stable-v5.10.181
commit e0e7faee3a7dd6f51350cda64997116a247eb045
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZCDZ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

--------------------------------

[ Upstream commit c6087b82a9146826564a55c5ca0164cac40348f5 ]

A static code analysis tool flagged the possibility of buffer overflow when using copy_from_user() for a debugfs entry.

Currently, it is possible that copy_from_user() copies more bytes than what would fit in the mybuf char array. Add a min() restriction check between sizeof(mybuf) - 1 and nbytes passed from the userspace buffer to protect against buffer overflow.

Link: https://lore.kernel.org/r/20230301231626.9621-2-justintee8345@gmail.com
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yong Hu <yong.hu(a)windriver.com>
---
 drivers/scsi/lpfc/lpfc_debugfs.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index fbc76d69ea0b..2b77cbbcdccb 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -2159,10 +2159,13 @@ lpfc_debugfs_lockstat_write(struct file *file, const char __user *buf,
 	char mybuf[64];
 	char *pbuf;
 	int i;
+	size_t bsize;

 	memset(mybuf, 0, sizeof(mybuf));

-	if (copy_from_user(mybuf, buf, nbytes))
+	bsize = min(nbytes, (sizeof(mybuf) - 1));
+
+	if (copy_from_user(mybuf, buf, bsize))
 		return -EFAULT;
 	pbuf = &mybuf[0];

@@ -2183,7 +2186,7 @@ lpfc_debugfs_lockstat_write(struct file *file, const char __user *buf,
 			qp->lock_conflict.wq_access = 0;
 		}
 	}
-	return nbytes;
+	return bsize;
 }
 #endif

--
2.34.1
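
The clamping idea in this patch can be tried out in userspace. The sketch below is an approximation under stated assumptions rather than the driver code: `bounded_copy()` and the `MIN` macro are hypothetical names, and memcpy() stands in for copy_from_user(). It keeps the same two properties as the patch: the copy never exceeds the destination size minus one, and the function reports the number of bytes it actually consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Copy at most dst_size - 1 bytes into a zero-filled destination so a
 * terminating NUL always remains. Returns the number of bytes consumed.
 */
static size_t bounded_copy(char *dst, size_t dst_size,
			   const char *src, size_t nbytes)
{
	size_t bsize;

	assert(dst != NULL && src != NULL && dst_size > 0);

	memset(dst, 0, dst_size);
	bsize = MIN(nbytes, dst_size - 1);
	memcpy(dst, src, bsize); /* stand-in for copy_from_user() */

	return bsize;
}
```

Returning the clamped length rather than the requested one mirrors the patch's change from `return nbytes` to `return bsize`, so the caller sees how much input was really accepted.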

[PATCH openEuler-23.09 0/5] LoongArch: add old BPI compatibility
by yangyinglu 25 Sep '23

yangyinglu (5):
  LoongArch: add kernel setvirtmap for runtime
  LoongArch: Old BPI compatibility
  LoongArch: Fix virtual machine startup error
  LoongArch: Fixed EIOINTC structure members
  LoongArch: use arch specific phys_to_dma

 arch/loongarch/Kconfig | 1 +
 arch/loongarch/include/asm/addrspace.h | 1 +
 arch/loongarch/include/asm/efi.h | 1 +
 arch/loongarch/include/asm/irq.h | 1 +
 arch/loongarch/include/asm/loongarch.h | 1 +
 arch/loongarch/kernel/Makefile | 1 +
 arch/loongarch/kernel/acpi.c | 7 +-
 arch/loongarch/kernel/dma.c | 26 +-
 arch/loongarch/kernel/efi.c | 175 ++++++++-
 arch/loongarch/kernel/env.c | 6 +
 arch/loongarch/kernel/irq.c | 25 +-
 arch/loongarch/kernel/legacy_boot.c | 484 +++++++++++++++++++++++++
 arch/loongarch/kernel/legacy_boot.h | 90 +++++
 arch/loongarch/kernel/mem.c | 26 +-
 arch/loongarch/kernel/numa.c | 39 +-
 arch/loongarch/kernel/reset.c | 3 +-
 arch/loongarch/kernel/setup.c | 18 +-
 arch/loongarch/kernel/smp.c | 6 +-
 arch/loongarch/pci/acpi.c | 147 +++++++-
 drivers/firmware/efi/Makefile | 1 +
 drivers/irqchip/irq-loongarch-cpu.c | 7 +-
 drivers/irqchip/irq-loongson-eiointc.c | 46 ++-
 drivers/irqchip/irq-loongson-pch-pic.c | 5 +
 23 files changed, 1075 insertions(+), 42 deletions(-)
 create mode 100644 arch/loongarch/kernel/legacy_boot.c
 create mode 100644 arch/loongarch/kernel/legacy_boot.h

--
2.20.1

[PATCH openEuler-23.09 0/5] LoongArch: add rtc driver and fix
by Ming Wang 25 Sep '23

Ming Wang (5):
  rtc: Add rtc driver for the Loongson family chips
  LoongArch: kdump: Add memory reservation for old kernel
  LoongArch: kexec: Add compatibility with old interfaces
  LoongArch: Fix kdump failure on v40 interface specification
  LoongArch: kdump: Add high memory reservation

 arch/loongarch/kernel/machine_kexec.c | 45 ++-
 arch/loongarch/kernel/setup.c | 94 +++++-
 drivers/rtc/Kconfig | 13 +
 drivers/rtc/Makefile | 1 +
 drivers/rtc/rtc-loongson.c | 397 ++++++++++++++++++++++++++
 5 files changed, 535 insertions(+), 15 deletions(-)
 create mode 100644 drivers/rtc/rtc-loongson.c

--
2.39.2

Launch of openEuler 2023 Outstanding Project of the Year recommendations
by Huxinwei 25 Sep '23

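Dear community developers,

Following discussion at the openEuler Technical Committee meeting on September 20, the selection of the openEuler 2023 Outstanding Projects of the Year is now officially open. All community developers and participants are invited to submit recommendations.

For the current thinking on the selection criteria and project categories, see oEEP (openeuler.org) <https://www.openeuler.org/zh/oEEP/?name=oEEP-0007%20openEuler%E4%BC%98%E7%A…>.

Until Sunday, October 15, 2023, any three or more community participants may jointly recommend a project they endorse by writing to tc(a)openeuler.org. The subject of the recommendation email must clearly contain the phrase "openEuler 2023 年度优秀项目推荐" ("openEuler 2023 Outstanding Project of the Year recommendation"). The body of the email must clearly state the email addresses and corresponding gitee IDs of the co-recommenders, the name of the recommended project, the location of the project's code repository, and the award category for which it is recommended.

I will compile all recommendations and publish them on the community mailing list before October 18.

Your participation and recommendations are welcome.

Regards
openEuler Technical Committee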

[PATCH openEuler-23.09 0/4] LoongArch: Add cpufreq and BMC
by Weihao Li 25 Sep '23

Weihao Li (4):
  cpufreq: Add cpufreq driver for LoongArch
  fbdev: add ls2k500sfb driver for ls2k500 bmc.
  ipmi: add ls2k500 bmc ipmi support.
  LoongArch: defconfig: enable CONFIG_FB_LS2K500=m.

 arch/loongarch/Kconfig | 6 +
 arch/loongarch/configs/loongson3_defconfig | 5 +
 arch/loongarch/include/asm/fpu.h | 13 +-
 drivers/char/ipmi/Makefile | 4 +
 drivers/char/ipmi/btlock.h | 92 ++
 drivers/char/ipmi/ipmi_si.h | 11 +
 drivers/char/ipmi/ipmi_si_intf.c | 4 +
 drivers/char/ipmi/ipmi_si_ls2k500.c | 172 +++
 drivers/char/ipmi/kcs_bmc_ls2k500.h | 67 +
 drivers/cpufreq/Kconfig | 11 +
 drivers/cpufreq/Makefile | 1 +
 drivers/cpufreq/loongson3-acpi-cpufreq.c | 1549 ++++++++++++++++++++
 drivers/video/fbdev/Kconfig | 9 +
 drivers/video/fbdev/Makefile | 1 +
 drivers/video/fbdev/ls2k500sfb.c | 791 ++++++++++
 15 files changed, 2735 insertions(+), 1 deletion(-)
 create mode 100644 drivers/char/ipmi/btlock.h
 create mode 100644 drivers/char/ipmi/ipmi_si_ls2k500.c
 create mode 100644 drivers/char/ipmi/kcs_bmc_ls2k500.h
 create mode 100644 drivers/cpufreq/loongson3-acpi-cpufreq.c
 create mode 100644 drivers/video/fbdev/ls2k500sfb.c

--
2.20.1

[PATCH openEuler-23.09 0/5] LoongArch: add rtc driver and fix
by Ming Wang 25 Sep '23

Ming Wang (5):
  rtc: Add rtc driver for the Loongson family chips
  LoongArch: kdump: Add memory reservation for old kernel
  LoongArch: kexec: Add compatibility with old interfaces
  LoongArch: Fix kdump failure on v40 interface specification
  LoongArch: kdump: Add high memory reservation

 arch/loongarch/kernel/machine_kexec.c | 45 ++-
 arch/loongarch/kernel/setup.c | 94 +++++-
 drivers/rtc/Kconfig | 13 +
 drivers/rtc/Makefile | 1 +
 drivers/rtc/rtc-loongson.c | 397 ++++++++++++++++++++++++++
 5 files changed, 535 insertions(+), 15 deletions(-)
 create mode 100644 drivers/rtc/rtc-loongson.c

--
2.39.2

[PATCH OLK-5.10] sdei_watchdog: Avoid exception during sdei handler
by Zheng Zengkai 23 Sep '23

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I82QPR

--------------------------------

On Kunpeng920 platform, when firmware triggers SDEI event too soon, A WARN_ON() will be called in sdei_watchdog_callback(), this leads to warning "sdei: unsafe: exception during handler" being reported in _sdei_handler().

As the comments for the warning mentioned, We took a synchronous exception from the SDEI handler. This could deadlock, and if you interrupt KVM it will hyp-panic instead.

Remove the WARN_ON() to avoid potential issue and warning.

Fixes: 0fa83fd0f8f7 ("sdei_watchdog: avoid possible false hardlockup")
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
 arch/arm64/kernel/watchdog_sdei.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c
index aa980b090598..7fd8c2d3dd1b 100644
--- a/arch/arm64/kernel/watchdog_sdei.c
+++ b/arch/arm64/kernel/watchdog_sdei.c
@@ -78,7 +78,6 @@ static int sdei_watchdog_callback(u32 event,
 	if (delta < watchdog_thresh * (u64)NSEC_PER_SEC * 4 / 5) {
 		pr_err(FW_BUG "SDEI Watchdog event triggered too soon, "
 			"time to last check:%lld ns\n", delta);
-		WARN_ON(1);
 		return 0;
 	}

--
2.20.1

[PATCH openEuler-1.0-LTS] sdei_watchdog: Avoid exception during sdei handler
by Zheng Zengkai 23 Sep '23

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I82QPR

--------------------------------

On Kunpeng920 platform, when firmware triggers SDEI event too soon, A WARN_ON() will be called in sdei_watchdog_callback(), this leads to warning "sdei: unsafe: exception during handler" being reported in _sdei_handler().

As the comments for the warning mentioned, We took a synchronous exception from the SDEI handler. This could deadlock, and if you interrupt KVM it will hyp-panic instead.

Remove the WARN_ON() to avoid potential issue and warning.

Fixes: 37433b57ffdd ("sdei_watchdog: avoid possible false hardlockup")
Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com>
---
 arch/arm64/kernel/watchdog_sdei.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/kernel/watchdog_sdei.c b/arch/arm64/kernel/watchdog_sdei.c
index a499a14b23c1..5884abdaeb9d 100644
--- a/arch/arm64/kernel/watchdog_sdei.c
+++ b/arch/arm64/kernel/watchdog_sdei.c
@@ -78,7 +78,6 @@ static int sdei_watchdog_callback(u32 event,
 	if (delta < watchdog_thresh * (u64)NSEC_PER_SEC * 4 / 5) {
 		pr_err(FW_BUG "SDEI Watchdog event triggered too soon, "
 			"time to last check:%lld ns\n", delta);
-		WARN_ON(1);
 		return 0;
 	}

--
2.20.1

[PATCH 01/88] scsi: mpt3sas: Define hba_port structure
by Hao Zhang 22 Sep '23

From: Sreekanth Reddy <sreekanth.reddy(a)broadcom.com> Commit b22a0fac8c056e88fc72f7241fa9077b804634a6 upstream. Define a new hba_port structure which holds the following variables: - port_id: Port ID of the narrow/wide port of the HBA - sas_address: SAS Address of the remote device that is attached to the current HBA port - phy_mask: HBA's phy bits to which above SAS addressed device is attached - flags: This field is used to refresh port details during HBA reset Link: https://lore.kernel.org/r/20201027130847.9962-2-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy(a)broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com> Integrated-by: Siyu Zhang <siyu.zhang(a)windriver.com> --- drivers/scsi/mpt3sas/mpt3sas_base.h | 35 ++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 86774747fe25..9a3429b1e7ce 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -420,6 +420,7 @@ struct Mpi2ManufacturingPage11_t { * @flags: MPT_TARGET_FLAGS_XXX flags * @deleted: target flaged for deletion * @tm_busy: target is busy with TM request. + * @port: hba port entry containing target's port number info * @sas_dev: The sas_device associated with this target * @pcie_dev: The pcie device associated with this target */ @@ -432,6 +433,7 @@ struct MPT3SAS_TARGET { u32 flags; u8 deleted; u8 tm_busy; + struct hba_port *port; struct _sas_device *sas_dev; struct _pcie_device *pcie_dev; }; @@ -534,6 +536,7 @@ struct _internal_cmd { * addition routine. 
* @chassis_slot: chassis slot * @is_chassis_slot_valid: chassis slot valid or not + * @port: hba port entry containing device's port number info */ struct _sas_device { struct list_head list; @@ -560,6 +563,7 @@ struct _sas_device { u8 is_chassis_slot_valid; u8 connector_name[5]; struct kref refcount; + struct hba_port *port; }; static inline void sas_device_get(struct _sas_device *s) @@ -730,6 +734,7 @@ struct _boot_device { * @remote_identify: attached device identification * @rphy: sas transport rphy object * @port: sas transport wide/narrow port object + * @hba_port: hba port entry containing port's port number info * @phy_list: _sas_phy list objects belonging to this port */ struct _sas_port { @@ -738,6 +743,7 @@ struct _sas_port { struct sas_identify remote_identify; struct sas_rphy *rphy; struct sas_port *port; + struct hba_port *hba_port; struct list_head phy_list; }; @@ -751,6 +757,7 @@ struct _sas_port { * @handle: device handle for this phy * @attached_handle: device handle for attached device * @phy_belongs_to_port: port has been created for this phy + * @port: hba port entry containing port number info */ struct _sas_phy { struct list_head port_siblings; @@ -761,6 +768,7 @@ struct _sas_phy { u16 handle; u16 attached_handle; u8 phy_belongs_to_port; + struct hba_port *port; }; /** @@ -776,6 +784,7 @@ struct _sas_phy { * @responding: used in _scsih_expander_device_mark_responding * @phy: a list of phys that make up this sas_host/expander * @sas_port_list: list of ports attached to this sas_host/expander + * @port: hba port entry containing node's port number info */ struct _sas_node { struct list_head list; @@ -787,11 +796,11 @@ struct _sas_node { u16 enclosure_handle; u64 enclosure_logical_id; u8 responding; + struct hba_port *port; struct _sas_phy *phy; struct list_head sas_port_list; }; - /** * struct _enclosure_node - enclosure information * @list: list of enclosures @@ -1009,6 +1018,27 @@ struct reply_post_struct { dma_addr_t reply_post_free_dma; }; 
+/** + * struct hba_port - Saves each HBA's Wide/Narrow port info + * @sas_address: sas address of this wide/narrow port's attached device + * @phy_mask: HBA PHY's belonging to this port + * @port_id: port number + * @flags: hba port flags + */ +struct hba_port { + struct list_head list; + u64 sas_address; + u32 phy_mask; + u8 port_id; + u8 flags; +}; + +/* hba port flags */ +#define HBA_PORT_FLAG_DIRTY_PORT 0x01 +#define HBA_PORT_FLAG_NEW_PORT 0x02 + +#define MULTIPATH_DISABLED_PORT_ID 0xFF + typedef void (*MPT3SAS_FLUSH_RUNNING_CMDS)(struct MPT3SAS_ADAPTER *ioc); /** * struct MPT3SAS_ADAPTER - per adapter struct @@ -1191,6 +1221,7 @@ typedef void (*MPT3SAS_FLUSH_RUNNING_CMDS)(struct MPT3SAS_ADAPTER *ioc); * which ensures the syncrhonization between cli/sysfs_show path. * @atomic_desc_capable: Atomic Request Descriptor support. * @GET_MSIX_INDEX: Get the msix index of high iops queues. + * @port_table_list: list containing HBA's wide/narrow port's info */ struct MPT3SAS_ADAPTER { struct list_head list; @@ -1483,6 +1514,8 @@ struct MPT3SAS_ADAPTER { PUT_SMID_IO_FP_HIP put_smid_hi_priority; PUT_SMID_DEFAULT put_smid_default; GET_MSIX_INDEX get_msix_index_for_smlio; + + struct list_head port_table_list; }; struct mpt3sas_debugfs_buffer { -- 2.33.0


openEuler kernel SIG biweekly meeting
by openEuler conference 22 Sep '23


Hello! The openEuler Kernel SIG invites you to attend a WeLink conference (automatically recorded) to be held at 2023-09-22 14:15.
Subject: openEuler kernel SIG biweekly meeting
Meeting link: https://bmeeting.huaweicloud.com:36443/#/j/984839769
Minutes and topics: https://etherpad.openeuler.org/p/Kernel-meetings
Note: after joining the conference, please change your participant name; you can also use your gitee.com ID.
More information: https://openeuler.org/en/


(Backup) openEuler Kernel SIG biweekly meeting
by openEuler conference 22 Sep '23


Hello! The openEuler Kernel SIG invites you to attend a Zoom conference (automatically recorded) to be held at 2023-09-22 14:15.
Subject: (Backup) openEuler Kernel SIG biweekly meeting
Agenda:
1. Progress update
2. Call for topics (open)
Meeting link: https://us06web.zoom.us/j/83148210957?pwd=UO852Qaa2nEd3cIHULjWDmtCAmZ90O.1
Minutes and topics: https://etherpad.openeuler.org/p/Kernel-meetings
Note: after joining the conference, please change your participant name; you can also use your gitee.com ID.
More information: https://openeuler.org/en/


[PATCH openEuler-1.0-LTS] cpuidle: Fix kobject memory leaks in error paths
by Xia Fukun 22 Sep '23


From: Anel Orazgaliyeva <anelkz(a)amazon.de>

stable inclusion
from stable-v4.19.294
commit 22d44652b6d6404b96a40bb051d1046e6c005ae5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I81G0T
CVE: NA

--------------------------------

[ Upstream commit e5f5a66c9aa9c331da5527c2e3fd9394e7091e01 ]

Commit c343bf1ba5ef ("cpuidle: Fix three reference count leaks") fixes the cleanup of kobjects; however, it removes kfree() calls altogether, leading to memory leaks.

Fix those and also defer the initialization of dev->kobj_dev until after the error check, so that we do not end up with a dangling pointer.

Fixes: c343bf1ba5ef ("cpuidle: Fix three reference count leaks")
Signed-off-by: Anel Orazgaliyeva <anelkz(a)amazon.de>
Suggested-by: Aman Priyadarshi <apeureka(a)amazon.de>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Xia Fukun <xiafukun(a)huawei.com>
---
 drivers/cpuidle/sysfs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index 38986a36197e..76fcd45eadb5 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -481,6 +481,7 @@ static int cpuidle_add_state_sysfs(struct cpuidle_device *device)
 					   &kdev->kobj, "state%d", i);
 		if (ret) {
 			kobject_put(&kobj->kobj);
+			kfree(kobj);
 			goto error_state;
 		}
 		cpuidle_add_s2idle_attr_group(kobj);
@@ -612,6 +613,7 @@ static int cpuidle_add_driver_sysfs(struct cpuidle_device *dev)
 				   &kdev->kobj, "driver");
 	if (ret) {
 		kobject_put(&kdrv->kobj);
+		kfree(kdrv);
 		return ret;
 	}
@@ -698,7 +700,6 @@ int cpuidle_add_sysfs(struct cpuidle_device *dev)
 	if (!kdev)
 		return -ENOMEM;
 	kdev->dev = dev;
-	dev->kobj_dev = kdev;
 	init_completion(&kdev->kobj_unregister);
@@ -706,9 +707,11 @@ int cpuidle_add_sysfs(struct cpuidle_device *dev)
 				   "cpuidle");
 	if (error) {
 		kobject_put(&kdev->kobj);
+		kfree(kdev);
 		return error;
 	}
+	dev->kobj_dev = kdev;
 	kobject_uevent(&kdev->kobj, KOBJ_ADD);
 	return 0;
--
2.34.1
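The second half of the fix (publishing dev->kobj_dev only after the error check) follows a general pattern that can be sketched in userspace C. This is only an illustration under stated assumptions: demo_kobj, demo_dev and demo_add_sysfs are hypothetical names, calloc()/free() stand in for the kernel allocator, and the kobject reference counting of the real driver is not modelled.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the sysfs object and the device that owns it */
struct demo_kobj { int registered; };
struct demo_dev { struct demo_kobj *kobj_dev; };

/*
 * init_err simulates the result of the registration step.  The owning
 * structure is only pointed at the new object once registration has
 * succeeded, so a failure leaves neither a leak nor a dangling pointer.
 */
static int demo_add_sysfs(struct demo_dev *dev, int init_err)
{
	struct demo_kobj *k = calloc(1, sizeof(*k));

	if (!k)
		return -ENOMEM;

	if (init_err) {
		free(k);		/* release the allocation on the error path */
		return init_err;	/* dev->kobj_dev was never touched */
	}

	k->registered = 1;
	dev->kobj_dev = k;		/* publish only after success */
	return 0;
}
```

In the sketch, a failed call leaves the device's pointer exactly as it was, which is the property the patch restores for cpuidle_add_sysfs().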


[PATCH openEuler-1.0-LTS] cec-api: prevent leaking memory through hole in structure
by Zhao Wenhui 21 Sep '23


From: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>

mainline inclusion
from mainline-v5.9-rc1
commit 6c42227c3467549ddc65efe99c869021d2f4a570
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I82DIP
CVE: CVE-2020-36766

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

---------------------------

Fix this smatch warning:

drivers/media/cec/core/cec-api.c:156 cec_adap_g_log_addrs() warn: check that 'log_addrs' doesn't leak information (struct has a hole after 'features')

Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
Signed-off-by: Zhao Wenhui <zhaowenhui8(a)huawei.com>
---
 drivers/media/cec/cec-api.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/media/cec/cec-api.c b/drivers/media/cec/cec-api.c
index 4961573850d5..b2b3f779592f 100644
--- a/drivers/media/cec/cec-api.c
+++ b/drivers/media/cec/cec-api.c
@@ -147,7 +147,13 @@ static long cec_adap_g_log_addrs(struct cec_adapter *adap,
 	struct cec_log_addrs log_addrs;
 	mutex_lock(&adap->lock);
-	log_addrs = adap->log_addrs;
+	/*
+	 * We use memcpy here instead of assignment since there is a
+	 * hole at the end of struct cec_log_addrs that an assignment
+	 * might ignore. So when we do copy_to_user() we could leak
+	 * one byte of memory.
+	 */
+	memcpy(&log_addrs, &adap->log_addrs, sizeof(log_addrs));
 	if (!adap->is_configured)
 		memset(log_addrs.log_addr, CEC_LOG_ADDR_INVALID,
 		       sizeof(log_addrs.log_addr));
--
2.34.1
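To illustrate why a padding hole matters when a whole structure is later copied out byte by byte, here is a small userspace sketch. The layout of struct demo_addrs is hypothetical (it is not struct cec_log_addrs), and the sketch assumes a typical ABI where uint32_t is 4-byte aligned, so the compiler adds tail padding after the one-byte member.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 4-byte member followed by a 1-byte member */
struct demo_addrs {
	uint32_t mask;
	uint8_t features;
	/* on common ABIs the compiler adds tail padding here */
};

/* Number of bytes in the structure that belong to no named member */
static size_t demo_padding_bytes(void)
{
	return sizeof(struct demo_addrs) - (sizeof(uint32_t) + sizeof(uint8_t));
}

/*
 * Copy every byte of the source, padding included.  A plain structure
 * assignment is not required to copy padding, so the destination's
 * padding could keep whatever was in memory before.
 */
static void demo_snapshot(struct demo_addrs *dst, const struct demo_addrs *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Byte-wise comparison, i.e. what a raw copy to another buffer would see */
static int demo_bytes_equal(const struct demo_addrs *a,
			    const struct demo_addrs *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

If the source object was fully zeroed when it was created, demo_snapshot() yields a destination whose padding is also zero, so nothing stale can reach a later raw copy of the whole structure.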


[PATCH OLK-5.10] etmem: Fixed an issue where the module reference counting is incorrect
by liubo 21 Sep '23


euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I839LV
CVE: NA

----------------------------------------------------

When /proc/pid/idle_page and /proc/pid/swap_page are opened, try_module_get() is used to take a module reference, which prevents the module from being unloaded. However, if opening the file fails, that reference must be released on the error path.

Signed-off-by: liubo <liubo254(a)huawei.com>
---
 fs/proc/task_mmu.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 502893304027..9182d0c6d22c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1911,15 +1911,20 @@ static int mm_idle_open(struct inode *inode, struct file *file)
 	}
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
-	if (IS_ERR(mm))
+	if (IS_ERR(mm)) {
+		module_put(module);
 		return PTR_ERR(mm);
+	}
 	file->private_data = mm;
 	if (proc_page_scan_operations.open)
-		return proc_page_scan_operations.open(inode, file);
+		ret = proc_page_scan_operations.open(inode, file);
-	return 0;
+	if (ret != 0)
+		module_put(module);
+
+	return ret;
 }
 static int mm_idle_release(struct inode *inode, struct file *file)
@@ -2004,15 +2009,20 @@ static int mm_swap_open(struct inode *inode, struct file *file)
 	}
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
-	if (IS_ERR(mm))
+	if (IS_ERR(mm)) {
+		module_put(module);
 		return PTR_ERR(mm);
+	}
 	file->private_data = mm;
 	if (proc_swap_pages_operations.open)
-		return proc_swap_pages_operations.open(inode, file);
+		ret = proc_swap_pages_operations.open(inode, file);
-	return 0;
+	if (ret != 0)
+		module_put(module);
+
+	return ret;
 }
 static int mm_swap_release(struct inode *inode, struct file *file)
--
2.33.0
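The rule the patch enforces (every failing exit after a successful get must do a matching put) can be sketched in plain userspace C. Everything here is a hypothetical stand-in: demo_refcount models the module reference count, demo_get()/demo_put() model try_module_get()/module_put(), and the two error parameters simulate the two failure points in the open path.

```c
#include <errno.h>

static int demo_refcount;	/* stands in for the module reference count */

/* Models try_module_get(): returns nonzero on success (always, here) */
static int demo_get(void)
{
	demo_refcount++;
	return 1;
}

/* Models module_put() */
static void demo_put(void)
{
	demo_refcount--;
}

/*
 * Hypothetical open path.  lookup_err simulates a failed proc_mem_open(),
 * open_err simulates a failing ->open() callback.  Both failure exits
 * drop the reference taken at the top; only success keeps it.
 */
static int demo_open(int lookup_err, int open_err)
{
	int ret;

	if (!demo_get())
		return -ENOENT;

	if (lookup_err) {
		demo_put();
		return lookup_err;
	}

	ret = open_err;
	if (ret != 0)
		demo_put();

	return ret;
}
```

After a failed demo_open() the counter is back at its previous value; after a successful one it stays incremented until the matching release, which is the invariant the patch restores for mm_idle_open() and mm_swap_open().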


[PATCH openEuler-1.0-LTS] crypto: hisilicon - reset before init the device
by wangyuan 21 Sep '23


From: Yu'an Wang <wangyuan46(a)huawei.com> driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I830AI CVE: NA -------------------------------- Before initializing the device, reset the device to clear the residual data to prevent unexpected problems, such as reboot scene, which may maintain device state before reboot. Signed-off-by: Yu'an Wang <wangyuan46(a)huawei.com> --- drivers/crypto/hisilicon/hpre/hpre_main.c | 68 +++++++++++-------- drivers/crypto/hisilicon/qm.c | 83 ++++++++++++++++------- drivers/crypto/hisilicon/rde/rde_main.c | 64 ++++++++--------- drivers/crypto/hisilicon/sec2/sec_main.c | 39 ++++++----- drivers/crypto/hisilicon/zip/zip_main.c | 42 +++++++----- 5 files changed, 175 insertions(+), 121 deletions(-) diff --git a/drivers/crypto/hisilicon/hpre/hpre_main.c b/drivers/crypto/hisilicon/hpre/hpre_main.c index 1a980f255ad4..cbe8ea438fd2 100644 --- a/drivers/crypto/hisilicon/hpre/hpre_main.c +++ b/drivers/crypto/hisilicon/hpre/hpre_main.c @@ -780,28 +780,6 @@ static void hpre_debugfs_exit(struct hisi_qm *qm) debugfs_remove_recursive(qm->debug.debug_root); } -static int hpre_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) -{ - int ret; - - qm->algs = "rsa\ndh\n"; - qm->uacce_mode = uacce_mode; - qm->pdev = pdev; - ret = hisi_qm_pre_init(qm, pf_q_num, HPRE_PF_DEF_Q_BASE); - if (ret) - return ret; - if (qm->ver == QM_HW_V1) { - pci_warn(pdev, "HPRE version 1 is not supported!\n"); - return -EINVAL; - } - - qm->qm_list = &hpre_devices; - qm->sqe_size = HPRE_SQE_SIZE; - qm->dev_name = hpre_name; - - return 0; -} - static void hpre_log_hw_error(struct hisi_qm *qm, u32 err_sts) { const struct hpre_hw_error *err = hpre_hw_errors; @@ -836,30 +814,36 @@ static void hpre_open_axi_master_ooo(struct hisi_qm *qm) HPRE_ADDR(qm, HPRE_AM_OOO_SHUTDOWN_ENB)); } -static int hpre_pf_probe_init(struct hisi_qm *qm) +static void hpre_err_ini_set(struct hisi_qm *qm) { - int ret; - - if (qm->ver != QM_HW_V2) - return -EINVAL; + if 
(qm->fun_type == QM_HW_VF) + return; - qm->ctrl_q_num = HPRE_QUEUE_NUM_V2; qm->err_ini.get_dev_hw_err_status = hpre_get_hw_err_status; qm->err_ini.clear_dev_hw_err_status = hpre_clear_hw_err_status; qm->err_ini.err_info.ecc_2bits_mask = HPRE_CORE_ECC_2BIT_ERR | - HPRE_OOO_ECC_2BIT_ERR; + HPRE_OOO_ECC_2BIT_ERR; qm->err_ini.err_info.ce = QM_BASE_CE; qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; qm->err_ini.err_info.fe = 0; qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; qm->err_ini.err_info.acpi_rst = "HRST"; - qm->err_ini.hw_err_disable = hpre_hw_error_disable; qm->err_ini.hw_err_enable = hpre_hw_error_enable; qm->err_ini.set_usr_domain_cache = hpre_set_user_domain_and_cache; qm->err_ini.log_dev_hw_err = hpre_log_hw_error; qm->err_ini.open_axi_master_ooo = hpre_open_axi_master_ooo; qm->err_ini.err_info.msi_wr_port = HPRE_WR_MSI_PORT; +} + +static int hpre_pf_probe_init(struct hisi_qm *qm) +{ + int ret; + + if (qm->ver != QM_HW_V2) + return -EINVAL; + + qm->ctrl_q_num = HPRE_QUEUE_NUM_V2; ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) @@ -870,6 +854,30 @@ static int hpre_pf_probe_init(struct hisi_qm *qm) return 0; } +static int hpre_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) +{ + int ret; + + qm->algs = "rsa\ndh\n"; + qm->uacce_mode = uacce_mode; + qm->pdev = pdev; + ret = hisi_qm_pre_init(qm, pf_q_num, HPRE_PF_DEF_Q_BASE); + if (ret) + return ret; + + if (qm->ver == QM_HW_V1) { + pci_warn(pdev, "HPRE version 1 is not supported!\n"); + return -EINVAL; + } + + qm->qm_list = &hpre_devices; + qm->sqe_size = HPRE_SQE_SIZE; + qm->dev_name = hpre_name; + hpre_err_ini_set(qm); + + return 0; +} + static int hpre_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct hisi_qm *qm; diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c index 739b1a6565fd..f2706dc0d55e 100644 --- a/drivers/crypto/hisilicon/qm.c +++ b/drivers/crypto/hisilicon/qm.c @@ -230,6 +230,7 @@ #define QMC_ALIGN(sz) ALIGN(sz, 32) static 
int __hisi_qm_start(struct hisi_qm *qm); +static int qm_reset_device(struct hisi_qm *qm); enum vft_type { SQC_VFT = 0, @@ -2584,6 +2585,30 @@ static int hisi_qm_memory_init(struct hisi_qm *qm) return ret; } +static int qm_clear_device(struct hisi_qm *qm) +{ + u32 val; + int ret; + + if (qm->fun_type == QM_HW_VF) + return 0; + + /* OOO register set and check */ + writel(MASTER_GLOBAL_CTRL_SHUTDOWN, qm->io_base + MASTER_GLOBAL_CTRL); + + ret = readl_relaxed_poll_timeout(qm->io_base + MASTER_TRANS_RETURN, + val, (val == MASTER_TRANS_RETURN_RW), + QM_REG_RD_INTVRL_US, + QM_REG_RD_TMOUT_US); + if (ret) { + pci_warn(qm->pdev, "Device is busy, can not clear device.\n"); + writel(0x0, qm->io_base + MASTER_GLOBAL_CTRL); + return ret; + } + + return qm_reset_device(qm); +} + static int hisi_qm_pci_init(struct hisi_qm *qm) { struct pci_dev *pdev = qm->pdev; @@ -2626,8 +2651,14 @@ static int hisi_qm_pci_init(struct hisi_qm *qm) goto err_set_mask_and_coherent; } + ret = qm_clear_device(qm); + if (ret) + goto err_free_vectors; + return 0; +err_free_vectors: + pci_free_irq_vectors(pdev); err_set_mask_and_coherent: devm_iounmap(dev, qm->io_base); err_ioremap: @@ -3808,6 +3839,34 @@ static void qm_dev_ecc_mbit_handle(struct hisi_qm *qm) } } +static int qm_reset_device(struct hisi_qm *qm) +{ + struct pci_dev *pdev = qm->pdev; + unsigned long long value = 0; + acpi_status s; + + /* The reset related sub-control registers are not in PCI BAR */ + if (ACPI_HANDLE(&pdev->dev)) { + s = acpi_evaluate_integer(ACPI_HANDLE(&pdev->dev), + qm->err_ini.err_info.acpi_rst, + NULL, &value); + if (ACPI_FAILURE(s)) { + pci_err(pdev, "NO controller reset method!\n"); + return -EIO; + } + + if (value) { + pci_err(pdev, "Reset step %llu failed!\n", value); + return -EIO; + } + + return 0; + } + + pci_err(pdev, "No reset method!\n"); + return -EINVAL; +} + static int qm_soft_reset(struct hisi_qm *qm) { struct pci_dev *pdev = qm->pdev; @@ -3853,29 +3912,7 @@ static int qm_soft_reset(struct hisi_qm *qm) 
return ret; } - /* The reset related sub-control registers are not in PCI BAR */ - if (ACPI_HANDLE(&pdev->dev)) { - unsigned long long value = 0; - acpi_status s; - - s = acpi_evaluate_integer(ACPI_HANDLE(&pdev->dev), - qm->err_ini.err_info.acpi_rst, - NULL, &value); - if (ACPI_FAILURE(s)) { - pci_err(pdev, "NO controller reset method!\n"); - return -EIO; - } - - if (value) { - pci_err(pdev, "Reset step %llu failed!\n", value); - return -EIO; - } - } else { - pci_err(pdev, "No reset method!\n"); - return -EINVAL; - } - - return 0; + return qm_reset_device(qm); } static int qm_vf_reset_done(struct pci_dev *pdev, diff --git a/drivers/crypto/hisilicon/rde/rde_main.c b/drivers/crypto/hisilicon/rde/rde_main.c index f3f70079aa77..f2e00ff891db 100644 --- a/drivers/crypto/hisilicon/rde/rde_main.c +++ b/drivers/crypto/hisilicon/rde/rde_main.c @@ -28,15 +28,8 @@ #define HRDE_QUEUE_NUM_V2 1024 #define HRDE_PCI_DEVICE_ID 0xa25a #define HRDE_SQE_SIZE 64 -#define HRDE_SQ_SIZE (HRDE_SQE_SIZE * QM_Q_DEPTH) #define HRDE_PF_DEF_Q_NUM 64 #define HRDE_PF_DEF_Q_BASE 0 -#define HRDE_RD_INTVRL_US 10 -#define HRDE_RD_TMOUT_US 1000 -#define HRDE_RST_TMOUT_MS 400 -#define HRDE_ENABLE 1 -#define HRDE_DISABLE 0 -#define HRDE_PCI_COMMAND_INVALID 0xFFFFFFFF #define HRDE_RAS_INT_MSK 0x310290 #define HRDE_RAS_CE_MSK BIT(2) @@ -101,7 +94,7 @@ static struct hisi_qm_list rde_devices; static void hisi_rde_ras_proc(struct work_struct *work); static const struct hisi_rde_hw_error rde_hw_error[] = { - {.int_msk = BIT(0), .msg = "Rde_ecc_1bitt_err"}, + {.int_msk = BIT(0), .msg = "Rde_ecc_1bit_err"}, {.int_msk = BIT(1), .msg = "Rde_ecc_2bit_err"}, {.int_msk = BIT(2), .msg = "Rde_stat_mgmt_state_timeout_err"}, {.int_msk = BIT(3), .msg = "Rde_data_wr_state_timeout_err"}, @@ -269,7 +262,7 @@ static int hisi_rde_set_user_domain_and_cache(struct hisi_qm *qm) writel(AXI_M_CFG, qm->io_base + QM_AXI_M_CFG); writel(AXI_M_CFG_ENABLE, qm->io_base + QM_AXI_M_CFG_ENABLE); - /* disable BME/PM/SRIOV FLR*/ + /* disable 
BME/PM/SRIOV FLR */ writel(PEH_AXUSER_CFG, qm->io_base + QM_PEH_AXUSER_CFG); writel(PEH_AXUSER_CFG_ENABLE, qm->io_base + QM_PEH_AXUSER_CFG_ENABLE); @@ -351,7 +344,7 @@ static int current_qm_write(struct ctrl_debug_file *file, u32 val) u32 tmp; if (val > 0) { - pr_err("Function id should be smaller than 0.\n"); + pr_err("Function id should be equal to 0.\n"); return -EINVAL; } @@ -423,7 +416,7 @@ static ssize_t ctrl_debug_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos) { struct ctrl_debug_file *file = filp->private_data; - char tbuf[20]; + char tbuf[HRDE_DBGFS_VAL_MAX_LEN]; unsigned long val; int len, ret; @@ -623,6 +616,24 @@ static void hisi_rde_open_master_ooo(struct hisi_qm *qm) writel(val | HRDE_AXI_SHUTDOWN_EN, qm->io_base + HRDE_CFG); } +static void hisi_rde_err_ini_set(struct hisi_qm *qm) +{ + qm->err_ini.get_dev_hw_err_status = hisi_rde_get_hw_err_status; + qm->err_ini.clear_dev_hw_err_status = hisi_rde_clear_hw_err_status; + qm->err_ini.err_info.ecc_2bits_mask = HRDE_ECC_2BIT_ERR; + qm->err_ini.err_info.ce = QM_BASE_CE; + qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; + qm->err_ini.err_info.fe = 0; + qm->err_ini.err_info.msi = 0; + qm->err_ini.err_info.acpi_rst = "RRST"; + qm->err_ini.hw_err_disable = hisi_rde_hw_error_disable; + qm->err_ini.hw_err_enable = hisi_rde_hw_error_enable; + qm->err_ini.set_usr_domain_cache = hisi_rde_set_user_domain_and_cache; + qm->err_ini.log_dev_hw_err = hisi_rde_hw_error_log; + qm->err_ini.open_axi_master_ooo = hisi_rde_open_master_ooo; + qm->err_ini.err_info.msi_wr_port = HRDE_WR_MSI_PORT; +} + static int hisi_rde_pf_probe_init(struct hisi_qm *qm) { struct hisi_rde *hisi_rde = container_of(qm, struct hisi_rde, qm); @@ -649,21 +660,6 @@ static int hisi_rde_pf_probe_init(struct hisi_qm *qm) return -EINVAL; } - qm->err_ini.get_dev_hw_err_status = hisi_rde_get_hw_err_status; - qm->err_ini.clear_dev_hw_err_status = hisi_rde_clear_hw_err_status; - qm->err_ini.err_info.ecc_2bits_mask = 
HRDE_ECC_2BIT_ERR; - qm->err_ini.err_info.ce = QM_BASE_CE; - qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; - qm->err_ini.err_info.fe = 0; - qm->err_ini.err_info.msi = 0; - qm->err_ini.err_info.acpi_rst = "RRST"; - qm->err_ini.hw_err_disable = hisi_rde_hw_error_disable; - qm->err_ini.hw_err_enable = hisi_rde_hw_error_enable; - qm->err_ini.set_usr_domain_cache = hisi_rde_set_user_domain_and_cache; - qm->err_ini.log_dev_hw_err = hisi_rde_hw_error_log; - qm->err_ini.open_axi_master_ooo = hisi_rde_open_master_ooo; - qm->err_ini.err_info.msi_wr_port = HRDE_WR_MSI_PORT; - ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) return ret; @@ -690,6 +686,7 @@ static int hisi_rde_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) qm->sqe_size = HRDE_SQE_SIZE; qm->dev_name = hisi_rde_name; qm->abnormal_fix = hisi_rde_abnormal_fix; + hisi_rde_err_ini_set(qm); return 0; } @@ -727,31 +724,31 @@ static int hisi_rde_probe(struct pci_dev *pdev, const struct pci_device_id *id) ret = hisi_rde_qm_pre_init(qm, pdev); if (ret) { - pci_err(pdev, "Pre init qm failed!\n"); + pci_err(pdev, "Failed to pre init qm!\n"); return ret; } ret = hisi_qm_init(qm); if (ret) { - pci_err(pdev, "Init qm failed!\n"); + pci_err(pdev, "Failed to init qm!\n"); return ret; } ret = hisi_rde_pf_probe_init(qm); if (ret) { - pci_err(pdev, "Init pf failed!\n"); + pci_err(pdev, "Failed to init pf!\n"); goto err_qm_uninit; } ret = hisi_qm_start(qm); if (ret) { - pci_err(pdev, "Start qm failed!\n"); + pci_err(pdev, "Failed to start qm!\n"); goto err_qm_uninit; } ret = hisi_rde_debugfs_init(qm); if (ret) - pci_warn(pdev, "Init debugfs failed!\n"); + pci_warn(pdev, "Failed to init debugfs!\n"); hisi_qm_add_to_list(qm, &rde_devices); @@ -793,8 +790,7 @@ static void hisi_rde_ras_proc(struct work_struct *work) ret = hisi_qm_process_dev_error(pdev); if (ret == PCI_ERS_RESULT_NEED_RESET) if (hisi_qm_controller_reset(&hisi_rde->qm)) - dev_err(&pdev->dev, "Hisi_rde reset fail.\n"); - + dev_err(&pdev->dev, 
"Failed to reset device!\n"); } int hisi_rde_abnormal_fix(struct hisi_qm *qm) @@ -850,7 +846,7 @@ static int __init hisi_rde_init(void) ret = pci_register_driver(&hisi_rde_pci_driver); if (ret < 0) { hisi_rde_unregister_debugfs(); - pr_err("Register pci driver failed.\n"); + pr_err("Failed to register pci driver!\n"); } return ret; diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c index a568d5363c1e..0f32dcb69e12 100644 --- a/drivers/crypto/hisilicon/sec2/sec_main.c +++ b/drivers/crypto/hisilicon/sec2/sec_main.c @@ -712,29 +712,17 @@ static void sec_open_axi_master_ooo(struct hisi_qm *qm) writel(val | SEC_AXI_SHUTDOWN_ENABLE, SEC_ADDR(qm, SEC_CONTROL_REG)); } -static int sec_pf_probe_init(struct hisi_qm *qm) +static void sec_err_ini_set(struct hisi_qm *qm) { - int ret; - - switch (qm->ver) { - case QM_HW_V1: - qm->ctrl_q_num = SEC_QUEUE_NUM_V1; - break; - - case QM_HW_V2: - qm->ctrl_q_num = SEC_QUEUE_NUM_V2; - break; - - default: - return -EINVAL; - } + if (qm->fun_type == QM_HW_VF) + return; qm->err_ini.get_dev_hw_err_status = sec_get_hw_err_status; qm->err_ini.clear_dev_hw_err_status = sec_clear_hw_err_status; qm->err_ini.err_info.ecc_2bits_mask = SEC_CORE_INT_STATUS_M_ECC; qm->err_ini.err_info.ce = QM_BASE_CE; qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT | - QM_ACC_WB_NOT_READY_TIMEOUT; + QM_ACC_WB_NOT_READY_TIMEOUT; qm->err_ini.err_info.fe = 0; qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; qm->err_ini.err_info.acpi_rst = "SRST"; @@ -744,6 +732,24 @@ static int sec_pf_probe_init(struct hisi_qm *qm) qm->err_ini.log_dev_hw_err = sec_log_hw_error; qm->err_ini.open_axi_master_ooo = sec_open_axi_master_ooo; qm->err_ini.err_info.msi_wr_port = SEC_WR_MSI_PORT; +} + +static int sec_pf_probe_init(struct hisi_qm *qm) +{ + int ret; + + switch (qm->ver) { + case QM_HW_V1: + qm->ctrl_q_num = SEC_QUEUE_NUM_V1; + break; + + case QM_HW_V2: + qm->ctrl_q_num = SEC_QUEUE_NUM_V2; + break; + + default: + return 
-EINVAL; + } ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) @@ -807,6 +813,7 @@ static int sec_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) qm->qm_list = &sec_devices; qm->sqe_size = SEC_SQE_SIZE; qm->dev_name = sec_name; + sec_err_ini_set(qm); return 0; } diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c index 17bbab667553..1ca51793e26a 100644 --- a/drivers/crypto/hisilicon/zip/zip_main.c +++ b/drivers/crypto/hisilicon/zip/zip_main.c @@ -204,7 +204,7 @@ static struct debugfs_reg32 hzip_dfx_regs[] = { {"HZIP_AVG_DELAY ", 0x28ull}, {"HZIP_MEM_VISIBLE_DATA ", 0x30ull}, {"HZIP_MEM_VISIBLE_ADDR ", 0x34ull}, - {"HZIP_COMSUMED_BYTE ", 0x38ull}, + {"HZIP_CONSUMED_BYTE ", 0x38ull}, {"HZIP_PRODUCED_BYTE ", 0x40ull}, {"HZIP_COMP_INF ", 0x70ull}, {"HZIP_PRE_OUT ", 0x78ull}, @@ -755,6 +755,28 @@ static void hisi_zip_close_axi_master_ooo(struct hisi_qm *qm) qm->io_base + HZIP_CORE_INT_SET); } +static void hisi_zip_err_ini_set(struct hisi_qm *qm) +{ + if (qm->fun_type == QM_HW_VF) + return; + + qm->err_ini.get_dev_hw_err_status = hisi_zip_get_hw_err_status; + qm->err_ini.clear_dev_hw_err_status = hisi_zip_clear_hw_err_status; + qm->err_ini.err_info.ecc_2bits_mask = HZIP_CORE_INT_STATUS_M_ECC; + qm->err_ini.err_info.ce = QM_BASE_CE; + qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_WB_NOT_READY_TIMEOUT; + qm->err_ini.err_info.fe = 0; + qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; + qm->err_ini.err_info.acpi_rst = "ZRST"; + qm->err_ini.hw_err_disable = hisi_zip_hw_error_disable; + qm->err_ini.hw_err_enable = hisi_zip_hw_error_enable; + qm->err_ini.set_usr_domain_cache = hisi_zip_set_user_domain_and_cache; + qm->err_ini.log_dev_hw_err = hisi_zip_log_hw_error; + qm->err_ini.open_axi_master_ooo = hisi_zip_open_axi_master_ooo; + qm->err_ini.close_axi_master_ooo = hisi_zip_close_axi_master_ooo; + qm->err_ini.err_info.msi_wr_port = HZIP_WR_PORT; +} + static int hisi_zip_pf_probe_init(struct hisi_qm *qm) { struct hisi_zip 
*zip = container_of(qm, struct hisi_zip, qm); @@ -781,23 +803,6 @@ static int hisi_zip_pf_probe_init(struct hisi_qm *qm) return -EINVAL; } - qm->err_ini.get_dev_hw_err_status = hisi_zip_get_hw_err_status; - qm->err_ini.clear_dev_hw_err_status = hisi_zip_clear_hw_err_status; - qm->err_ini.err_info.ecc_2bits_mask = HZIP_CORE_INT_STATUS_M_ECC; - qm->err_ini.err_info.ce = QM_BASE_CE; - qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_WB_NOT_READY_TIMEOUT; - qm->err_ini.err_info.fe = 0; - qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; - qm->err_ini.err_info.acpi_rst = "ZRST"; - qm->err_ini.hw_err_disable = hisi_zip_hw_error_disable; - qm->err_ini.hw_err_enable = hisi_zip_hw_error_enable; - qm->err_ini.set_usr_domain_cache = hisi_zip_set_user_domain_and_cache; - qm->err_ini.log_dev_hw_err = hisi_zip_log_hw_error; - qm->err_ini.open_axi_master_ooo = hisi_zip_open_axi_master_ooo; - qm->err_ini.close_axi_master_ooo = hisi_zip_close_axi_master_ooo; - - qm->err_ini.err_info.msi_wr_port = HZIP_WR_PORT; - ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) return ret; @@ -822,6 +827,7 @@ static int hisi_zip_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) qm->sqe_size = HZIP_SQE_SIZE; qm->dev_name = hisi_zip_name; qm->qm_list = &zip_devices; + hisi_zip_err_ini_set(qm); return 0; } -- 2.30.0
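The control flow of the new qm_clear_device() helper in this patch (request shutdown, poll until the device reports idle or a timeout expires, undo the request on timeout, otherwise reset) can be sketched in userspace as follows. The structure demo_dev, its fields and the poll budget are hypothetical; real hardware access, readl_relaxed_poll_timeout() and the ACPI reset method are not modelled.

```c
#include <errno.h>

/* Hypothetical device model: all fields are illustrative only */
struct demo_dev {
	int ctrl;		/* 1 while shutdown is requested */
	int polls_until_idle;	/* how many polls until the device is idle */
	int reset_count;	/* number of resets performed */
};

/* Poll up to max_polls times; 0 when idle was seen, -ETIMEDOUT otherwise */
static int demo_poll_idle(struct demo_dev *d, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (d->polls_until_idle == 0)
			return 0;
		d->polls_until_idle--;
	}
	return -ETIMEDOUT;
}

/*
 * Mirrors the order of operations in the patch: request shutdown, wait
 * for outstanding work to drain, and reset only if the wait succeeded.
 * On timeout the shutdown request is withdrawn and no reset happens.
 */
static int demo_clear_device(struct demo_dev *d, int max_polls)
{
	int ret;

	d->ctrl = 1;
	ret = demo_poll_idle(d, max_polls);
	if (ret) {
		d->ctrl = 0;	/* device is busy: withdraw the request */
		return ret;
	}

	d->reset_count++;
	return 0;
}
```

The sketch also shows why the patch moves the err_ini setup into the pre-init functions: the reset step depends on configuration (in the patch, the acpi_rst method name) that must already be in place when hisi_qm_pci_init() runs.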


[PATCH openEuler-23.09 0/5] LoongArch: add old BPI compatibility
by yangyinglu 21 Sep '23


LoongArch: add kernel setvirtmap for runtime
LoongArch: Old BPI compatibility
LoongArch: Fix virtual machine startup error
LoongArch: Fixed EIOINTC structure members
LoongArch: use arch specific phys_to_dma

 arch/loongarch/Kconfig                 |   1 +
 arch/loongarch/include/asm/addrspace.h |   1 +
 arch/loongarch/include/asm/efi.h       |   1 +
 arch/loongarch/include/asm/irq.h       |   1 +
 arch/loongarch/include/asm/loongarch.h |   1 +
 arch/loongarch/kernel/Makefile         |   1 +
 arch/loongarch/kernel/acpi.c           |   7 +-
 arch/loongarch/kernel/dma.c            |  26 +-
 arch/loongarch/kernel/efi.c            | 175 ++++++++-
 arch/loongarch/kernel/env.c            |   6 +
 arch/loongarch/kernel/irq.c            |  25 +-
 arch/loongarch/kernel/legacy_boot.c    | 484 +++++++++++++++++++++++++
 arch/loongarch/kernel/legacy_boot.h    |  90 +++++
 arch/loongarch/kernel/mem.c            |  26 +-
 arch/loongarch/kernel/numa.c           |  39 +-
 arch/loongarch/kernel/reset.c          |   3 +-
 arch/loongarch/kernel/setup.c          |  18 +-
 arch/loongarch/kernel/smp.c            |   6 +-
 arch/loongarch/pci/acpi.c              | 147 +++++++-
 drivers/firmware/efi/Makefile          |   1 +
 drivers/irqchip/irq-loongarch-cpu.c    |   7 +-
 drivers/irqchip/irq-loongson-eiointc.c |  46 ++-
 drivers/irqchip/irq-loongson-pch-pic.c |   5 +
 23 files changed, 1075 insertions(+), 42 deletions(-)
 create mode 100644 arch/loongarch/kernel/legacy_boot.c
 create mode 100644 arch/loongarch/kernel/legacy_boot.h

--
2.20.1


[PATCH openEuler-1.0-LTS] crypto: hisilicon - reset before init the device
by w00416078 21 Sep '23


From: Yu'an Wang <wangyuan46(a)huawei.com> driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I830AI CVE: NA -------------------------------- Before initializing the device, reset the device to clear the residual data to prevent unexpected problems, such as reboot scene, which may maintain device state before reboot. Signed-off-by: Yu'an Wang <wangyuan46(a)huawei.com> --- drivers/crypto/hisilicon/hpre/hpre_main.c | 68 +++++++++++-------- drivers/crypto/hisilicon/qm.c | 83 ++++++++++++++++------- drivers/crypto/hisilicon/rde/rde_main.c | 64 ++++++++--------- drivers/crypto/hisilicon/sec2/sec_main.c | 39 ++++++----- drivers/crypto/hisilicon/zip/zip_main.c | 42 +++++++----- 5 files changed, 175 insertions(+), 121 deletions(-) diff --git a/drivers/crypto/hisilicon/hpre/hpre_main.c b/drivers/crypto/hisilicon/hpre/hpre_main.c index 1a980f255ad4..cbe8ea438fd2 100644 --- a/drivers/crypto/hisilicon/hpre/hpre_main.c +++ b/drivers/crypto/hisilicon/hpre/hpre_main.c @@ -780,28 +780,6 @@ static void hpre_debugfs_exit(struct hisi_qm *qm) debugfs_remove_recursive(qm->debug.debug_root); } -static int hpre_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) -{ - int ret; - - qm->algs = "rsa\ndh\n"; - qm->uacce_mode = uacce_mode; - qm->pdev = pdev; - ret = hisi_qm_pre_init(qm, pf_q_num, HPRE_PF_DEF_Q_BASE); - if (ret) - return ret; - if (qm->ver == QM_HW_V1) { - pci_warn(pdev, "HPRE version 1 is not supported!\n"); - return -EINVAL; - } - - qm->qm_list = &hpre_devices; - qm->sqe_size = HPRE_SQE_SIZE; - qm->dev_name = hpre_name; - - return 0; -} - static void hpre_log_hw_error(struct hisi_qm *qm, u32 err_sts) { const struct hpre_hw_error *err = hpre_hw_errors; @@ -836,30 +814,36 @@ static void hpre_open_axi_master_ooo(struct hisi_qm *qm) HPRE_ADDR(qm, HPRE_AM_OOO_SHUTDOWN_ENB)); } -static int hpre_pf_probe_init(struct hisi_qm *qm) +static void hpre_err_ini_set(struct hisi_qm *qm) { - int ret; - - if (qm->ver != QM_HW_V2) - return -EINVAL; + if 
(qm->fun_type == QM_HW_VF) + return; - qm->ctrl_q_num = HPRE_QUEUE_NUM_V2; qm->err_ini.get_dev_hw_err_status = hpre_get_hw_err_status; qm->err_ini.clear_dev_hw_err_status = hpre_clear_hw_err_status; qm->err_ini.err_info.ecc_2bits_mask = HPRE_CORE_ECC_2BIT_ERR | - HPRE_OOO_ECC_2BIT_ERR; + HPRE_OOO_ECC_2BIT_ERR; qm->err_ini.err_info.ce = QM_BASE_CE; qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; qm->err_ini.err_info.fe = 0; qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; qm->err_ini.err_info.acpi_rst = "HRST"; - qm->err_ini.hw_err_disable = hpre_hw_error_disable; qm->err_ini.hw_err_enable = hpre_hw_error_enable; qm->err_ini.set_usr_domain_cache = hpre_set_user_domain_and_cache; qm->err_ini.log_dev_hw_err = hpre_log_hw_error; qm->err_ini.open_axi_master_ooo = hpre_open_axi_master_ooo; qm->err_ini.err_info.msi_wr_port = HPRE_WR_MSI_PORT; +} + +static int hpre_pf_probe_init(struct hisi_qm *qm) +{ + int ret; + + if (qm->ver != QM_HW_V2) + return -EINVAL; + + qm->ctrl_q_num = HPRE_QUEUE_NUM_V2; ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) @@ -870,6 +854,30 @@ static int hpre_pf_probe_init(struct hisi_qm *qm) return 0; } +static int hpre_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) +{ + int ret; + + qm->algs = "rsa\ndh\n"; + qm->uacce_mode = uacce_mode; + qm->pdev = pdev; + ret = hisi_qm_pre_init(qm, pf_q_num, HPRE_PF_DEF_Q_BASE); + if (ret) + return ret; + + if (qm->ver == QM_HW_V1) { + pci_warn(pdev, "HPRE version 1 is not supported!\n"); + return -EINVAL; + } + + qm->qm_list = &hpre_devices; + qm->sqe_size = HPRE_SQE_SIZE; + qm->dev_name = hpre_name; + hpre_err_ini_set(qm); + + return 0; +} + static int hpre_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct hisi_qm *qm; diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c index 739b1a6565fd..f2706dc0d55e 100644 --- a/drivers/crypto/hisilicon/qm.c +++ b/drivers/crypto/hisilicon/qm.c @@ -230,6 +230,7 @@ #define QMC_ALIGN(sz) ALIGN(sz, 32) static 
int __hisi_qm_start(struct hisi_qm *qm); +static int qm_reset_device(struct hisi_qm *qm); enum vft_type { SQC_VFT = 0, @@ -2584,6 +2585,30 @@ static int hisi_qm_memory_init(struct hisi_qm *qm) return ret; } +static int qm_clear_device(struct hisi_qm *qm) +{ + u32 val; + int ret; + + if (qm->fun_type == QM_HW_VF) + return 0; + + /* OOO register set and check */ + writel(MASTER_GLOBAL_CTRL_SHUTDOWN, qm->io_base + MASTER_GLOBAL_CTRL); + + ret = readl_relaxed_poll_timeout(qm->io_base + MASTER_TRANS_RETURN, + val, (val == MASTER_TRANS_RETURN_RW), + QM_REG_RD_INTVRL_US, + QM_REG_RD_TMOUT_US); + if (ret) { + pci_warn(qm->pdev, "Device is busy, can not clear device.\n"); + writel(0x0, qm->io_base + MASTER_GLOBAL_CTRL); + return ret; + } + + return qm_reset_device(qm); +} + static int hisi_qm_pci_init(struct hisi_qm *qm) { struct pci_dev *pdev = qm->pdev; @@ -2626,8 +2651,14 @@ static int hisi_qm_pci_init(struct hisi_qm *qm) goto err_set_mask_and_coherent; } + ret = qm_clear_device(qm); + if (ret) + goto err_free_vectors; + return 0; +err_free_vectors: + pci_free_irq_vectors(pdev); err_set_mask_and_coherent: devm_iounmap(dev, qm->io_base); err_ioremap: @@ -3808,6 +3839,34 @@ static void qm_dev_ecc_mbit_handle(struct hisi_qm *qm) } } +static int qm_reset_device(struct hisi_qm *qm) +{ + struct pci_dev *pdev = qm->pdev; + unsigned long long value = 0; + acpi_status s; + + /* The reset related sub-control registers are not in PCI BAR */ + if (ACPI_HANDLE(&pdev->dev)) { + s = acpi_evaluate_integer(ACPI_HANDLE(&pdev->dev), + qm->err_ini.err_info.acpi_rst, + NULL, &value); + if (ACPI_FAILURE(s)) { + pci_err(pdev, "NO controller reset method!\n"); + return -EIO; + } + + if (value) { + pci_err(pdev, "Reset step %llu failed!\n", value); + return -EIO; + } + + return 0; + } + + pci_err(pdev, "No reset method!\n"); + return -EINVAL; +} + static int qm_soft_reset(struct hisi_qm *qm) { struct pci_dev *pdev = qm->pdev; @@ -3853,29 +3912,7 @@ static int qm_soft_reset(struct hisi_qm *qm) 
return ret; } - /* The reset related sub-control registers are not in PCI BAR */ - if (ACPI_HANDLE(&pdev->dev)) { - unsigned long long value = 0; - acpi_status s; - - s = acpi_evaluate_integer(ACPI_HANDLE(&pdev->dev), - qm->err_ini.err_info.acpi_rst, - NULL, &value); - if (ACPI_FAILURE(s)) { - pci_err(pdev, "NO controller reset method!\n"); - return -EIO; - } - - if (value) { - pci_err(pdev, "Reset step %llu failed!\n", value); - return -EIO; - } - } else { - pci_err(pdev, "No reset method!\n"); - return -EINVAL; - } - - return 0; + return qm_reset_device(qm); } static int qm_vf_reset_done(struct pci_dev *pdev, diff --git a/drivers/crypto/hisilicon/rde/rde_main.c b/drivers/crypto/hisilicon/rde/rde_main.c index f3f70079aa77..f2e00ff891db 100644 --- a/drivers/crypto/hisilicon/rde/rde_main.c +++ b/drivers/crypto/hisilicon/rde/rde_main.c @@ -28,15 +28,8 @@ #define HRDE_QUEUE_NUM_V2 1024 #define HRDE_PCI_DEVICE_ID 0xa25a #define HRDE_SQE_SIZE 64 -#define HRDE_SQ_SIZE (HRDE_SQE_SIZE * QM_Q_DEPTH) #define HRDE_PF_DEF_Q_NUM 64 #define HRDE_PF_DEF_Q_BASE 0 -#define HRDE_RD_INTVRL_US 10 -#define HRDE_RD_TMOUT_US 1000 -#define HRDE_RST_TMOUT_MS 400 -#define HRDE_ENABLE 1 -#define HRDE_DISABLE 0 -#define HRDE_PCI_COMMAND_INVALID 0xFFFFFFFF #define HRDE_RAS_INT_MSK 0x310290 #define HRDE_RAS_CE_MSK BIT(2) @@ -101,7 +94,7 @@ static struct hisi_qm_list rde_devices; static void hisi_rde_ras_proc(struct work_struct *work); static const struct hisi_rde_hw_error rde_hw_error[] = { - {.int_msk = BIT(0), .msg = "Rde_ecc_1bitt_err"}, + {.int_msk = BIT(0), .msg = "Rde_ecc_1bit_err"}, {.int_msk = BIT(1), .msg = "Rde_ecc_2bit_err"}, {.int_msk = BIT(2), .msg = "Rde_stat_mgmt_state_timeout_err"}, {.int_msk = BIT(3), .msg = "Rde_data_wr_state_timeout_err"}, @@ -269,7 +262,7 @@ static int hisi_rde_set_user_domain_and_cache(struct hisi_qm *qm) writel(AXI_M_CFG, qm->io_base + QM_AXI_M_CFG); writel(AXI_M_CFG_ENABLE, qm->io_base + QM_AXI_M_CFG_ENABLE); - /* disable BME/PM/SRIOV FLR*/ + /* disable 
BME/PM/SRIOV FLR */ writel(PEH_AXUSER_CFG, qm->io_base + QM_PEH_AXUSER_CFG); writel(PEH_AXUSER_CFG_ENABLE, qm->io_base + QM_PEH_AXUSER_CFG_ENABLE); @@ -351,7 +344,7 @@ static int current_qm_write(struct ctrl_debug_file *file, u32 val) u32 tmp; if (val > 0) { - pr_err("Function id should be smaller than 0.\n"); + pr_err("Function id should be equal to 0.\n"); return -EINVAL; } @@ -423,7 +416,7 @@ static ssize_t ctrl_debug_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos) { struct ctrl_debug_file *file = filp->private_data; - char tbuf[20]; + char tbuf[HRDE_DBGFS_VAL_MAX_LEN]; unsigned long val; int len, ret; @@ -623,6 +616,24 @@ static void hisi_rde_open_master_ooo(struct hisi_qm *qm) writel(val | HRDE_AXI_SHUTDOWN_EN, qm->io_base + HRDE_CFG); } +static void hisi_rde_err_ini_set(struct hisi_qm *qm) +{ + qm->err_ini.get_dev_hw_err_status = hisi_rde_get_hw_err_status; + qm->err_ini.clear_dev_hw_err_status = hisi_rde_clear_hw_err_status; + qm->err_ini.err_info.ecc_2bits_mask = HRDE_ECC_2BIT_ERR; + qm->err_ini.err_info.ce = QM_BASE_CE; + qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; + qm->err_ini.err_info.fe = 0; + qm->err_ini.err_info.msi = 0; + qm->err_ini.err_info.acpi_rst = "RRST"; + qm->err_ini.hw_err_disable = hisi_rde_hw_error_disable; + qm->err_ini.hw_err_enable = hisi_rde_hw_error_enable; + qm->err_ini.set_usr_domain_cache = hisi_rde_set_user_domain_and_cache; + qm->err_ini.log_dev_hw_err = hisi_rde_hw_error_log; + qm->err_ini.open_axi_master_ooo = hisi_rde_open_master_ooo; + qm->err_ini.err_info.msi_wr_port = HRDE_WR_MSI_PORT; +} + static int hisi_rde_pf_probe_init(struct hisi_qm *qm) { struct hisi_rde *hisi_rde = container_of(qm, struct hisi_rde, qm); @@ -649,21 +660,6 @@ static int hisi_rde_pf_probe_init(struct hisi_qm *qm) return -EINVAL; } - qm->err_ini.get_dev_hw_err_status = hisi_rde_get_hw_err_status; - qm->err_ini.clear_dev_hw_err_status = hisi_rde_clear_hw_err_status; - qm->err_ini.err_info.ecc_2bits_mask = 
HRDE_ECC_2BIT_ERR; - qm->err_ini.err_info.ce = QM_BASE_CE; - qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; - qm->err_ini.err_info.fe = 0; - qm->err_ini.err_info.msi = 0; - qm->err_ini.err_info.acpi_rst = "RRST"; - qm->err_ini.hw_err_disable = hisi_rde_hw_error_disable; - qm->err_ini.hw_err_enable = hisi_rde_hw_error_enable; - qm->err_ini.set_usr_domain_cache = hisi_rde_set_user_domain_and_cache; - qm->err_ini.log_dev_hw_err = hisi_rde_hw_error_log; - qm->err_ini.open_axi_master_ooo = hisi_rde_open_master_ooo; - qm->err_ini.err_info.msi_wr_port = HRDE_WR_MSI_PORT; - ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) return ret; @@ -690,6 +686,7 @@ static int hisi_rde_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) qm->sqe_size = HRDE_SQE_SIZE; qm->dev_name = hisi_rde_name; qm->abnormal_fix = hisi_rde_abnormal_fix; + hisi_rde_err_ini_set(qm); return 0; } @@ -727,31 +724,31 @@ static int hisi_rde_probe(struct pci_dev *pdev, const struct pci_device_id *id) ret = hisi_rde_qm_pre_init(qm, pdev); if (ret) { - pci_err(pdev, "Pre init qm failed!\n"); + pci_err(pdev, "Failed to pre init qm!\n"); return ret; } ret = hisi_qm_init(qm); if (ret) { - pci_err(pdev, "Init qm failed!\n"); + pci_err(pdev, "Failed to init qm!\n"); return ret; } ret = hisi_rde_pf_probe_init(qm); if (ret) { - pci_err(pdev, "Init pf failed!\n"); + pci_err(pdev, "Failed to init pf!\n"); goto err_qm_uninit; } ret = hisi_qm_start(qm); if (ret) { - pci_err(pdev, "Start qm failed!\n"); + pci_err(pdev, "Failed to start qm!\n"); goto err_qm_uninit; } ret = hisi_rde_debugfs_init(qm); if (ret) - pci_warn(pdev, "Init debugfs failed!\n"); + pci_warn(pdev, "Failed to init debugfs!\n"); hisi_qm_add_to_list(qm, &rde_devices); @@ -793,8 +790,7 @@ static void hisi_rde_ras_proc(struct work_struct *work) ret = hisi_qm_process_dev_error(pdev); if (ret == PCI_ERS_RESULT_NEED_RESET) if (hisi_qm_controller_reset(&hisi_rde->qm)) - dev_err(&pdev->dev, "Hisi_rde reset fail.\n"); - + dev_err(&pdev->dev, 
"Failed to reset device!\n"); } int hisi_rde_abnormal_fix(struct hisi_qm *qm) @@ -850,7 +846,7 @@ static int __init hisi_rde_init(void) ret = pci_register_driver(&hisi_rde_pci_driver); if (ret < 0) { hisi_rde_unregister_debugfs(); - pr_err("Register pci driver failed.\n"); + pr_err("Failed to register pci driver!\n"); } return ret; diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c index a568d5363c1e..0f32dcb69e12 100644 --- a/drivers/crypto/hisilicon/sec2/sec_main.c +++ b/drivers/crypto/hisilicon/sec2/sec_main.c @@ -712,29 +712,17 @@ static void sec_open_axi_master_ooo(struct hisi_qm *qm) writel(val | SEC_AXI_SHUTDOWN_ENABLE, SEC_ADDR(qm, SEC_CONTROL_REG)); } -static int sec_pf_probe_init(struct hisi_qm *qm) +static void sec_err_ini_set(struct hisi_qm *qm) { - int ret; - - switch (qm->ver) { - case QM_HW_V1: - qm->ctrl_q_num = SEC_QUEUE_NUM_V1; - break; - - case QM_HW_V2: - qm->ctrl_q_num = SEC_QUEUE_NUM_V2; - break; - - default: - return -EINVAL; - } + if (qm->fun_type == QM_HW_VF) + return; qm->err_ini.get_dev_hw_err_status = sec_get_hw_err_status; qm->err_ini.clear_dev_hw_err_status = sec_clear_hw_err_status; qm->err_ini.err_info.ecc_2bits_mask = SEC_CORE_INT_STATUS_M_ECC; qm->err_ini.err_info.ce = QM_BASE_CE; qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT | - QM_ACC_WB_NOT_READY_TIMEOUT; + QM_ACC_WB_NOT_READY_TIMEOUT; qm->err_ini.err_info.fe = 0; qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; qm->err_ini.err_info.acpi_rst = "SRST"; @@ -744,6 +732,24 @@ static int sec_pf_probe_init(struct hisi_qm *qm) qm->err_ini.log_dev_hw_err = sec_log_hw_error; qm->err_ini.open_axi_master_ooo = sec_open_axi_master_ooo; qm->err_ini.err_info.msi_wr_port = SEC_WR_MSI_PORT; +} + +static int sec_pf_probe_init(struct hisi_qm *qm) +{ + int ret; + + switch (qm->ver) { + case QM_HW_V1: + qm->ctrl_q_num = SEC_QUEUE_NUM_V1; + break; + + case QM_HW_V2: + qm->ctrl_q_num = SEC_QUEUE_NUM_V2; + break; + + default: + return 
-EINVAL; + } ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) @@ -807,6 +813,7 @@ static int sec_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) qm->qm_list = &sec_devices; qm->sqe_size = SEC_SQE_SIZE; qm->dev_name = sec_name; + sec_err_ini_set(qm); return 0; } diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c index 17bbab667553..1ca51793e26a 100644 --- a/drivers/crypto/hisilicon/zip/zip_main.c +++ b/drivers/crypto/hisilicon/zip/zip_main.c @@ -204,7 +204,7 @@ static struct debugfs_reg32 hzip_dfx_regs[] = { {"HZIP_AVG_DELAY ", 0x28ull}, {"HZIP_MEM_VISIBLE_DATA ", 0x30ull}, {"HZIP_MEM_VISIBLE_ADDR ", 0x34ull}, - {"HZIP_COMSUMED_BYTE ", 0x38ull}, + {"HZIP_CONSUMED_BYTE ", 0x38ull}, {"HZIP_PRODUCED_BYTE ", 0x40ull}, {"HZIP_COMP_INF ", 0x70ull}, {"HZIP_PRE_OUT ", 0x78ull}, @@ -755,6 +755,28 @@ static void hisi_zip_close_axi_master_ooo(struct hisi_qm *qm) qm->io_base + HZIP_CORE_INT_SET); } +static void hisi_zip_err_ini_set(struct hisi_qm *qm) +{ + if (qm->fun_type == QM_HW_VF) + return; + + qm->err_ini.get_dev_hw_err_status = hisi_zip_get_hw_err_status; + qm->err_ini.clear_dev_hw_err_status = hisi_zip_clear_hw_err_status; + qm->err_ini.err_info.ecc_2bits_mask = HZIP_CORE_INT_STATUS_M_ECC; + qm->err_ini.err_info.ce = QM_BASE_CE; + qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_WB_NOT_READY_TIMEOUT; + qm->err_ini.err_info.fe = 0; + qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; + qm->err_ini.err_info.acpi_rst = "ZRST"; + qm->err_ini.hw_err_disable = hisi_zip_hw_error_disable; + qm->err_ini.hw_err_enable = hisi_zip_hw_error_enable; + qm->err_ini.set_usr_domain_cache = hisi_zip_set_user_domain_and_cache; + qm->err_ini.log_dev_hw_err = hisi_zip_log_hw_error; + qm->err_ini.open_axi_master_ooo = hisi_zip_open_axi_master_ooo; + qm->err_ini.close_axi_master_ooo = hisi_zip_close_axi_master_ooo; + qm->err_ini.err_info.msi_wr_port = HZIP_WR_PORT; +} + static int hisi_zip_pf_probe_init(struct hisi_qm *qm) { struct hisi_zip 
*zip = container_of(qm, struct hisi_zip, qm); @@ -781,23 +803,6 @@ static int hisi_zip_pf_probe_init(struct hisi_qm *qm) return -EINVAL; } - qm->err_ini.get_dev_hw_err_status = hisi_zip_get_hw_err_status; - qm->err_ini.clear_dev_hw_err_status = hisi_zip_clear_hw_err_status; - qm->err_ini.err_info.ecc_2bits_mask = HZIP_CORE_INT_STATUS_M_ECC; - qm->err_ini.err_info.ce = QM_BASE_CE; - qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_WB_NOT_READY_TIMEOUT; - qm->err_ini.err_info.fe = 0; - qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; - qm->err_ini.err_info.acpi_rst = "ZRST"; - qm->err_ini.hw_err_disable = hisi_zip_hw_error_disable; - qm->err_ini.hw_err_enable = hisi_zip_hw_error_enable; - qm->err_ini.set_usr_domain_cache = hisi_zip_set_user_domain_and_cache; - qm->err_ini.log_dev_hw_err = hisi_zip_log_hw_error; - qm->err_ini.open_axi_master_ooo = hisi_zip_open_axi_master_ooo; - qm->err_ini.close_axi_master_ooo = hisi_zip_close_axi_master_ooo; - - qm->err_ini.err_info.msi_wr_port = HZIP_WR_PORT; - ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) return ret; @@ -822,6 +827,7 @@ static int hisi_zip_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) qm->sqe_size = HZIP_SQE_SIZE; qm->dev_name = hisi_zip_name; qm->qm_list = &zip_devices; + hisi_zip_err_ini_set(qm); return 0; } -- 2.30.0


[PATCH OLK-5.10] ext4: do not mark inode dirty every time when appending using delalloc
by WoZ1zh1 20 Sep '23


From: Liu Song <liusong(a)linux.alibaba.com>

mainline inclusion
from mainline-v6.6-rc1
commit 03de20bed203b0819225d4de98353c1f8755a1dd
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I82QPS
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

In the delalloc append write scenario, if inode's i_size is extended due
to buffer write, there are delalloc writes pending in the range up to
i_size, and no need to touch i_disksize since writeback will push
i_disksize up to i_size eventually. Offers significant performance
improvement in high-frequency append write scenarios.

I conducted tests in my 32-core environment by launching 32 concurrent
threads to append write to the same file. Each write operation had a
length of 1024 bytes and was repeated 100000 times. Without using this
patch, the test was completed in 7705 ms. However, with this patch, the
test was completed in 5066 ms, resulting in a performance improvement of
34%.

Moreover, in test scenarios of Kafka version 2.6.2, using packet size of
2K, with this patch resulted in a 10% performance improvement.

Signed-off-by: Liu Song <liusong(a)linux.alibaba.com>
Suggested-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230810154333.84921-1-liusong@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Signed-off-by: WoZ1zh1 <wozizhi(a)huawei.com>
---
 fs/ext4/inode.c | 88 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 62 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a61e7ab21a16..1b6e16702298 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3080,14 +3080,73 @@ static int ext4_da_should_update_i_disksize(struct page *page,
 	return 1;
 }
 
+static int ext4_da_do_write_end(struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *page)
+{
+	struct inode *inode = mapping->host;
+	loff_t old_size = inode->i_size;
+	bool disksize_changed = false;
+	loff_t new_i_size;
+
+	/*
+	 * block_write_end() will mark the inode as dirty with I_DIRTY_PAGES
+	 * flag, which all that's needed to trigger page writeback.
+	 */
+	copied = block_write_end(NULL, mapping, pos, len, copied, page, NULL);
+	new_i_size = pos + copied;
+
+	/*
+	 * It's important to update i_size while still holding page lock,
+	 * because page writeout could otherwise come in and zero beyond
+	 * i_size.
+	 *
+	 * Since we are holding inode lock, we are sure i_disksize <=
+	 * i_size. We also know that if i_disksize < i_size, there are
+	 * delalloc writes pending in the range up to i_size. If the end of
+	 * the current write is <= i_size, there's no need to touch
+	 * i_disksize since writeback will push i_disksize up to i_size
+	 * eventually. If the end of the current write is > i_size and
+	 * inside an allocated block which ext4_da_should_update_i_disksize()
+	 * checked, we need to update i_disksize here as certain
+	 * ext4_writepages() paths not allocating blocks and update i_disksize.
+	 */
+	if (new_i_size > inode->i_size) {
+		unsigned long end;
+
+		i_size_write(inode, new_i_size);
+		end = (new_i_size - 1) & (PAGE_SIZE - 1);
+		if (copied && ext4_da_should_update_i_disksize(page, end)) {
+			ext4_update_i_disksize(inode, new_i_size);
+			disksize_changed = true;
+		}
+	}
+
+	unlock_page(page);
+	put_page(page);
+
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
+
+	if (disksize_changed) {
+		handle_t *handle;
+
+		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
+		if (IS_ERR(handle))
+			return PTR_ERR(handle);
+		ext4_mark_inode_dirty(handle, inode);
+		ext4_journal_stop(handle);
+	}
+
+	return copied;
+}
+
 static int ext4_da_write_end(struct file *file,
 			     struct address_space *mapping,
 			     loff_t pos, unsigned len, unsigned copied,
 			     struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
-	loff_t new_i_size;
-	unsigned long start, end;
 	int write_mode = (int)(unsigned long)fsdata;
 
 	if (write_mode == FALL_BACK_TO_NONDELALLOC)
@@ -3104,30 +3163,7 @@ static int ext4_da_write_end(struct file *file,
 	if (unlikely(copied < len) && !PageUptodate(page))
 		copied = 0;
 
-	start = pos & (PAGE_SIZE - 1);
-	end = start + copied - 1;
-
-	/*
-	 * Since we are holding inode lock, we are sure i_disksize <=
-	 * i_size. We also know that if i_disksize < i_size, there are
-	 * delalloc writes pending in the range upto i_size. If the end of
-	 * the current write is <= i_size, there's no need to touch
-	 * i_disksize since writeback will push i_disksize upto i_size
-	 * eventually. If the end of the current write is > i_size and
-	 * inside an allocated block (ext4_da_should_update_i_disksize()
-	 * check), we need to update i_disksize here as neither
-	 * ext4_writepage() nor certain ext4_writepages() paths not
-	 * allocating blocks update i_disksize.
-	 *
-	 * Note that we defer inode dirtying to generic_write_end() /
-	 * ext4_da_write_inline_data_end().
-	 */
-	new_i_size = pos + copied;
-	if (copied && new_i_size > inode->i_size &&
-	    ext4_da_should_update_i_disksize(page, end))
-		ext4_update_i_disksize(inode, new_i_size);
-
-	return generic_write_end(file, mapping, pos, len, copied, page, fsdata);
+	return ext4_da_do_write_end(mapping, pos, len, copied, page);
 }
 
 /*
-- 
2.39.2
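
The i_disksize decision described in the commit message above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: `struct model_inode`, `model_da_write_end()` and the `block_allocated` flag are hypothetical names standing in for the kernel inode, `ext4_da_do_write_end()` and the result of `ext4_da_should_update_i_disksize()`. Page locking, page cache handling and journaling are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the two sizes ext4 tracks. */
struct model_inode {
	int64_t i_size;     /* in-memory file size */
	int64_t i_disksize; /* size recorded in the on-disk inode */
};

/*
 * Models the decision made at the end of a delalloc write.
 * block_allocated is an assumption that stands in for
 * ext4_da_should_update_i_disksize(). Returns true only when the
 * inode would have to be marked dirty through the journal.
 */
bool model_da_write_end(struct model_inode *inode, int64_t pos,
			unsigned copied, bool block_allocated)
{
	int64_t new_i_size = pos + copied;
	bool disksize_changed = false;

	/* Writes ending at or below i_size never touch i_disksize. */
	if (new_i_size > inode->i_size) {
		inode->i_size = new_i_size;
		if (copied && block_allocated) {
			if (new_i_size > inode->i_disksize)
				inode->i_disksize = new_i_size;
			disksize_changed = true;
		}
	}

	return disksize_changed;
}
```

In this model an append into a block that is still delayed-allocated only grows `i_size` and returns false, which corresponds to the case where the patch skips dirtying the inode and leaves `i_disksize` to writeback.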


[PATCH openEuler-1.0-LTS] crypto: hisilicon - reset before init the device
by w00416078 20 Sep '23


From: Yu'an Wang <wangyuan46(a)huawei.com> driver inclusion category: bugfix bugzilla:https://gitee.com/openeuler/kernel/issues/I830AI CVE: NA -------------------------------- Before initializing the device, reset the device to clear the residual data to prevent unexpected problems, such as reboot scene, which may maintain device state before reboot. Signed-off-by: Yu'an Wang <wangyuan46(a)huawei.com> --- drivers/crypto/hisilicon/hpre/hpre_main.c | 68 +++++++++++-------- drivers/crypto/hisilicon/qm.c | 83 ++++++++++++++++------- drivers/crypto/hisilicon/rde/rde_main.c | 64 ++++++++--------- drivers/crypto/hisilicon/sec2/sec_main.c | 39 ++++++----- drivers/crypto/hisilicon/zip/zip_main.c | 42 +++++++----- 5 files changed, 175 insertions(+), 121 deletions(-) diff --git a/drivers/crypto/hisilicon/hpre/hpre_main.c b/drivers/crypto/hisilicon/hpre/hpre_main.c index 1a980f255ad4..cbe8ea438fd2 100644 --- a/drivers/crypto/hisilicon/hpre/hpre_main.c +++ b/drivers/crypto/hisilicon/hpre/hpre_main.c @@ -780,28 +780,6 @@ static void hpre_debugfs_exit(struct hisi_qm *qm) debugfs_remove_recursive(qm->debug.debug_root); } -static int hpre_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) -{ - int ret; - - qm->algs = "rsa\ndh\n"; - qm->uacce_mode = uacce_mode; - qm->pdev = pdev; - ret = hisi_qm_pre_init(qm, pf_q_num, HPRE_PF_DEF_Q_BASE); - if (ret) - return ret; - if (qm->ver == QM_HW_V1) { - pci_warn(pdev, "HPRE version 1 is not supported!\n"); - return -EINVAL; - } - - qm->qm_list = &hpre_devices; - qm->sqe_size = HPRE_SQE_SIZE; - qm->dev_name = hpre_name; - - return 0; -} - static void hpre_log_hw_error(struct hisi_qm *qm, u32 err_sts) { const struct hpre_hw_error *err = hpre_hw_errors; @@ -836,30 +814,36 @@ static void hpre_open_axi_master_ooo(struct hisi_qm *qm) HPRE_ADDR(qm, HPRE_AM_OOO_SHUTDOWN_ENB)); } -static int hpre_pf_probe_init(struct hisi_qm *qm) +static void hpre_err_ini_set(struct hisi_qm *qm) { - int ret; - - if (qm->ver != QM_HW_V2) - return -EINVAL; + if 
(qm->fun_type == QM_HW_VF) + return; - qm->ctrl_q_num = HPRE_QUEUE_NUM_V2; qm->err_ini.get_dev_hw_err_status = hpre_get_hw_err_status; qm->err_ini.clear_dev_hw_err_status = hpre_clear_hw_err_status; qm->err_ini.err_info.ecc_2bits_mask = HPRE_CORE_ECC_2BIT_ERR | - HPRE_OOO_ECC_2BIT_ERR; + HPRE_OOO_ECC_2BIT_ERR; qm->err_ini.err_info.ce = QM_BASE_CE; qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; qm->err_ini.err_info.fe = 0; qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID; qm->err_ini.err_info.acpi_rst = "HRST"; - qm->err_ini.hw_err_disable = hpre_hw_error_disable; qm->err_ini.hw_err_enable = hpre_hw_error_enable; qm->err_ini.set_usr_domain_cache = hpre_set_user_domain_and_cache; qm->err_ini.log_dev_hw_err = hpre_log_hw_error; qm->err_ini.open_axi_master_ooo = hpre_open_axi_master_ooo; qm->err_ini.err_info.msi_wr_port = HPRE_WR_MSI_PORT; +} + +static int hpre_pf_probe_init(struct hisi_qm *qm) +{ + int ret; + + if (qm->ver != QM_HW_V2) + return -EINVAL; + + qm->ctrl_q_num = HPRE_QUEUE_NUM_V2; ret = qm->err_ini.set_usr_domain_cache(qm); if (ret) @@ -870,6 +854,30 @@ static int hpre_pf_probe_init(struct hisi_qm *qm) return 0; } +static int hpre_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) +{ + int ret; + + qm->algs = "rsa\ndh\n"; + qm->uacce_mode = uacce_mode; + qm->pdev = pdev; + ret = hisi_qm_pre_init(qm, pf_q_num, HPRE_PF_DEF_Q_BASE); + if (ret) + return ret; + + if (qm->ver == QM_HW_V1) { + pci_warn(pdev, "HPRE version 1 is not supported!\n"); + return -EINVAL; + } + + qm->qm_list = &hpre_devices; + qm->sqe_size = HPRE_SQE_SIZE; + qm->dev_name = hpre_name; + hpre_err_ini_set(qm); + + return 0; +} + static int hpre_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct hisi_qm *qm; diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c index 739b1a6565fd..f2706dc0d55e 100644 --- a/drivers/crypto/hisilicon/qm.c +++ b/drivers/crypto/hisilicon/qm.c @@ -230,6 +230,7 @@ #define QMC_ALIGN(sz) ALIGN(sz, 32) static 
int __hisi_qm_start(struct hisi_qm *qm); +static int qm_reset_device(struct hisi_qm *qm); enum vft_type { SQC_VFT = 0, @@ -2584,6 +2585,30 @@ static int hisi_qm_memory_init(struct hisi_qm *qm) return ret; } +static int qm_clear_device(struct hisi_qm *qm) +{ + u32 val; + int ret; + + if (qm->fun_type == QM_HW_VF) + return 0; + + /* OOO register set and check */ + writel(MASTER_GLOBAL_CTRL_SHUTDOWN, qm->io_base + MASTER_GLOBAL_CTRL); + + ret = readl_relaxed_poll_timeout(qm->io_base + MASTER_TRANS_RETURN, + val, (val == MASTER_TRANS_RETURN_RW), + QM_REG_RD_INTVRL_US, + QM_REG_RD_TMOUT_US); + if (ret) { + pci_warn(qm->pdev, "Device is busy, can not clear device.\n"); + writel(0x0, qm->io_base + MASTER_GLOBAL_CTRL); + return ret; + } + + return qm_reset_device(qm); +} + static int hisi_qm_pci_init(struct hisi_qm *qm) { struct pci_dev *pdev = qm->pdev; @@ -2626,8 +2651,14 @@ static int hisi_qm_pci_init(struct hisi_qm *qm) goto err_set_mask_and_coherent; } + ret = qm_clear_device(qm); + if (ret) + goto err_free_vectors; + return 0; +err_free_vectors: + pci_free_irq_vectors(pdev); err_set_mask_and_coherent: devm_iounmap(dev, qm->io_base); err_ioremap: @@ -3808,6 +3839,34 @@ static void qm_dev_ecc_mbit_handle(struct hisi_qm *qm) } } +static int qm_reset_device(struct hisi_qm *qm) +{ + struct pci_dev *pdev = qm->pdev; + unsigned long long value = 0; + acpi_status s; + + /* The reset related sub-control registers are not in PCI BAR */ + if (ACPI_HANDLE(&pdev->dev)) { + s = acpi_evaluate_integer(ACPI_HANDLE(&pdev->dev), + qm->err_ini.err_info.acpi_rst, + NULL, &value); + if (ACPI_FAILURE(s)) { + pci_err(pdev, "NO controller reset method!\n"); + return -EIO; + } + + if (value) { + pci_err(pdev, "Reset step %llu failed!\n", value); + return -EIO; + } + + return 0; + } + + pci_err(pdev, "No reset method!\n"); + return -EINVAL; +} + static int qm_soft_reset(struct hisi_qm *qm) { struct pci_dev *pdev = qm->pdev; @@ -3853,29 +3912,7 @@ static int qm_soft_reset(struct hisi_qm *qm) 
return ret; } - /* The reset related sub-control registers are not in PCI BAR */ - if (ACPI_HANDLE(&pdev->dev)) { - unsigned long long value = 0; - acpi_status s; - - s = acpi_evaluate_integer(ACPI_HANDLE(&pdev->dev), - qm->err_ini.err_info.acpi_rst, - NULL, &value); - if (ACPI_FAILURE(s)) { - pci_err(pdev, "NO controller reset method!\n"); - return -EIO; - } - - if (value) { - pci_err(pdev, "Reset step %llu failed!\n", value); - return -EIO; - } - } else { - pci_err(pdev, "No reset method!\n"); - return -EINVAL; - } - - return 0; + return qm_reset_device(qm); } static int qm_vf_reset_done(struct pci_dev *pdev, diff --git a/drivers/crypto/hisilicon/rde/rde_main.c b/drivers/crypto/hisilicon/rde/rde_main.c index f3f70079aa77..f2e00ff891db 100644 --- a/drivers/crypto/hisilicon/rde/rde_main.c +++ b/drivers/crypto/hisilicon/rde/rde_main.c @@ -28,15 +28,8 @@ #define HRDE_QUEUE_NUM_V2 1024 #define HRDE_PCI_DEVICE_ID 0xa25a #define HRDE_SQE_SIZE 64 -#define HRDE_SQ_SIZE (HRDE_SQE_SIZE * QM_Q_DEPTH) #define HRDE_PF_DEF_Q_NUM 64 #define HRDE_PF_DEF_Q_BASE 0 -#define HRDE_RD_INTVRL_US 10 -#define HRDE_RD_TMOUT_US 1000 -#define HRDE_RST_TMOUT_MS 400 -#define HRDE_ENABLE 1 -#define HRDE_DISABLE 0 -#define HRDE_PCI_COMMAND_INVALID 0xFFFFFFFF #define HRDE_RAS_INT_MSK 0x310290 #define HRDE_RAS_CE_MSK BIT(2) @@ -101,7 +94,7 @@ static struct hisi_qm_list rde_devices; static void hisi_rde_ras_proc(struct work_struct *work); static const struct hisi_rde_hw_error rde_hw_error[] = { - {.int_msk = BIT(0), .msg = "Rde_ecc_1bitt_err"}, + {.int_msk = BIT(0), .msg = "Rde_ecc_1bit_err"}, {.int_msk = BIT(1), .msg = "Rde_ecc_2bit_err"}, {.int_msk = BIT(2), .msg = "Rde_stat_mgmt_state_timeout_err"}, {.int_msk = BIT(3), .msg = "Rde_data_wr_state_timeout_err"}, @@ -269,7 +262,7 @@ static int hisi_rde_set_user_domain_and_cache(struct hisi_qm *qm) writel(AXI_M_CFG, qm->io_base + QM_AXI_M_CFG); writel(AXI_M_CFG_ENABLE, qm->io_base + QM_AXI_M_CFG_ENABLE); - /* disable BME/PM/SRIOV FLR*/ + /* disable 
BME/PM/SRIOV FLR */ writel(PEH_AXUSER_CFG, qm->io_base + QM_PEH_AXUSER_CFG); writel(PEH_AXUSER_CFG_ENABLE, qm->io_base + QM_PEH_AXUSER_CFG_ENABLE); @@ -351,7 +344,7 @@ static int current_qm_write(struct ctrl_debug_file *file, u32 val) u32 tmp; if (val > 0) { - pr_err("Function id should be smaller than 0.\n"); + pr_err("Function id should be equal to 0.\n"); return -EINVAL; } @@ -423,7 +416,7 @@ static ssize_t ctrl_debug_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos) { struct ctrl_debug_file *file = filp->private_data; - char tbuf[20]; + char tbuf[HRDE_DBGFS_VAL_MAX_LEN]; unsigned long val; int len, ret; @@ -623,6 +616,24 @@ static void hisi_rde_open_master_ooo(struct hisi_qm *qm) writel(val | HRDE_AXI_SHUTDOWN_EN, qm->io_base + HRDE_CFG); } +static void hisi_rde_err_ini_set(struct hisi_qm *qm) +{ + qm->err_ini.get_dev_hw_err_status = hisi_rde_get_hw_err_status; + qm->err_ini.clear_dev_hw_err_status = hisi_rde_clear_hw_err_status; + qm->err_ini.err_info.ecc_2bits_mask = HRDE_ECC_2BIT_ERR; + qm->err_ini.err_info.ce = QM_BASE_CE; + qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT; + qm->err_ini.err_info.fe = 0; + qm->err_ini.err_info.msi = 0; + qm->err_ini.err_info.acpi_rst = "RRST"; + qm->err_ini.hw_err_disable = hisi_rde_hw_error_disable; + qm->err_ini.hw_err_enable = hisi_rde_hw_error_enable; + qm->err_ini.set_usr_domain_cache = hisi_rde_set_user_domain_and_cache; + qm->err_ini.log_dev_hw_err = hisi_rde_hw_error_log; + qm->err_ini.open_axi_master_ooo = hisi_rde_open_master_ooo; + qm->err_ini.err_info.msi_wr_port = HRDE_WR_MSI_PORT; +} + static int hisi_rde_pf_probe_init(struct hisi_qm *qm) { struct hisi_rde *hisi_rde = container_of(qm, struct hisi_rde, qm); @@ -649,21 +660,6 @@ static int hisi_rde_pf_probe_init(struct hisi_qm *qm) return -EINVAL; } - qm->err_ini.get_dev_hw_err_status = hisi_rde_get_hw_err_status; - qm->err_ini.clear_dev_hw_err_status = hisi_rde_clear_hw_err_status; - qm->err_ini.err_info.ecc_2bits_mask = 
HRDE_ECC_2BIT_ERR;
-	qm->err_ini.err_info.ce = QM_BASE_CE;
-	qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT;
-	qm->err_ini.err_info.fe = 0;
-	qm->err_ini.err_info.msi = 0;
-	qm->err_ini.err_info.acpi_rst = "RRST";
-	qm->err_ini.hw_err_disable = hisi_rde_hw_error_disable;
-	qm->err_ini.hw_err_enable = hisi_rde_hw_error_enable;
-	qm->err_ini.set_usr_domain_cache = hisi_rde_set_user_domain_and_cache;
-	qm->err_ini.log_dev_hw_err = hisi_rde_hw_error_log;
-	qm->err_ini.open_axi_master_ooo = hisi_rde_open_master_ooo;
-	qm->err_ini.err_info.msi_wr_port = HRDE_WR_MSI_PORT;
-
 	ret = qm->err_ini.set_usr_domain_cache(qm);
 	if (ret)
 		return ret;
@@ -690,6 +686,7 @@ static int hisi_rde_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev)
 	qm->sqe_size = HRDE_SQE_SIZE;
 	qm->dev_name = hisi_rde_name;
 	qm->abnormal_fix = hisi_rde_abnormal_fix;
+	hisi_rde_err_ini_set(qm);

 	return 0;
 }
@@ -727,31 +724,31 @@ static int hisi_rde_probe(struct pci_dev *pdev, const struct pci_device_id *id)

 	ret = hisi_rde_qm_pre_init(qm, pdev);
 	if (ret) {
-		pci_err(pdev, "Pre init qm failed!\n");
+		pci_err(pdev, "Failed to pre init qm!\n");
 		return ret;
 	}

 	ret = hisi_qm_init(qm);
 	if (ret) {
-		pci_err(pdev, "Init qm failed!\n");
+		pci_err(pdev, "Failed to init qm!\n");
 		return ret;
 	}

 	ret = hisi_rde_pf_probe_init(qm);
 	if (ret) {
-		pci_err(pdev, "Init pf failed!\n");
+		pci_err(pdev, "Failed to init pf!\n");
 		goto err_qm_uninit;
 	}

 	ret = hisi_qm_start(qm);
 	if (ret) {
-		pci_err(pdev, "Start qm failed!\n");
+		pci_err(pdev, "Failed to start qm!\n");
 		goto err_qm_uninit;
 	}

 	ret = hisi_rde_debugfs_init(qm);
 	if (ret)
-		pci_warn(pdev, "Init debugfs failed!\n");
+		pci_warn(pdev, "Failed to init debugfs!\n");

 	hisi_qm_add_to_list(qm, &rde_devices);

@@ -793,8 +790,7 @@ static void hisi_rde_ras_proc(struct work_struct *work)
 	ret = hisi_qm_process_dev_error(pdev);
 	if (ret == PCI_ERS_RESULT_NEED_RESET)
 		if (hisi_qm_controller_reset(&hisi_rde->qm))
-			dev_err(&pdev->dev, "Hisi_rde reset fail.\n");
-
+			dev_err(&pdev->dev, "Failed to reset device!\n");
 }

 int hisi_rde_abnormal_fix(struct hisi_qm *qm)
@@ -850,7 +846,7 @@ static int __init hisi_rde_init(void)
 	ret = pci_register_driver(&hisi_rde_pci_driver);
 	if (ret < 0) {
 		hisi_rde_unregister_debugfs();
-		pr_err("Register pci driver failed.\n");
+		pr_err("Failed to register pci driver!\n");
 	}

 	return ret;
diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c
index a568d5363c1e..0f32dcb69e12 100644
--- a/drivers/crypto/hisilicon/sec2/sec_main.c
+++ b/drivers/crypto/hisilicon/sec2/sec_main.c
@@ -712,29 +712,17 @@ static void sec_open_axi_master_ooo(struct hisi_qm *qm)
 	writel(val | SEC_AXI_SHUTDOWN_ENABLE, SEC_ADDR(qm, SEC_CONTROL_REG));
 }

-static int sec_pf_probe_init(struct hisi_qm *qm)
+static void sec_err_ini_set(struct hisi_qm *qm)
 {
-	int ret;
-
-	switch (qm->ver) {
-	case QM_HW_V1:
-		qm->ctrl_q_num = SEC_QUEUE_NUM_V1;
-		break;
-
-	case QM_HW_V2:
-		qm->ctrl_q_num = SEC_QUEUE_NUM_V2;
-		break;
-
-	default:
-		return -EINVAL;
-	}
+	if (qm->fun_type == QM_HW_VF)
+		return;

 	qm->err_ini.get_dev_hw_err_status = sec_get_hw_err_status;
 	qm->err_ini.clear_dev_hw_err_status = sec_clear_hw_err_status;
 	qm->err_ini.err_info.ecc_2bits_mask = SEC_CORE_INT_STATUS_M_ECC;
 	qm->err_ini.err_info.ce = QM_BASE_CE;
 	qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_DO_TASK_TIMEOUT |
-		QM_ACC_WB_NOT_READY_TIMEOUT;
+		QM_ACC_WB_NOT_READY_TIMEOUT;
 	qm->err_ini.err_info.fe = 0;
 	qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID;
 	qm->err_ini.err_info.acpi_rst = "SRST";
@@ -744,6 +732,24 @@ static int sec_pf_probe_init(struct hisi_qm *qm)
 	qm->err_ini.log_dev_hw_err = sec_log_hw_error;
 	qm->err_ini.open_axi_master_ooo = sec_open_axi_master_ooo;
 	qm->err_ini.err_info.msi_wr_port = SEC_WR_MSI_PORT;
+}
+
+static int sec_pf_probe_init(struct hisi_qm *qm)
+{
+	int ret;
+
+	switch (qm->ver) {
+	case QM_HW_V1:
+		qm->ctrl_q_num = SEC_QUEUE_NUM_V1;
+		break;
+
+	case QM_HW_V2:
+		qm->ctrl_q_num = SEC_QUEUE_NUM_V2;
+		break;
+
+	default:
+		return -EINVAL;
+	}

 	ret = qm->err_ini.set_usr_domain_cache(qm);
 	if (ret)
@@ -807,6 +813,7 @@ static int sec_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev)
 	qm->qm_list = &sec_devices;
 	qm->sqe_size = SEC_SQE_SIZE;
 	qm->dev_name = sec_name;
+	sec_err_ini_set(qm);

 	return 0;
 }
diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
index 17bbab667553..1ca51793e26a 100644
--- a/drivers/crypto/hisilicon/zip/zip_main.c
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -204,7 +204,7 @@ static struct debugfs_reg32 hzip_dfx_regs[] = {
 	{"HZIP_AVG_DELAY ", 0x28ull},
 	{"HZIP_MEM_VISIBLE_DATA ", 0x30ull},
 	{"HZIP_MEM_VISIBLE_ADDR ", 0x34ull},
-	{"HZIP_COMSUMED_BYTE ", 0x38ull},
+	{"HZIP_CONSUMED_BYTE ", 0x38ull},
 	{"HZIP_PRODUCED_BYTE ", 0x40ull},
 	{"HZIP_COMP_INF ", 0x70ull},
 	{"HZIP_PRE_OUT ", 0x78ull},
@@ -755,6 +755,28 @@ static void hisi_zip_close_axi_master_ooo(struct hisi_qm *qm)
 	       qm->io_base + HZIP_CORE_INT_SET);
 }

+static void hisi_zip_err_ini_set(struct hisi_qm *qm)
+{
+	if (qm->fun_type == QM_HW_VF)
+		return;
+
+	qm->err_ini.get_dev_hw_err_status = hisi_zip_get_hw_err_status;
+	qm->err_ini.clear_dev_hw_err_status = hisi_zip_clear_hw_err_status;
+	qm->err_ini.err_info.ecc_2bits_mask = HZIP_CORE_INT_STATUS_M_ECC;
+	qm->err_ini.err_info.ce = QM_BASE_CE;
+	qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_WB_NOT_READY_TIMEOUT;
+	qm->err_ini.err_info.fe = 0;
+	qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID;
+	qm->err_ini.err_info.acpi_rst = "ZRST";
+	qm->err_ini.hw_err_disable = hisi_zip_hw_error_disable;
+	qm->err_ini.hw_err_enable = hisi_zip_hw_error_enable;
+	qm->err_ini.set_usr_domain_cache = hisi_zip_set_user_domain_and_cache;
+	qm->err_ini.log_dev_hw_err = hisi_zip_log_hw_error;
+	qm->err_ini.open_axi_master_ooo = hisi_zip_open_axi_master_ooo;
+	qm->err_ini.close_axi_master_ooo = hisi_zip_close_axi_master_ooo;
+	qm->err_ini.err_info.msi_wr_port = HZIP_WR_PORT;
+}
+
 static int hisi_zip_pf_probe_init(struct hisi_qm *qm)
 {
 	struct hisi_zip *zip = container_of(qm, struct hisi_zip, qm);
@@ -781,23 +803,6 @@ static int hisi_zip_pf_probe_init(struct hisi_qm *qm)
 		return -EINVAL;
 	}

-	qm->err_ini.get_dev_hw_err_status = hisi_zip_get_hw_err_status;
-	qm->err_ini.clear_dev_hw_err_status = hisi_zip_clear_hw_err_status;
-	qm->err_ini.err_info.ecc_2bits_mask = HZIP_CORE_INT_STATUS_M_ECC;
-	qm->err_ini.err_info.ce = QM_BASE_CE;
-	qm->err_ini.err_info.nfe = QM_BASE_NFE | QM_ACC_WB_NOT_READY_TIMEOUT;
-	qm->err_ini.err_info.fe = 0;
-	qm->err_ini.err_info.msi = QM_DB_RANDOM_INVALID;
-	qm->err_ini.err_info.acpi_rst = "ZRST";
-	qm->err_ini.hw_err_disable = hisi_zip_hw_error_disable;
-	qm->err_ini.hw_err_enable = hisi_zip_hw_error_enable;
-	qm->err_ini.set_usr_domain_cache = hisi_zip_set_user_domain_and_cache;
-	qm->err_ini.log_dev_hw_err = hisi_zip_log_hw_error;
-	qm->err_ini.open_axi_master_ooo = hisi_zip_open_axi_master_ooo;
-	qm->err_ini.close_axi_master_ooo = hisi_zip_close_axi_master_ooo;
-
-	qm->err_ini.err_info.msi_wr_port = HZIP_WR_PORT;
-
 	ret = qm->err_ini.set_usr_domain_cache(qm);
 	if (ret)
 		return ret;
@@ -822,6 +827,7 @@ static int hisi_zip_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev)
 	qm->sqe_size = HZIP_SQE_SIZE;
 	qm->dev_name = hisi_zip_name;
 	qm->qm_list = &zip_devices;
+	hisi_zip_err_ini_set(qm);

 	return 0;
 }
--
2.30.0
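The hunks above move each driver's error-handling setup out of `*_pf_probe_init()` into a dedicated `*_err_ini_set()` helper that is called from `*_qm_pre_init()` and returns early for virtual functions. A minimal userspace sketch of that ordering follows. All type and function names in it (`struct qm`, `demo_err_ini_set`, and so on) are hypothetical stand-ins rather than the real `hisi_qm` definitions, and the reset string "DRST" is made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real hisi_qm layout. */
enum fun_type { FUN_PF, FUN_VF };

struct err_ini {
	int (*set_usr_domain_cache)(void *dev);
	const char *acpi_rst;
};

struct qm {
	enum fun_type fun_type;
	struct err_ini err_ini;
};

static int demo_set_user_domain_and_cache(void *dev)
{
	(void)dev;
	return 0;
}

/* Fill the error-handling table; virtual functions leave it empty. */
static void demo_err_ini_set(struct qm *qm)
{
	if (qm->fun_type == FUN_VF)
		return;

	qm->err_ini.set_usr_domain_cache = demo_set_user_domain_and_cache;
	qm->err_ini.acpi_rst = "DRST";
}

/* Pre-init installs the table, so every later probe step can rely on it. */
static int demo_qm_pre_init(struct qm *qm, enum fun_type type)
{
	qm->fun_type = type;
	qm->err_ini.set_usr_domain_cache = NULL;
	qm->err_ini.acpi_rst = NULL;
	demo_err_ini_set(qm);
	return 0;
}

/* PF-only probe step: calls through the table populated by pre-init. */
static int demo_pf_probe_init(struct qm *qm)
{
	assert(qm->err_ini.set_usr_domain_cache != NULL);
	return qm->err_ini.set_usr_domain_cache(qm);
}
```

In this sketch the table is populated once, before any probe step runs, and the VF check lives in a single place instead of being implied by which probe path is taken.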

[PATCH v2 openEuler-23.09 0/7] LoongArch: backport drm and spi driver and some bugfixes
by Hongchen Zhang 20 Sep '23

Backport the following patches from upstream.

Hongchen Zhang (7):
  LoongArch: Allow usage of LSX/LASX in the kernel
  spi: loongson: add bus driver for the loongson spi controller
  drm: Add kms driver for loongson display controller
  drm/loongson: Remove a useless check in cursor_plane_atomic_async_check()
  drm/loongson: Add a check for lsdc_bo_create() errors
  LoongArch: mm: Add p?d_leaf() definitions
  LoongArch: Fix module relocation error with binutils 2.41

 MAINTAINERS                                   |    4 +
 arch/loongarch/Makefile                       |    2 +
 arch/loongarch/include/asm/pgtable.h          |    3 +
 arch/loongarch/kernel/kfpu.c                  |   55 +-
 drivers/gpu/drm/Kconfig                       |    2 +
 drivers/gpu/drm/Makefile                      |    1 +
 drivers/gpu/drm/loongson/Kconfig              |   17 +
 drivers/gpu/drm/loongson/Makefile             |   22 +
 drivers/gpu/drm/loongson/loongson_device.c    |  102 ++
 drivers/gpu/drm/loongson/loongson_module.c    |   33 +
 drivers/gpu/drm/loongson/loongson_module.h    |   12 +
 drivers/gpu/drm/loongson/lsdc_benchmark.c     |  133 +++
 drivers/gpu/drm/loongson/lsdc_benchmark.h     |   13 +
 drivers/gpu/drm/loongson/lsdc_crtc.c          | 1024 +++++++++++++++++
 drivers/gpu/drm/loongson/lsdc_debugfs.c       |  110 ++
 drivers/gpu/drm/loongson/lsdc_drv.c           |  457 ++++++++
 drivers/gpu/drm/loongson/lsdc_drv.h           |  388 +++++++
 drivers/gpu/drm/loongson/lsdc_gem.c           |  311 +++++
 drivers/gpu/drm/loongson/lsdc_gem.h           |   37 +
 drivers/gpu/drm/loongson/lsdc_gfxpll.c        |  199 ++++
 drivers/gpu/drm/loongson/lsdc_gfxpll.h        |   52 +
 drivers/gpu/drm/loongson/lsdc_i2c.c           |  179 +++
 drivers/gpu/drm/loongson/lsdc_i2c.h           |   29 +
 drivers/gpu/drm/loongson/lsdc_irq.c           |   74 ++
 drivers/gpu/drm/loongson/lsdc_irq.h           |   16 +
 drivers/gpu/drm/loongson/lsdc_output.h        |   21 +
 drivers/gpu/drm/loongson/lsdc_output_7a1000.c |  178 +++
 drivers/gpu/drm/loongson/lsdc_output_7a2000.c |  552 +++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.c        |  481 ++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.h        |   86 +
 drivers/gpu/drm/loongson/lsdc_plane.c         |  793 +++++++++++++
 drivers/gpu/drm/loongson/lsdc_probe.c         |   56 +
 drivers/gpu/drm/loongson/lsdc_probe.h         |   12 +
 drivers/gpu/drm/loongson/lsdc_regs.h          |  406 +++++++
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  593 ++++++++++
 drivers/gpu/drm/loongson/lsdc_ttm.h           |   99 ++
 drivers/spi/Kconfig                           |   26 +
 drivers/spi/Makefile                          |    3 +
 drivers/spi/spi-loongson-core.c               |  279 +++++
 drivers/spi/spi-loongson-pci.c                |   55 +
 drivers/spi/spi-loongson-plat.c               |   47 +
 drivers/spi/spi-loongson.h                    |   49 +
 42 files changed, 7007 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/loongson/Kconfig
 create mode 100644 drivers/gpu/drm/loongson/Makefile
 create mode 100644 drivers/gpu/drm/loongson/loongson_device.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_crtc.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_debugfs.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a1000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a2000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_plane.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_regs.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.h
 create mode 100644 drivers/spi/spi-loongson-core.c
 create mode 100644 drivers/spi/spi-loongson-pci.c
 create mode 100644 drivers/spi/spi-loongson-plat.c
 create mode 100644 drivers/spi/spi-loongson.h

--
2.33.0

[PATCH openEuler-23.09 0/7] LoongArch: backport drm and spi driver and some bugfixes
by Hongchen Zhang 20 Sep '23

Backport the following patches from upstream.

Dan Carpenter (1):
  drm/loongson: Add a check for lsdc_bo_create() errors

Hongchen Zhang (1):
  LoongArch: mm: Add p?d_leaf() definitions

Huacai Chen (2):
  LoongArch: Allow usage of LSX/LASX in the kernel
  LoongArch: Fix module relocation error with binutils 2.41

Sui Jingfeng (2):
  drm: Add kms driver for loongson display controller
  drm/loongson: Remove a useless check in cursor_plane_atomic_async_check()

Yinbo Zhu (1):
  spi: loongson: add bus driver for the loongson spi controller

 MAINTAINERS                                   |    4 +
 arch/loongarch/Makefile                       |    2 +
 arch/loongarch/include/asm/pgtable.h          |    3 +
 arch/loongarch/kernel/kfpu.c                  |   55 +-
 drivers/gpio/gpio-loongson.c                  |  413 +++++--
 drivers/gpu/drm/Kconfig                       |    2 +
 drivers/gpu/drm/Makefile                      |    1 +
 drivers/gpu/drm/loongson/Kconfig              |   17 +
 drivers/gpu/drm/loongson/Makefile             |   22 +
 drivers/gpu/drm/loongson/loongson_device.c    |  102 ++
 drivers/gpu/drm/loongson/loongson_module.c    |   33 +
 drivers/gpu/drm/loongson/loongson_module.h    |   12 +
 drivers/gpu/drm/loongson/lsdc_benchmark.c     |  133 +++
 drivers/gpu/drm/loongson/lsdc_benchmark.h     |   13 +
 drivers/gpu/drm/loongson/lsdc_crtc.c          | 1024 +++++++++++++++++
 drivers/gpu/drm/loongson/lsdc_debugfs.c       |  110 ++
 drivers/gpu/drm/loongson/lsdc_drv.c           |  457 ++++++++
 drivers/gpu/drm/loongson/lsdc_drv.h           |  388 +++++++
 drivers/gpu/drm/loongson/lsdc_gem.c           |  311 +++++
 drivers/gpu/drm/loongson/lsdc_gem.h           |   37 +
 drivers/gpu/drm/loongson/lsdc_gfxpll.c        |  199 ++++
 drivers/gpu/drm/loongson/lsdc_gfxpll.h        |   52 +
 drivers/gpu/drm/loongson/lsdc_i2c.c           |  179 +++
 drivers/gpu/drm/loongson/lsdc_i2c.h           |   29 +
 drivers/gpu/drm/loongson/lsdc_irq.c           |   74 ++
 drivers/gpu/drm/loongson/lsdc_irq.h           |   16 +
 drivers/gpu/drm/loongson/lsdc_output.h        |   21 +
 drivers/gpu/drm/loongson/lsdc_output_7a1000.c |  178 +++
 drivers/gpu/drm/loongson/lsdc_output_7a2000.c |  552 +++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.c        |  481 ++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.h        |   86 +
 drivers/gpu/drm/loongson/lsdc_plane.c         |  793 +++++++++++++
 drivers/gpu/drm/loongson/lsdc_probe.c         |   56 +
 drivers/gpu/drm/loongson/lsdc_probe.h         |   12 +
 drivers/gpu/drm/loongson/lsdc_regs.h          |  406 +++++++
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  593 ++++++++++
 drivers/gpu/drm/loongson/lsdc_ttm.h           |   99 ++
 drivers/spi/Kconfig                           |   26 +
 drivers/spi/Makefile                          |    3 +
 drivers/spi/spi-loongson-core.c               |  279 +++++
 drivers/spi/spi-loongson-pci.c                |   55 +
 drivers/spi/spi-loongson-plat.c               |   47 +
 drivers/spi/spi-loongson.h                    |   49 +
 43 files changed, 7345 insertions(+), 79 deletions(-)
 create mode 100644 drivers/gpu/drm/loongson/Kconfig
 create mode 100644 drivers/gpu/drm/loongson/Makefile
 create mode 100644 drivers/gpu/drm/loongson/loongson_device.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_crtc.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_debugfs.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a1000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a2000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_plane.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_regs.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.h
 create mode 100644 drivers/spi/spi-loongson-core.c
 create mode 100644 drivers/spi/spi-loongson-pci.c
 create mode 100644 drivers/spi/spi-loongson-plat.c
 create mode 100644 drivers/spi/spi-loongson.h

--
2.33.0

[openEuler-23.09 0/7] LoongArch: backport drm and spi driver and some bugfixes
by Hongchen Zhang 20 Sep '23

Backport the following patches from upstream.

Dan Carpenter (1):
  drm/loongson: Add a check for lsdc_bo_create() errors

Hongchen Zhang (1):
  LoongArch: mm: Add p?d_leaf() definitions

Huacai Chen (2):
  LoongArch: Allow usage of LSX/LASX in the kernel
  LoongArch: Fix module relocation error with binutils 2.41

Sui Jingfeng (2):
  drm: Add kms driver for loongson display controller
  drm/loongson: Remove a useless check in cursor_plane_atomic_async_check()

Yinbo Zhu (1):
  spi: loongson: add bus driver for the loongson spi controller

 MAINTAINERS                                   |    4 +
 arch/loongarch/Makefile                       |    2 +
 arch/loongarch/include/asm/pgtable.h          |    3 +
 arch/loongarch/kernel/kfpu.c                  |   55 +-
 drivers/gpio/gpio-loongson.c                  |  413 +++++--
 drivers/gpu/drm/Kconfig                       |    2 +
 drivers/gpu/drm/Makefile                      |    1 +
 drivers/gpu/drm/loongson/Kconfig              |   17 +
 drivers/gpu/drm/loongson/Makefile             |   22 +
 drivers/gpu/drm/loongson/loongson_device.c    |  102 ++
 drivers/gpu/drm/loongson/loongson_module.c    |   33 +
 drivers/gpu/drm/loongson/loongson_module.h    |   12 +
 drivers/gpu/drm/loongson/lsdc_benchmark.c     |  133 +++
 drivers/gpu/drm/loongson/lsdc_benchmark.h     |   13 +
 drivers/gpu/drm/loongson/lsdc_crtc.c          | 1024 +++++++++++++++++
 drivers/gpu/drm/loongson/lsdc_debugfs.c       |  110 ++
 drivers/gpu/drm/loongson/lsdc_drv.c           |  457 ++++++++
 drivers/gpu/drm/loongson/lsdc_drv.h           |  388 +++++++
 drivers/gpu/drm/loongson/lsdc_gem.c           |  311 +++++
 drivers/gpu/drm/loongson/lsdc_gem.h           |   37 +
 drivers/gpu/drm/loongson/lsdc_gfxpll.c        |  199 ++++
 drivers/gpu/drm/loongson/lsdc_gfxpll.h        |   52 +
 drivers/gpu/drm/loongson/lsdc_i2c.c           |  179 +++
 drivers/gpu/drm/loongson/lsdc_i2c.h           |   29 +
 drivers/gpu/drm/loongson/lsdc_irq.c           |   74 ++
 drivers/gpu/drm/loongson/lsdc_irq.h           |   16 +
 drivers/gpu/drm/loongson/lsdc_output.h        |   21 +
 drivers/gpu/drm/loongson/lsdc_output_7a1000.c |  178 +++
 drivers/gpu/drm/loongson/lsdc_output_7a2000.c |  552 +++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.c        |  481 ++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.h        |   86 +
 drivers/gpu/drm/loongson/lsdc_plane.c         |  793 +++++++++++++
 drivers/gpu/drm/loongson/lsdc_probe.c         |   56 +
 drivers/gpu/drm/loongson/lsdc_probe.h         |   12 +
 drivers/gpu/drm/loongson/lsdc_regs.h          |  406 +++++++
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  593 ++++++++++
 drivers/gpu/drm/loongson/lsdc_ttm.h           |   99 ++
 drivers/spi/Kconfig                           |   26 +
 drivers/spi/Makefile                          |    3 +
 drivers/spi/spi-loongson-core.c               |  279 +++++
 drivers/spi/spi-loongson-pci.c                |   55 +
 drivers/spi/spi-loongson-plat.c               |   47 +
 drivers/spi/spi-loongson.h                    |   49 +
 43 files changed, 7345 insertions(+), 79 deletions(-)
 create mode 100644 drivers/gpu/drm/loongson/Kconfig
 create mode 100644 drivers/gpu/drm/loongson/Makefile
 create mode 100644 drivers/gpu/drm/loongson/loongson_device.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_crtc.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_debugfs.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a1000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a2000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_plane.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_regs.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.h
 create mode 100644 drivers/spi/spi-loongson-core.c
 create mode 100644 drivers/spi/spi-loongson-pci.c
 create mode 100644 drivers/spi/spi-loongson-plat.c
 create mode 100644 drivers/spi/spi-loongson.h

--
2.33.0

openEuler Kernel SIG biweekly meeting
by openEuler conference 20 Sep '23

Hello!

The openEuler Kernel SIG invites you to attend a Zoom conference (automatically recorded) to be held at 2023-09-22 14:00.

Subject: openEuler Kernel SIG biweekly meeting

Agenda:
1. Progress update
2. Call for topics: to propose a new topic, reply to this email or add it to the meeting board

Meeting link: https://us06web.zoom.us/j/83542407044?pwd=UYtASnHgeP3bOAEaO9OCyMaPdQc6iA.1
Meeting minutes and topics: https://etherpad.openeuler.org/p/Kernel-meetings

Note: You are advised to change your participant name after joining the conference, or to use your gitee.com ID.

More information: https://openeuler.org/en/ (Chinese: https://openeuler.org/zh/)

[openEuler-23.09 0/7] LoongArch: backport drm and spi driver and some bugfixes
by Hongchen Zhang 20 Sep '23

Backport the following patches from upstream.

Dan Carpenter (1):
  drm/loongson: Add a check for lsdc_bo_create() errors

Hongchen Zhang (1):
  LoongArch: mm: Add p?d_leaf() definitions

Huacai Chen (2):
  LoongArch: Allow usage of LSX/LASX in the kernel
  LoongArch: Fix module relocation error with binutils 2.41

Sui Jingfeng (2):
  drm: Add kms driver for loongson display controller
  drm/loongson: Remove a useless check in cursor_plane_atomic_async_check()

Yinbo Zhu (1):
  spi: loongson: add bus driver for the loongson spi controller

 MAINTAINERS                                   |    4 +
 arch/loongarch/Makefile                       |    2 +
 arch/loongarch/include/asm/pgtable.h          |    3 +
 arch/loongarch/kernel/kfpu.c                  |   55 +-
 drivers/gpio/gpio-loongson.c                  |  413 +++++--
 drivers/gpu/drm/Kconfig                       |    2 +
 drivers/gpu/drm/Makefile                      |    1 +
 drivers/gpu/drm/loongson/Kconfig              |   17 +
 drivers/gpu/drm/loongson/Makefile             |   22 +
 drivers/gpu/drm/loongson/loongson_device.c    |  102 ++
 drivers/gpu/drm/loongson/loongson_module.c    |   33 +
 drivers/gpu/drm/loongson/loongson_module.h    |   12 +
 drivers/gpu/drm/loongson/lsdc_benchmark.c     |  133 +++
 drivers/gpu/drm/loongson/lsdc_benchmark.h     |   13 +
 drivers/gpu/drm/loongson/lsdc_crtc.c          | 1024 +++++++++++++++++
 drivers/gpu/drm/loongson/lsdc_debugfs.c       |  110 ++
 drivers/gpu/drm/loongson/lsdc_drv.c           |  457 ++++++++
 drivers/gpu/drm/loongson/lsdc_drv.h           |  388 +++++++
 drivers/gpu/drm/loongson/lsdc_gem.c           |  311 +++++
 drivers/gpu/drm/loongson/lsdc_gem.h           |   37 +
 drivers/gpu/drm/loongson/lsdc_gfxpll.c        |  199 ++++
 drivers/gpu/drm/loongson/lsdc_gfxpll.h        |   52 +
 drivers/gpu/drm/loongson/lsdc_i2c.c           |  179 +++
 drivers/gpu/drm/loongson/lsdc_i2c.h           |   29 +
 drivers/gpu/drm/loongson/lsdc_irq.c           |   74 ++
 drivers/gpu/drm/loongson/lsdc_irq.h           |   16 +
 drivers/gpu/drm/loongson/lsdc_output.h        |   21 +
 drivers/gpu/drm/loongson/lsdc_output_7a1000.c |  178 +++
 drivers/gpu/drm/loongson/lsdc_output_7a2000.c |  552 +++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.c        |  481 ++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.h        |   86 +
 drivers/gpu/drm/loongson/lsdc_plane.c         |  793 +++++++++++++
 drivers/gpu/drm/loongson/lsdc_probe.c         |   56 +
 drivers/gpu/drm/loongson/lsdc_probe.h         |   12 +
 drivers/gpu/drm/loongson/lsdc_regs.h          |  406 +++++++
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  593 ++++++++++
 drivers/gpu/drm/loongson/lsdc_ttm.h           |   99 ++
 drivers/spi/Kconfig                           |   26 +
 drivers/spi/Makefile                          |    3 +
 drivers/spi/spi-loongson-core.c               |  279 +++++
 drivers/spi/spi-loongson-pci.c                |   55 +
 drivers/spi/spi-loongson-plat.c               |   47 +
 drivers/spi/spi-loongson.h                    |   49 +
 43 files changed, 7345 insertions(+), 79 deletions(-)
 create mode 100644 drivers/gpu/drm/loongson/Kconfig
 create mode 100644 drivers/gpu/drm/loongson/Makefile
 create mode 100644 drivers/gpu/drm/loongson/loongson_device.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_crtc.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_debugfs.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a1000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a2000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_plane.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_regs.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.h
 create mode 100644 drivers/spi/spi-loongson-core.c
 create mode 100644 drivers/spi/spi-loongson-pci.c
 create mode 100644 drivers/spi/spi-loongson-plat.c
 create mode 100644 drivers/spi/spi-loongson.h

--
2.33.0

[PATCH v4 openEuler-23.09 0/3] remote_pager: fix msg_handler_peer.c build failed
by Wupeng Ma 20 Sep '23

From: Ma Wupeng <mawupeng1(a)huawei.com>

remote_pager: fix msg_handler_peer.c build failed.

Chunsheng Luo (3):
  mmap: export __do_mmap_mm symbol
  remote_pager: fix msg_handler_peer.c build failed
  remote_pager: delete unused file

 drivers/remote_pager/Kconfig                  |   9 +
 drivers/remote_pager/Makefile                 |   1 +
 drivers/remote_pager/main.c                   |   7 -
 drivers/remote_pager/msg_handler_peer.c       | 111 ++------
 drivers/remote_pager/swap/device/ksymbol.c    |  83 ------
 drivers/remote_pager/swap/device/ksymbol.h    |  35 ---
 .../remote_pager/swap/device/swap_manager.c   | 256 ------------------
 .../remote_pager/swap/device/swap_manager.h   |  28 --
 .../swap/device/swap_policy/policy_list_lru.c | 108 --------
 .../swap/device/swap_policy/swap_policy.h     |  16 --
 mm/mmap.c                                     |   1 +
 11 files changed, 33 insertions(+), 622 deletions(-)
 delete mode 100644 drivers/remote_pager/swap/device/ksymbol.c
 delete mode 100644 drivers/remote_pager/swap/device/ksymbol.h
 delete mode 100644 drivers/remote_pager/swap/device/swap_manager.c
 delete mode 100644 drivers/remote_pager/swap/device/swap_manager.h
 delete mode 100644 drivers/remote_pager/swap/device/swap_policy/policy_list_lru.c
 delete mode 100644 drivers/remote_pager/swap/device/swap_policy/swap_policy.h

--
2.25.1

[openEuler-23.09 0/7] LoongArch: backport drm and spi driver and some bugfixes
by Hongchen Zhang 20 Sep '23

Backport the following patches from upstream.

Dan Carpenter (1):
  drm/loongson: Add a check for lsdc_bo_create() errors

Hongchen Zhang (1):
  LoongArch: mm: Add p?d_leaf() definitions

Huacai Chen (2):
  LoongArch: Allow usage of LSX/LASX in the kernel
  LoongArch: Fix module relocation error with binutils 2.41

Sui Jingfeng (2):
  drm: Add kms driver for loongson display controller
  drm/loongson: Remove a useless check in cursor_plane_atomic_async_check()

Yinbo Zhu (1):
  spi: loongson: add bus driver for the loongson spi controller

 MAINTAINERS                                   |    4 +
 arch/loongarch/Makefile                       |    2 +
 arch/loongarch/include/asm/pgtable.h          |    3 +
 arch/loongarch/kernel/kfpu.c                  |   55 +-
 drivers/gpio/gpio-loongson.c                  |  413 +++++--
 drivers/gpu/drm/Kconfig                       |    2 +
 drivers/gpu/drm/Makefile                      |    1 +
 drivers/gpu/drm/loongson/Kconfig              |   17 +
 drivers/gpu/drm/loongson/Makefile             |   22 +
 drivers/gpu/drm/loongson/loongson_device.c    |  102 ++
 drivers/gpu/drm/loongson/loongson_module.c    |   33 +
 drivers/gpu/drm/loongson/loongson_module.h    |   12 +
 drivers/gpu/drm/loongson/lsdc_benchmark.c     |  133 +++
 drivers/gpu/drm/loongson/lsdc_benchmark.h     |   13 +
 drivers/gpu/drm/loongson/lsdc_crtc.c          | 1024 +++++++++++++++++
 drivers/gpu/drm/loongson/lsdc_debugfs.c       |  110 ++
 drivers/gpu/drm/loongson/lsdc_drv.c           |  457 ++++++++
 drivers/gpu/drm/loongson/lsdc_drv.h           |  388 +++++++
 drivers/gpu/drm/loongson/lsdc_gem.c           |  311 +++++
 drivers/gpu/drm/loongson/lsdc_gem.h           |   37 +
 drivers/gpu/drm/loongson/lsdc_gfxpll.c        |  199 ++++
 drivers/gpu/drm/loongson/lsdc_gfxpll.h        |   52 +
 drivers/gpu/drm/loongson/lsdc_i2c.c           |  179 +++
 drivers/gpu/drm/loongson/lsdc_i2c.h           |   29 +
 drivers/gpu/drm/loongson/lsdc_irq.c           |   74 ++
 drivers/gpu/drm/loongson/lsdc_irq.h           |   16 +
 drivers/gpu/drm/loongson/lsdc_output.h        |   21 +
 drivers/gpu/drm/loongson/lsdc_output_7a1000.c |  178 +++
 drivers/gpu/drm/loongson/lsdc_output_7a2000.c |  552 +++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.c        |  481 ++++++++
 drivers/gpu/drm/loongson/lsdc_pixpll.h        |   86 +
 drivers/gpu/drm/loongson/lsdc_plane.c         |  793 +++++++++++++
 drivers/gpu/drm/loongson/lsdc_probe.c         |   56 +
 drivers/gpu/drm/loongson/lsdc_probe.h         |   12 +
 drivers/gpu/drm/loongson/lsdc_regs.h          |  406 +++++++
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  593 ++++++++++
 drivers/gpu/drm/loongson/lsdc_ttm.h           |   99 ++
 drivers/spi/Kconfig                           |   26 +
 drivers/spi/Makefile                          |    3 +
 drivers/spi/spi-loongson-core.c               |  279 +++++
 drivers/spi/spi-loongson-pci.c                |   55 +
 drivers/spi/spi-loongson-plat.c               |   47 +
 drivers/spi/spi-loongson.h                    |   49 +
 43 files changed, 7345 insertions(+), 79 deletions(-)
 create mode 100644 drivers/gpu/drm/loongson/Kconfig
 create mode 100644 drivers/gpu/drm/loongson/Makefile
 create mode 100644 drivers/gpu/drm/loongson/loongson_device.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.c
 create mode 100644 drivers/gpu/drm/loongson/loongson_module.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_benchmark.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_crtc.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_debugfs.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_drv.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gem.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_gfxpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_i2c.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_irq.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a1000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_output_7a2000.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_pixpll.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_plane.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_probe.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_regs.h
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.c
 create mode 100644 drivers/gpu/drm/loongson/lsdc_ttm.h
 create mode 100644 drivers/spi/spi-loongson-core.c
 create mode 100644 drivers/spi/spi-loongson-pci.c
 create mode 100644 drivers/spi/spi-loongson-plat.c
 create mode 100644 drivers/spi/spi-loongson.h

--
2.33.0

[PATCH v3 openEuler-23.09 0/3] remote_pager: fix msg_handler_peer.c build failed
by Wupeng Ma 20 Sep '23

From: Ma Wupeng <mawupeng1(a)huawei.com>

remote_pager: fix msg_handler_peer.c build failed.

Chunsheng Luo (3):
  mmap: export __do_mmap_mm symbol
  remote_pager: fix msg_handler_peer.c build failed
  remote_pager: delete unused file

 drivers/remote_pager/Kconfig                  |   8 +
 drivers/remote_pager/Makefile                 |   1 +
 drivers/remote_pager/main.c                   |   7 -
 drivers/remote_pager/msg_handler_peer.c       | 111 ++------
 drivers/remote_pager/swap/device/ksymbol.c    |  83 ------
 drivers/remote_pager/swap/device/ksymbol.h    |  35 ---
 .../remote_pager/swap/device/swap_manager.c   | 256 ------------------
 .../remote_pager/swap/device/swap_manager.h   |  28 --
 .../swap/device/swap_policy/policy_list_lru.c | 108 --------
 .../swap/device/swap_policy/swap_policy.h     |  16 --
 mm/mmap.c                                     |   1 +
 11 files changed, 32 insertions(+), 622 deletions(-)
 delete mode 100644 drivers/remote_pager/swap/device/ksymbol.c
 delete mode 100644 drivers/remote_pager/swap/device/ksymbol.h
 delete mode 100644 drivers/remote_pager/swap/device/swap_manager.c
 delete mode 100644 drivers/remote_pager/swap/device/swap_manager.h
 delete mode 100644 drivers/remote_pager/swap/device/swap_policy/policy_list_lru.c
 delete mode 100644 drivers/remote_pager/swap/device/swap_policy/swap_policy.h

--
2.25.1

[PATCH openEuler-22.03-LTS 0/5] x86/speculation: Add force option to GDS mitigation
by Zeng Heng 19 Sep '23

Arnd Bergmann (1):
  x86: Move gds_ucode_mitigated() declaration to header

Daniel Sneddon (3):
  x86/speculation: Add force option to GDS mitigation
  x86/speculation: Add Kconfig option for GDS
  KVM: Add GDS_NO support to KVM

Dave Hansen (1):
  Documentation/x86: Fix backwards on/off logic about YMM support

 .../hw-vuln/gather_data_sampling.rst  | 18 ++++++++---
 .../admin-guide/kernel-parameters.txt |  8 ++++-
 arch/x86/Kconfig                      | 19 ++++++++++++
 arch/x86/include/asm/processor.h      |  2 ++
 arch/x86/kernel/cpu/bugs.c            | 31 ++++++++++++++++++-
 arch/x86/kvm/x86.c                    |  3 ++
 6 files changed, 75 insertions(+), 6 deletions(-)

--
2.25.1

[PATCH openEuler-23.09 v1] sch_netem: fix issues in netem_change() vs get_dist_table()
by Yue Haibing 19 Sep '23

From: Eric Dumazet <edumazet(a)google.com>

mainline inclusion
from mainline-v6.5-rc1
commit 11b73313c12403f617b47752db0ab3deef201af7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I80FL9
CVE: NA

--------------------------------

In blamed commit, I missed that get_dist_table() was allocating
memory using GFP_KERNEL, and acquiring qdisc lock to perform
the swap of newly allocated table with current one.

In this patch, get_dist_table() is allocating memory and
copy user data before we acquire the qdisc lock.

Then we perform swap operations while being protected by the lock.

Note that after this patch netem_change() no longer can do partial changes.
If an error is returned, qdisc conf is left unchanged.

Fixes: 2174a08db80d ("sch_netem: acquire qdisc lock in netem_change()")
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: Stephen Hemminger <stephen(a)networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Reviewed-by: Simon Horman <simon.horman(a)corigine.com>
Link: https://lore.kernel.org/r/20230622181503.2327695-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Yue Haibing <yuehaibing(a)huawei.com>
---
 net/sched/sch_netem.c | 59 ++++++++++++++++++-------------------------
 1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index e79be1b3e74d..b93ec2a3454e 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -773,12 +773,10 @@ static void dist_free(struct disttable *d)
  * signed 16 bit values.
  */
 
-static int get_dist_table(struct Qdisc *sch, struct disttable **tbl,
-			  const struct nlattr *attr)
+static int get_dist_table(struct disttable **tbl, const struct nlattr *attr)
 {
 	size_t n = nla_len(attr)/sizeof(__s16);
 	const __s16 *data = nla_data(attr);
-	spinlock_t *root_lock;
 	struct disttable *d;
 	int i;
 
@@ -793,13 +791,7 @@ static int get_dist_table(struct Qdisc *sch, struct disttable **tbl,
 	for (i = 0; i < n; i++)
 		d->table[i] = data[i];
 
-	root_lock = qdisc_root_sleeping_lock(sch);
-
-	spin_lock_bh(root_lock);
-	swap(*tbl, d);
-	spin_unlock_bh(root_lock);
-
-	dist_free(d);
+	*tbl = d;
 	return 0;
 }
 
@@ -956,6 +948,8 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_NETEM_MAX + 1];
+	struct disttable *delay_dist = NULL;
+	struct disttable *slot_dist = NULL;
 	struct tc_netem_qopt *qopt;
 	struct clgstate old_clg;
 	int old_loss_model = CLG_RANDOM;
@@ -966,6 +960,18 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 	if (ret < 0)
 		return ret;
 
+	if (tb[TCA_NETEM_DELAY_DIST]) {
+		ret = get_dist_table(&delay_dist, tb[TCA_NETEM_DELAY_DIST]);
+		if (ret)
+			goto table_free;
+	}
+
+	if (tb[TCA_NETEM_SLOT_DIST]) {
+		ret = get_dist_table(&slot_dist, tb[TCA_NETEM_SLOT_DIST]);
+		if (ret)
+			goto table_free;
+	}
+
 	sch_tree_lock(sch);
 	/* backup q->clg and q->loss_model */
 	old_clg = q->clg;
@@ -975,26 +981,17 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 		ret = get_loss_clg(q, tb[TCA_NETEM_LOSS]);
 		if (ret) {
 			q->loss_model = old_loss_model;
+			q->clg = old_clg;
 			goto unlock;
 		}
 	} else {
 		q->loss_model = CLG_RANDOM;
 	}
 
-	if (tb[TCA_NETEM_DELAY_DIST]) {
-		ret = get_dist_table(sch, &q->delay_dist,
-				     tb[TCA_NETEM_DELAY_DIST]);
-		if (ret)
-			goto get_table_failure;
-	}
-
-	if (tb[TCA_NETEM_SLOT_DIST]) {
-		ret = get_dist_table(sch, &q->slot_dist,
-				     tb[TCA_NETEM_SLOT_DIST]);
-		if (ret)
-			goto get_table_failure;
-	}
-
+	if (delay_dist)
+		swap(q->delay_dist, delay_dist);
+	if (slot_dist)
+		swap(q->slot_dist, slot_dist);
 	sch->limit = qopt->limit;
 
 	q->latency = PSCHED_TICKS2NS(qopt->latency);
@@ -1044,17 +1041,11 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 
 unlock:
 	sch_tree_unlock(sch);
-	return ret;
 
-get_table_failure:
-	/* recover clg and loss_model, in case of
-	 * q->clg and q->loss_model were modified
-	 * in get_loss_clg()
-	 */
-	q->clg = old_clg;
-	q->loss_model = old_loss_model;
-
-	goto unlock;
+table_free:
+	dist_free(delay_dist);
+	dist_free(slot_dist);
+	return ret;
 }
 
 static int netem_init(struct Qdisc *sch, struct nlattr *opt,
--
2.34.1
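
The ordering this patch describes (allocate and copy first, swap under the lock, free afterwards) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the kernel code: struct dist_table, struct sched_conf, build_table() and conf_change() are hypothetical names, and a pthread mutex stands in for the qdisc tree lock.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct disttable. */
struct dist_table {
	size_t size;
	short table[];
};

/* Hypothetical stand-in for the qdisc private data. */
struct sched_conf {
	pthread_mutex_t lock;		/* plays the role of the qdisc tree lock */
	struct dist_table *delay_dist;
};

/* Allocate and fill a table. May block, so it runs with no lock held. */
static int build_table(struct dist_table **out, const short *data, size_t n)
{
	struct dist_table *d;

	if (n == 0)
		return -1;
	d = malloc(sizeof(*d) + n * sizeof(d->table[0]));
	if (!d)
		return -1;
	d->size = n;
	memcpy(d->table, data, n * sizeof(d->table[0]));
	*out = d;
	return 0;
}

/* Returns 0 on success. On failure the configuration is left untouched. */
static int conf_change(struct sched_conf *c, const short *data, size_t n)
{
	struct dist_table *fresh = NULL;
	struct dist_table *old;
	int ret;

	/* Step 1: everything that can fail or sleep happens before locking. */
	ret = build_table(&fresh, data, n);
	if (ret)
		return ret;

	/* Step 2: only a pointer swap happens under the lock. */
	pthread_mutex_lock(&c->lock);
	old = c->delay_dist;
	c->delay_dist = fresh;
	pthread_mutex_unlock(&c->lock);

	/* Step 3: the previous table (possibly NULL) is freed after unlocking. */
	free(old);
	return 0;
}
```

Because the only fallible step runs before the lock is taken, a failed change cannot leave the configuration half-updated, which is the "no partial changes" property the commit message mentions.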


[PATCH OLK-5.10] [Backport] media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
by ChenXiaoSong 19 Sep '23


From: Hyunwoo Kim <imv4bel(a)gmail.com>

stable inclusion
from stable-v5.10.183
commit eb37fef417a246fe54530901a3ea9c0abc914fc2
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I635HP
CVE: CVE-2022-45887

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 517a281338322ff8293f988771c98aaa7205e457 ]

Since dvb_frontend_detach() is not called in ttusb_dec_exit_dvb(),
which is called when the device is disconnected, dvb_frontend_free()
is not finally called.

This causes a memory leak just by repeatedly plugging and
unplugging the device.

Fix this issue by adding dvb_frontend_detach() to ttusb_dec_exit_dvb().

Link: https://lore.kernel.org/linux-media/20221117045925.14297-5-imv4bel@gmail.com
Signed-off-by: Hyunwoo Kim <imv4bel(a)gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2(a)huawei.com>
---
 drivers/media/usb/ttusb-dec/ttusb_dec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/media/usb/ttusb-dec/ttusb_dec.c b/drivers/media/usb/ttusb-dec/ttusb_dec.c
index df6c5e4a0f05..68f88143c8a6 100644
--- a/drivers/media/usb/ttusb-dec/ttusb_dec.c
+++ b/drivers/media/usb/ttusb-dec/ttusb_dec.c
@@ -1551,8 +1551,7 @@ static void ttusb_dec_exit_dvb(struct ttusb_dec *dec)
 	dvb_dmx_release(&dec->demux);
 	if (dec->fe) {
 		dvb_unregister_frontend(dec->fe);
-		if (dec->fe->ops.release)
-			dec->fe->ops.release(dec->fe);
+		dvb_frontend_detach(dec->fe);
 	}
 	dvb_unregister_adapter(&dec->adapter);
 }
--
2.31.1


[PATCH openEuler-23.09 0/2] xfs: fix NULL dereference in xlog_cil_pcp_dead
by Baokun Li 19 Sep '23


Darrick J. Wong (2):
  xfs: fix per-cpu CIL structure aggregation racing with dying cpus
  xfs: use per-mount cpumask to track nonempty percpu inodegc lists

 fs/xfs/xfs_icache.c   | 78 ++++++++++++++++---------------------------
 fs/xfs/xfs_icache.h   |  1 -
 fs/xfs/xfs_log_cil.c  | 52 +++++++++--------------------
 fs/xfs/xfs_log_priv.h | 14 ++++----
 fs/xfs/xfs_mount.h    |  6 ++--
 fs/xfs/xfs_super.c    |  5 +--
 6 files changed, 55 insertions(+), 101 deletions(-)

--
2.31.1


[PATCH v3 openEuler-23.09 0/2] remote_pager: fix msg_handler_peer.c build failed
by Wupeng Ma 19 Sep '23


From: Ma Wupeng <mawupeng1(a)huawei.com>

remote_pager: fix msg_handler_peer.c build failed.

Chunsheng Luo (2):
  mmap: export __do_mmap_mm symbol
  remote_pager: fix msg_handler_peer.c build failed

 drivers/remote_pager/Kconfig            |   9 ++
 drivers/remote_pager/Makefile           |   1 +
 drivers/remote_pager/main.c             |   7 -
 drivers/remote_pager/msg_handler_peer.c | 197 +++++++++++++-----------
 mm/mmap.c                               |   1 +
 5 files changed, 117 insertions(+), 98 deletions(-)

--
2.25.1


[PATCH v2 openEuler-23.09 0/2] remote_pager: fix msg_handler_peer.c build failed
by Wupeng Ma 19 Sep '23


From: Ma Wupeng <mawupeng1(a)huawei.com>

remote_pager: fix msg_handler_peer.c build failed.

Chunsheng Luo (2):
  mmap: export __do_mmap_mm symbol
  remote_pager: fix msg_handler_peer.c build failed

 drivers/remote_pager/Kconfig            |   9 ++
 drivers/remote_pager/Makefile           |   1 +
 drivers/remote_pager/main.c             |   7 -
 drivers/remote_pager/msg_handler_peer.c | 197 +++++++++++++-----------
 mm/mmap.c                               |   1 +
 5 files changed, 117 insertions(+), 98 deletions(-)

--
2.25.1


[PATCH openEuler-1.0-LTS] crypto: hisilicon/qm - prevent soft lockup in qm_poll_qp()'s loop
by w00416078 19 Sep '23


From: Yu'an Wang <wangyuan46(a)huawei.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I82LC6
CVE: NA

--------------------------------

The function qm_poll_qp() may take a while due to complex req_cb, so
soft lockup may occur in kernel with preemption disabled.

Add a cond_resched() to prevent that.

Signed-off-by: Yu'an Wang <wangyuan46(a)huawei.com>
---
 drivers/crypto/hisilicon/qm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index e0fd83465dce..739b1a6565fd 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -540,6 +540,8 @@ static void qm_poll_qp(struct hisi_qp *qp, struct hisi_qm *qm)
 			qm_db(qm, qp->qp_id, QM_DOORBELL_CMD_CQ,
 			      qp->qp_status.cq_head, 0);
 			atomic_dec(&qp->qp_status.used);
+
+			cond_resched();
 		}
 		/* set c_flag */
 		qm_db(qm, qp->qp_id, QM_DOORBELL_CMD_CQ,
--
2.30.0


[PATCH openEuler-1.0-LTS] media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
by ChenXiaoSong 19 Sep '23


From: Hyunwoo Kim <imv4bel(a)gmail.com>

stable inclusion
from stable-v4.19.285
commit 3e5af0745a4702ab0df2f880bfe0431eb30f9164
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I635HP
CVE: CVE-2022-45887

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

--------------------------------

[ Upstream commit 517a281338322ff8293f988771c98aaa7205e457 ]

Since dvb_frontend_detach() is not called in ttusb_dec_exit_dvb(),
which is called when the device is disconnected, dvb_frontend_free()
is not finally called.

This causes a memory leak just by repeatedly plugging and
unplugging the device.

Fix this issue by adding dvb_frontend_detach() to ttusb_dec_exit_dvb().

Link: https://lore.kernel.org/linux-media/20221117045925.14297-5-imv4bel@gmail.com
Signed-off-by: Hyunwoo Kim <imv4bel(a)gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2(a)huawei.com>
---
 drivers/media/usb/ttusb-dec/ttusb_dec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/media/usb/ttusb-dec/ttusb_dec.c b/drivers/media/usb/ttusb-dec/ttusb_dec.c
index f34efa7c61b4..c915e555897b 100644
--- a/drivers/media/usb/ttusb-dec/ttusb_dec.c
+++ b/drivers/media/usb/ttusb-dec/ttusb_dec.c
@@ -1561,8 +1561,7 @@ static void ttusb_dec_exit_dvb(struct ttusb_dec *dec)
 	dvb_dmx_release(&dec->demux);
 	if (dec->fe) {
 		dvb_unregister_frontend(dec->fe);
-		if (dec->fe->ops.release)
-			dec->fe->ops.release(dec->fe);
+		dvb_frontend_detach(dec->fe);
 	}
 	dvb_unregister_adapter(&dec->adapter);
 }
--
2.31.1


[PATCH OLK-5.10] ext4: fix rec_len verify error
by Baokun Li 19 Sep '23


From: Shida Zhang <zhangshida(a)kylinos.cn>

mainline inclusion
from mainline-v6.6-rc2
commit 7fda67e8c3ab6069f75888f67958a6d30454a9f6
category: bugfix
bugzilla: 189039, https://gitee.com/openeuler/kernel/issues/I7OXK8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

With the configuration PAGE_SIZE 64k and filesystem blocksize 64k,
a problem occurred when more than 13 million files were directly created
under a directory:

EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum

When enough files are created, the fake_dirent->reclen will be 0xffff,
which does not equal the blocksize 65536, i.e. 0x10000.

But it is not the same condition when the blocksize equals 4k.
When enough files are created, the fake_dirent->reclen will be 0x1000,
which equals the blocksize 4k, i.e. 0x1000.

The problem seems to be related to the limitation of the 16-bit field
when the blocksize is set to 64k.
To address this, helpers like ext4_rec_len_{from,to}_disk have already
been introduced to complete the conversion between the encoded and the
plain form of rec_len.

So fix this one by using the helper, and all the others in this file too.

Cc: stable(a)kernel.org
Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
Suggested-by: Andreas Dilger <adilger(a)dilger.ca>
Suggested-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Shida Zhang <zhangshida(a)kylinos.cn>
Reviewed-by: Andreas Dilger <adilger(a)dilger.ca>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Link: https://lore.kernel.org/r/20230803060938.1929759-1-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
 fs/ext4/namei.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 746aed40a3c8..a01aba88c39d 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -344,17 +344,17 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
 						   struct buffer_head *bh)
 {
 	struct ext4_dir_entry_tail *t;
+	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
 
 #ifdef PARANOID
 	struct ext4_dir_entry *d, *top;
 
 	d = (struct ext4_dir_entry *)bh->b_data;
 	top = (struct ext4_dir_entry *)(bh->b_data +
-		(EXT4_BLOCK_SIZE(inode->i_sb) -
-		 sizeof(struct ext4_dir_entry_tail)));
-	while (d < top && d->rec_len)
+		(blocksize - sizeof(struct ext4_dir_entry_tail)));
+	while (d < top && ext4_rec_len_from_disk(d->rec_len, blocksize))
 		d = (struct ext4_dir_entry *)(((void *)d) +
-		    le16_to_cpu(d->rec_len));
+		    ext4_rec_len_from_disk(d->rec_len, blocksize));
 
 	if (d != top)
 		return NULL;
@@ -365,7 +365,8 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
 #endif
 
 	if (t->det_reserved_zero1 ||
-	    le16_to_cpu(t->det_rec_len) != sizeof(struct ext4_dir_entry_tail) ||
+	    (ext4_rec_len_from_disk(t->det_rec_len, blocksize) !=
+	     sizeof(struct ext4_dir_entry_tail)) ||
 	    t->det_reserved_zero2 ||
 	    t->det_reserved_ft != EXT4_FT_DIR_CSUM)
 		return NULL;
@@ -446,13 +447,14 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
 	struct ext4_dir_entry *dp;
 	struct dx_root_info *root;
 	int count_offset;
+	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
+	unsigned int rlen = ext4_rec_len_from_disk(dirent->rec_len, blocksize);
 
-	if (le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb))
+	if (rlen == blocksize)
 		count_offset = 8;
-	else if (le16_to_cpu(dirent->rec_len) == 12) {
+	else if (rlen == 12) {
 		dp = (struct ext4_dir_entry *)(((void *)dirent) + 12);
-		if (le16_to_cpu(dp->rec_len) !=
-		    EXT4_BLOCK_SIZE(inode->i_sb) - 12)
+		if (ext4_rec_len_from_disk(dp->rec_len, blocksize) != blocksize - 12)
 			return NULL;
 		root = (struct dx_root_info *)(((void *)dp + 12));
 		if (root->reserved_zero ||
@@ -1261,6 +1263,7 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
 	unsigned int buflen = bh->b_size;
 	char *base = bh->b_data;
 	struct dx_hash_info h = *hinfo;
+	int blocksize = EXT4_BLOCK_SIZE(dir->i_sb);
 
 	if (ext4_has_metadata_csum(dir->i_sb))
 		buflen -= sizeof(struct ext4_dir_entry_tail);
@@ -1274,11 +1277,12 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
 			map_tail--;
 			map_tail->hash = h.hash;
 			map_tail->offs = ((char *) de - base)>>2;
-			map_tail->size = le16_to_cpu(de->rec_len);
+			map_tail->size = ext4_rec_len_from_disk(de->rec_len,
+								blocksize);
 			count++;
 			cond_resched();
 		}
-		de = ext4_next_entry(de, dir->i_sb->s_blocksize);
+		de = ext4_next_entry(de, blocksize);
 	}
 	return count;
 }
--
2.31.1
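
To make the 0xffff versus 0x10000 mismatch in this report concrete, here is a simplified user-space model of a rec_len encode/decode pair. It is only a sketch: rec_len_from_disk() and rec_len_to_disk() are illustrative names modelled loosely on the ext4_rec_len_{from,to}_disk helpers mentioned above, byte-order conversion is left out, and the exact conditions under which the kernel applies this encoding are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a 16-bit rec_len field can hold. */
#define MAX_REC_LEN 0xffffu

/*
 * Decode a 16-bit on-disk rec_len (already in CPU byte order).
 * 0xffff and 0 are treated as "the entry covers the whole block";
 * otherwise the two low bits carry bits 16..17 of the length, which
 * works because record lengths are multiples of 4.
 */
static unsigned int rec_len_from_disk(uint16_t dlen, unsigned int blocksize)
{
	unsigned int len = dlen;

	if (len == MAX_REC_LEN || len == 0)
		return blocksize;
	return (len & 65532) | ((len & 3) << 16);
}

/* Encode a plain record length into the 16-bit on-disk form. */
static uint16_t rec_len_to_disk(unsigned int len, unsigned int blocksize)
{
	assert(len <= blocksize && blocksize <= (1u << 18) && (len & 3) == 0);

	if (len < 65536)
		return (uint16_t)len;
	if (len == blocksize)
		return blocksize == 65536 ? MAX_REC_LEN : 0;
	return (uint16_t)((len & 65532) | ((len >> 16) & 3));
}
```

In this model a whole-block entry on a 64k filesystem is stored as 0xffff and decodes back to 65536, while comparing the raw 16-bit field against the blocksize fails. That is the comparison the patch replaces with the decoded value.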


[PATCH openEuler-23.09 0/2] remote_pager: fix msg_handler_peer.c build failed
by Wupeng Ma 19 Sep '23


From: Ma Wupeng <mawupeng1(a)huawei.com>

remote_pager: fix msg_handler_peer.c build failed.

Chunsheng Luo (2):
  mmap: export __do_mmap_mm symbol
  remote_pager: fix msg_handler_peer.c build failed

 drivers/remote_pager/Kconfig            |   9 ++
 drivers/remote_pager/Makefile           |   1 +
 drivers/remote_pager/main.c             |   7 -
 drivers/remote_pager/msg_handler_peer.c | 197 +++++++++++++-----------
 mm/mmap.c                               |   1 +
 5 files changed, 117 insertions(+), 98 deletions(-)

--
2.25.1


[PATCH OLK-5.10] Add new config 'CONFIG_EXT4_ERROR_REPORT' to control ext3/4 error reporting
by Baokun Li 19 Sep '23


From: Zhihao Cheng <chengzhihao1(a)huawei.com>

hulk inclusion
category: bugfix
bugzilla: 187975, https://gitee.com/openeuler/kernel/issues/I7T77P

--------------------------------

Add new config 'CONFIG_EXT4_ERROR_REPORT' to control ext3/4 error
reporting.

Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
 arch/arm64/configs/openeuler_defconfig |  1 +
 arch/x86/configs/openeuler_defconfig   |  1 +
 fs/ext4/Kconfig                        |  8 ++++++++
 fs/ext4/ext4.h                         |  2 ++
 fs/ext4/super.c                        | 19 +++++++++++++++++++
 5 files changed, 31 insertions(+)

diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index f055d8e93bc4..514d35e099f7 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -6211,6 +6211,7 @@ CONFIG_EXT4_FS=m
 CONFIG_EXT4_USE_FOR_EXT2=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
+CONFIG_EXT4_ERROR_REPORT=y
 # CONFIG_EXT4_DEBUG is not set
 CONFIG_JBD2=m
 # CONFIG_JBD2_DEBUG is not set
diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 9adedd9d615a..203c5e353d94 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -7308,6 +7308,7 @@ CONFIG_EXT4_FS=m
 CONFIG_EXT4_USE_FOR_EXT2=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
+CONFIG_EXT4_ERROR_REPORT=y
 # CONFIG_EXT4_DEBUG is not set
 CONFIG_JBD2=m
 # CONFIG_JBD2_DEBUG is not set
diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 86699c8cab28..ae108d47ff00 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -117,3 +117,11 @@ config EXT4_KUNIT_TESTS
 	  to the KUnit documentation in Documentation/dev-tools/kunit/.
 
 	  If unsure, say N.
+
+config EXT4_ERROR_REPORT
+	bool "Ext4 error reporting by netlink"
+	depends on EXT4_FS && NET
+	default n
+	help
+	  Implement the ext3/ext4 file system error report. Report error to
+	  userspace by netlink
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 56776dea9cd1..74f72a22f5ec 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -46,6 +46,7 @@
 
 #include <linux/compiler.h>
 
+#ifdef CONFIG_EXT4_ERROR_REPORT
 #define NL_EXT4_ERROR_GROUP 1
 #define EXT4_ERROR_MAGIC 0xAE32014U
 struct ext4_err_msg {
@@ -54,6 +55,7 @@ struct ext4_err_msg {
 	unsigned long s_flags;
 	int ext4_errno;
 };
+#endif
 
 /*
  * The fourth extended filesystem constants/structures
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4afc9dab14cf..d24816506339 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -55,9 +55,11 @@
 #include "mballoc.h"
 #include "fsmap.h"
 
+#ifdef CONFIG_EXT4_ERROR_REPORT
 #include <uapi/linux/netlink.h>
 #include <net/sock.h>
 #include <net/net_namespace.h>
+#endif
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ext4.h>
@@ -90,8 +92,10 @@ static void ext4_unregister_li_request(struct super_block *sb);
 static void ext4_clear_request_list(void);
 static struct inode *ext4_get_journal_inode(struct super_block *sb,
 					    unsigned int journal_inum);
+#ifdef CONFIG_EXT4_ERROR_REPORT
 static void ext4_netlink_send_info(struct super_block *sb, int ext4_errno);
 static struct sock *ext4nl;
+#endif
 
 /*
  * Lock ordering
@@ -616,6 +620,7 @@ static void save_error_info(struct super_block *sb, int error,
 	spin_unlock(&sbi->s_error_lock);
 }
 
+#ifdef CONFIG_EXT4_ERROR_REPORT
 static void ext4_netlink_send_info(struct super_block *sb, int ext4_errno)
 {
 	int size;
@@ -651,6 +656,7 @@ static void ext4_netlink_send_info(struct super_block *sb, int ext4_errno)
 		kfree_skb(skb);
 	}
 }
+#endif
 
 /* Deal with the reporting of failure conditions on a filesystem such as
  * inconsistencies detected or read IO failures.
@@ -713,11 +719,16 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
 			sb->s_id);
 	}
 
+#ifdef CONFIG_EXT4_ERROR_REPORT
 	if (sb_rdonly(sb))
 		return;
 
 	if (continue_fs)
 		goto out;
+#else
+	if (sb_rdonly(sb) || continue_fs)
+		return;
+#endif
 
 	ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
 
@@ -727,8 +738,10 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
 	 */
 	smp_wmb();
 	sb->s_flags |= SB_RDONLY;
+#ifdef CONFIG_EXT4_ERROR_REPORT
 out:
 	ext4_netlink_send_info(sb, force_ro ? 2 : 1);
+#endif
 }
 
 static void flush_stashed_error_work(struct work_struct *work)
@@ -6855,7 +6868,9 @@ wait_queue_head_t ext4__ioend_wq[EXT4_WQ_HASH_SZ];
 static int __init ext4_init_fs(void)
 {
 	int i, err;
+#ifdef CONFIG_EXT4_ERROR_REPORT
 	struct netlink_kernel_cfg cfg = {.groups = NL_EXT4_ERROR_GROUP,};
+#endif
 
 	ratelimit_state_init(&ext4_mount_msg_ratelimit, 30 * HZ, 64);
 	ext4_li_info = NULL;
@@ -6908,9 +6923,11 @@ static int __init ext4_init_fs(void)
 	if (err)
 		goto out;
 
+#ifdef CONFIG_EXT4_ERROR_REPORT
 	ext4nl = netlink_kernel_create(&init_net, NETLINK_FILESYSTEM, &cfg);
 	if (!ext4nl)
 		printk(KERN_ERR "EXT4-fs: Cannot create netlink socket.\n");
+#endif
 	return 0;
 out:
 	unregister_as_ext2();
@@ -6951,7 +6968,9 @@ static void __exit ext4_exit_fs(void)
 	ext4_exit_post_read_processing();
 	ext4_exit_es();
 	ext4_exit_pending();
+#ifdef CONFIG_EXT4_ERROR_REPORT
 	netlink_kernel_release(ext4nl);
+#endif
 }
 
 MODULE_AUTHOR("Remy Card, Stephen Tweedie, Andrew Morton, Andreas Dilger, Theodore Ts'o and others");
--
2.31.1


[PATCH 1/2] mmap: export __do_mmap_mm symbol
by Wupeng Ma 19 Sep '23


From: Chunsheng Luo <luochunsheng(a)huawei.com>

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7WLVX

---------------------------------------------

export __do_mmap_mm symbol by referring to the earlier version

Signed-off-by: Chunsheng Luo <luochunsheng(a)huawei.com>
---
 mm/mmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 2aef07b8a85e..819c37e7eb1f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1469,6 +1469,7 @@ unsigned long __do_mmap_mm(struct mm_struct *mm, struct file *file, unsigned lon
 		*populate = len;
 	return addr;
 }
+EXPORT_SYMBOL(__do_mmap_mm);
 
 unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
--
2.33.0


[PATCH OLK-5.10] sched/qos: Fix warning in CPU hotplug scenarios
by Xia Fukun 19 Sep '23


hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZMCB
CVE: NA

--------------------------------

CPU hotplug callbacks race against distribute_cfs_runtime(). When the
QOS_SCHED feature is enabled, there may be situations where
cfs_rq->runtime_remaining == 1 and the cfs_rq is QOS_THROTTLED.

Turn off the Qos_throttle when the CPU is offline, and no longer
allocate time to the cfs_rq in this scenario, to fix the warning.

Fixes: 4eb6eb7941dc ("sched/qos: Don't unthrottle cfs_rq when cfs_rq is throttled by qos")
Signed-off-by: Xia Fukun <xiafukun(a)huawei.com>
---
 kernel/sched/fair.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index daa853b19853..b8bf7acb9f9a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5267,6 +5267,19 @@ static void distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 		if (!cfs_rq_throttled(cfs_rq))
 			goto next;
 
+		/*
+		 * CPU hotplug callbacks race against distribute_cfs_runtime()
+		 * when the QOS_SCHED feature is enabled, there may be
+		 * situations where the runtime_remaining > 0.
+		 * Qos_sched does not care whether the cfs_rq has time left,
+		 * so no longer allocate time to cfs_rq in this scenario.
+		 */
+#ifdef CONFIG_QOS_SCHED
+		if (cfs_rq->throttled == QOS_THROTTLED &&
+		    cfs_rq->runtime_remaining > 0)
+			goto next;
+#endif
+
 		/* By the above check, this should never be true */
 		SCHED_WARN_ON(cfs_rq->runtime_remaining > 0);
 
@@ -7923,6 +7936,10 @@ static __always_inline bool check_qos_cfs_rq(struct cfs_rq *cfs_rq)
 	if (unlikely(cfs_rq && is_offline_level(cfs_rq->tg->qos_level) &&
 		     !sched_idle_cpu(smp_processor_id()) &&
 		     cfs_rq->h_nr_running == cfs_rq->idle_h_nr_running)) {
+
+		if (!rq_of(cfs_rq)->online)
+			return false;
+
 		throttle_qos_cfs_rq(cfs_rq);
 		return true;
 	}
--
2.34.1


[PATCH openEuler-23.09] mm/mlock: return EINVAL for illegal user memory range in mlock
by Wupeng Ma 19 Sep '23


From: Ma Wupeng <mawupeng1(a)huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7TW89

--------------------------------

While testing mlock, we found a problem when the len passed to mlock is
ULONG_MAX. The return value of mlock is zero, but nothing is locked,
because the len in do_mlock overflows to zero due to the following code
in mlock:

  len = PAGE_ALIGN(len + (offset_in_page(start)));

The same problem happens in munlock.

Since TASK_SIZE is the maximum user space address, the start or len of
mlock shouldn't be bigger than this. Function access_ok can be used to
check this issue, so return -EINVAL if bigger.

Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
---
 mm/mlock.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 40b43f8740df..e90139d42f88 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -479,8 +479,6 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
 	end = start + len;
 	if (end < start)
 		return -EINVAL;
-	if (end == start)
-		return 0;
 	vma = vma_iter_load(&vmi);
 	if (!vma)
 		return -ENOMEM;
@@ -574,9 +572,15 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 	if (!can_do_mlock())
 		return -EPERM;
 
+	if (!len)
+		return 0;
+
 	len = PAGE_ALIGN(len + (offset_in_page(start)));
 	start &= PAGE_MASK;
 
+	if (!len)
+		return -EINVAL;
+
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
 	locked = len >> PAGE_SHIFT;
@@ -634,8 +638,13 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 
 	start = untagged_addr(start);
 
+	if (!len)
+		return 0;
+
 	len = PAGE_ALIGN(len + (offset_in_page(start)));
 	start &= PAGE_MASK;
+	if (!len)
+		return -EINVAL;
 
 	if (mmap_write_lock_killable(current->mm))
 		return -EINTR;
--
2.25.1
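
The wrap-around this report describes can be reproduced with plain unsigned arithmetic. The sketch below is a user-space model, not the kernel code: a 4096-byte page size is assumed, and page_align(), offset_in_page() and check_len() are local illustrative helpers that mimic the order of checks the patch adds (empty request first, then the rounded length).

```c
#include <errno.h>
#include <limits.h>

/* Assumed page size for this illustration only. */
#define PAGE_SIZE_ASSUMED 4096UL
#define PAGE_MASK_ASSUMED (~(PAGE_SIZE_ASSUMED - 1))

/* Round x up to the next page boundary. Wraps to 0 when x is near ULONG_MAX. */
static unsigned long page_align(unsigned long x)
{
	return (x + PAGE_SIZE_ASSUMED - 1) & PAGE_MASK_ASSUMED;
}

/* Offset of addr within its page. */
static unsigned long offset_in_page(unsigned long addr)
{
	return addr & ~PAGE_MASK_ASSUMED;
}

/*
 * Returns 0 for an empty request, -EINVAL when rounding wrapped the
 * length to zero, and 1 when there is a non-empty range to work on.
 */
static int check_len(unsigned long start, unsigned long len)
{
	if (!len)
		return 0;

	len = page_align(len + offset_in_page(start));
	if (!len)
		return -EINVAL;

	return 1;
}
```

With these definitions, check_len(0, ULONG_MAX) yields -EINVAL: ULONG_MAX + 4095 wraps to 4094, which masks down to 0. Checking for a zero length both before and after the rounding is what lets the two cases be told apart.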


[PATCH OLK-5.10] uacce: modify the configuration mode of device isolation strategy
by Wenkai Lin 19 Sep '23


From: Qi Tao <taoqi10(a)huawei.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I82ARL
CVE: NA

------------------------------------------------------------------

Cancel the reference counting of the accelerator device so that the
isolation strategy for a device with hardware errors can also be
modified while the accelerator is running a task.

Signed-off-by: Qi Tao <taoqi10(a)huawei.com>
Signed-off-by: JiangShui Yang <yangjiangshui(a)h-partners.com>
---
 drivers/misc/uacce/uacce.c | 5 -----
 include/linux/uacce.h      | 2 --
 2 files changed, 7 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 21a4f5892aff..7a64eed01df7 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -88,7 +88,6 @@ static int uacce_put_queue(struct uacce_queue *q)
 		uacce->ops->put_queue(q);
 
 	q->state = UACCE_Q_ZOMBIE;
-	atomic_dec(&uacce->ref);
 
 	return 0;
 }
@@ -348,7 +347,6 @@ static int uacce_fops_open(struct inode *inode, struct file *filep)
 			goto out_with_bond;
 	}
 
-	atomic_inc(&uacce->ref);
 	init_waitqueue_head(&q->wait);
 	filep->private_data = q;
 	q->state = UACCE_Q_INIT;
@@ -821,9 +819,6 @@ static ssize_t isolate_strategy_store(struct device *dev, struct device_attribut
 	if (val > UACCE_MAX_ERR_THRESHOLD)
 		return -EINVAL;
 
-	if (atomic_read(&uacce->ref))
-		return -EBUSY;
-
 	ret = uacce->ops->isolate_err_threshold_write(uacce, val);
 	if (ret)
 		return ret;
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index 8187c1bda236..9b0c04a9cff7 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -148,7 +148,6 @@ struct uacce_queue {
  * @mutex: protects uacce operation
  * @priv: private pointer of the uacce
  * @queues: list of queues
- * @ref: reference of the uacce
  */
 struct uacce_device {
 	const char *algs;
@@ -164,7 +163,6 @@ struct uacce_device {
 	struct device dev;
 	struct mutex mutex;
 	void *priv;
-	atomic_t ref;
 	struct uacce_err_isolate *isolate;
 	struct list_head queues;
 };
--
2.30.0


[PATCH OLK-5.10 v2] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
by Zhihao Cheng 19 Sep '23


maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I828EV
CVE: NA

Reference: https://lore.kernel.org/linux-ext4/20230911155822.kbg2xayc7ys24kay@quack3/T…

--------------------------------

JBD2 makes sure journal data is fallen on fs device by sync_blockdev(),
however, other process could intercept the EIO information from bdev's
mapping, which leads journal recovering successful even EIO occurs during
data written back to fs device.

We found this problem in our product, iscsi + multipath is chosen for block
device of ext4. Unstable network may trigger kpartx to rescan partitions in
device mapper layer. Detailed process is shown as following:

  mount                 kpartx                          irq
jbd2_journal_recover
 do_one_pass
  memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
  mark_buffer_dirty // mark bh dirty
                        vfs_read
                         generic_file_read_iter // dio
                          filemap_write_and_wait_range
                           __filemap_fdatawrite_range
                            do_writepages
                             block_write_full_folio
                              submit_bh_wbc
                                   >>  EIO occurs in disk  <<
                                                        end_buffer_async_write
                                                         mark_buffer_write_io_error
                                                          mapping_set_error
                                                           set_bit(AS_EIO, &mapping->flags) // set!
                           filemap_check_errors
                            test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
 err2 = sync_blockdev
  filemap_write_and_wait
   filemap_check_errors
    test_and_clear_bit(AS_EIO, &mapping->flags) // false
 err2 = 0

Filesystem is mounted successfully even data from journal is failed written
into disk, and ext4/ocfs2 could become corrupted.

Fix it by comparing the wb_err state in fs block device before recovering
and after recovering.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
---
 v1->v3: Initialize wb_err. An uninitialized wb_err could be the same as
         mapping->wb_err (e.g. EIO without ERRSEQ_SEEN). When EIO occurs
         again, mapping->wb_err won't be changed, and wb_err is still the
         same as mapping->wb_err.

 fs/jbd2/recovery.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 9e4537349380..cdf4c553f058 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -292,6 +292,8 @@ int jbd2_journal_recover(journal_t *journal)
 	journal_superblock_t *	sb;
 
 	struct recovery_info	info;
+	errseq_t		wb_err;
+	struct address_space	*mapping;
 
 	memset(&info, 0, sizeof(info));
 	sb = journal->j_superblock;
@@ -309,6 +311,9 @@ int jbd2_journal_recover(journal_t *journal)
 		return 0;
 	}
 
+	wb_err = 0;
+	mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	errseq_check_and_advance(&mapping->wb_err, &wb_err);
 	err = do_one_pass(journal, &info, PASS_SCAN);
 	if (!err)
 		err = do_one_pass(journal, &info, PASS_REVOKE);
@@ -327,6 +332,9 @@ int jbd2_journal_recover(journal_t *journal)
 
 	jbd2_journal_clear_revoke(journal);
 	err2 = sync_blockdev(journal->j_fs_dev);
+	if (!err)
+		err = err2;
+	err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
 	if (!err)
 		err = err2;
 	/* Make sure all replayed data is on permanent storage */
--
2.31.1
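
The difference between the one-shot AS_EIO flag and a per-observer error cursor, which this fix relies on, can be illustrated with a deliberately simplified model. Everything below is hypothetical and is NOT the kernel's errseq_t: a bare counter with one cursor per observer stands in for mapping->wb_err and errseq_check_and_advance().

```c
#include <stdbool.h>

/* Model 1: a single sticky flag, consumed by whoever tests it first. */
struct flag_mapping {
	bool eio;
};

static bool flag_test_and_clear(struct flag_mapping *m)
{
	bool was_set = m->eio;

	m->eio = false;
	return was_set;
}

/* Model 2: an error sequence counter; each observer keeps its own cursor. */
struct seq_mapping {
	unsigned int err_seq;
};

/* Record a writeback error by bumping the sequence. */
static void seq_record_error(struct seq_mapping *m)
{
	m->err_seq++;
}

/*
 * Report whether an error was recorded since this observer last looked,
 * then move only this observer's cursor forward.
 */
static bool seq_check_and_advance(const struct seq_mapping *m,
				  unsigned int *cursor)
{
	bool changed = (*cursor != m->err_seq);

	*cursor = m->err_seq;
	return changed;
}
```

In the flag model a second reader sees nothing once the first reader has cleared the bit, which matches the trace in the commit message where the direct-I/O read swallows the error before sync_blockdev() checks for it. In the cursor model each observer still notices the error exactly once, because sampling never modifies shared state.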


[PATCH openEuler-23.09] config: Disable x86 IBT for kpatch
by Wei Li 19 Sep '23


hulk inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I82GQG

--------------------------------

The CONFIG_X86_KERNEL_IBT option causes the compiled symbols to lack the
__prefix__ symbol, leading kpatch to incorrectly determine that the
symbol has no padding. In reality, the symbol does have padding,
resulting in a compilation failure for kpatch.

Let's disable IBT for now.

Signed-off-by: Wei Li <liwei391(a)huawei.com>
---
 arch/x86/configs/openeuler_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 6bbaaaa82a0b..d3c0da3ddd64 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -457,7 +457,7 @@ CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
 CONFIG_X86_PAT=y
 CONFIG_ARCH_USES_PG_UNCACHED=y
 CONFIG_X86_UMIP=y
-CONFIG_X86_KERNEL_IBT=y
+# CONFIG_X86_KERNEL_IBT is not set
 CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
 CONFIG_X86_INTEL_TSX_MODE_OFF=y
 # CONFIG_X86_INTEL_TSX_MODE_ON is not set
--
2.25.1


[PATCH openEuler-1.0-LTS] crypto:hisilicon/qm - cache write back before flr and poweroff
by w00416078 19 Sep '23


From: Yu'an Wang <wangyuan46(a)huawei.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4YKIJ
CVE: NA

--------------------------------

In order to prevent an error when the hardware writes back cache, the
driver should write back the hardware cache before flr and poweroff.
If vfs is enabled, we should process abnormal scenes of the whole vfs.

Signed-off-by: Yu'an Wang <wangyuan46(a)huawei.com>
Reviewed-by: Weili Qian <qianweili(a)huawei.com>
Reviewed-by: Ling Mingqiang <lingmingqiang(a)huawei.com>
Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com>
---
 drivers/crypto/hisilicon/qm.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index 8fa855fd2387..e0fd83465dce 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -2772,6 +2772,8 @@ void hisi_qm_dev_shutdown(struct pci_dev *pdev)
 	ret = hisi_qm_stop(qm, QM_NORMAL);
 	if (ret)
 		dev_err(&pdev->dev, "Fail to stop qm in shutdown!\n");
+
+	hisi_qm_cache_wb(qm);
 }
 EXPORT_SYMBOL_GPL(hisi_qm_dev_shutdown);
 
@@ -3718,12 +3720,16 @@ static int qm_vf_reset_prepare(struct pci_dev *pdev,
 			pci_save_state(dev);
 
 			ret = hisi_qm_stop(qm, stop_reason);
-			if (ret)
-				goto prepare_fail;
+			if (ret) {
+				hisi_qm_set_hw_reset(qm,
+					QM_RESET_STOP_TX_OFFSET);
+				hisi_qm_set_hw_reset(qm,
+					QM_RESET_STOP_RX_OFFSET);
+				atomic_set(&qm->status.flags, QM_STOP);
+			}
 		}
 	}
 
-prepare_fail:
 	mutex_unlock(&qm_list->lock);
 	return ret;
 }
@@ -4117,19 +4123,26 @@ void hisi_qm_reset_prepare(struct pci_dev *pdev)
 
 	if (qm->vfs_num) {
 		ret = qm_vf_reset_prepare(pdev, qm->qm_list, QM_FLR);
-		if (ret) {
-			pci_err(pdev, "Fails to prepare reset!\n");
-			return;
-		}
+		if (ret)
+			pci_err(pdev, "Failed to stop vfs!\n");
 	}
 
 	ret = hisi_qm_stop(qm, QM_FLR);
 	if (ret) {
-		pci_err(pdev, "Fails to stop QM!\n");
-		return;
+		pci_err(pdev, "Failed to stop QM!\n");
+		goto err_prepare;
 	}
 
+	hisi_qm_cache_wb(qm);
 	pci_info(pdev, "FLR resetting...\n");
+	return;
+
+err_prepare:
+	pci_info(pdev, "FLR resetting prepare failed!\n");
+	hisi_qm_set_hw_reset(qm, QM_RESET_STOP_TX_OFFSET);
+	hisi_qm_set_hw_reset(qm, QM_RESET_STOP_RX_OFFSET);
+	atomic_set(&qm->status.flags, QM_STOP);
+	hisi_qm_cache_wb(qm);
 }
 EXPORT_SYMBOL_GPL(hisi_qm_reset_prepare);
 
--
2.34.1

[PATCH openEuler-1.0-LTS v2 00/11] Fix booting failure on arm64
by Wei Li 18 Sep '23

18 Sep '23

The arm64 test machine failed to boot when using hulk_defconfig after d63c76835476 ("arm64: efi: Execute runtime services from a dedicated stack"). Revert this patch set first for the weekly release.

Wei Li (11):
  Revert "arm64: efi: Make efi_rt_lock a raw_spinlock"
  Revert "efi: rt-wrapper: Add missing include"
  Revert "arm64: efi: Recover from synchronous exceptions occurring in firmware"
  Revert "arm64: efi: Execute runtime services from a dedicated stack"
  Revert "efi: fix userspace infinite retry read efivars after EFI runtime services page fault"
  Revert "arm64: efi: Restore register x18 if it was corrupted"
  Revert "x86/efi: fix a -Wtype-limits compilation warning"
  Revert "efi: Fix build error due to enum collision between efi.h and ima.h"
  Revert "efi: Fix debugobjects warning on 'efi_rts_work'"
  Revert "efi/x86: Handle page faults occurring while running EFI runtime services"
  Revert "efi: Make efi_rts_work accessible to efi page fault handler"

 arch/arm64/include/asm/efi.h            |  12 ---
 arch/arm64/kernel/efi-rt-wrapper.S      |  49 +----------
 arch/arm64/kernel/efi.c                 |  50 -----------
 arch/arm64/mm/fault.c                   |   4 -
 arch/x86/include/asm/efi.h              |   1 -
 arch/x86/mm/fault.c                     |   9 --
 arch/x86/platform/efi/quirks.c          |  78 -----------------
 drivers/firmware/efi/runtime-wrappers.c | 109 +++++++++++++++---------
 include/linux/efi.h                     |  42 ---------
 9 files changed, 70 insertions(+), 284 deletions(-)

--
2.25.1

[PATCH openEuler-1.0-LTS] crypto:hisilicon/sec - modify hw endian config
by w00416078 18 Sep '23

18 Sep '23

From: Yu'an Wang <wangyuan46(a)huawei.com> driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4YKLC CVE: NA -------------------------------- When the endian configuration of the hardware is abnormal, it will affect the normal function. Currently the soft configuration method can't restore the faulty device. The endian needs to be configured according to the system properties. So fix it. Signed-off-by: Yu'an Wang <wangyuan46(a)huawei.com> Reviewed-by: Kai Ye <yekai13(a)huawei.com> Reviewed-by: Ling Mingqiang <lingmingqiang(a)huawei.com> Acked-by: Xie XiuQi <xiexiuqi(a)huawei.com> --- drivers/crypto/hisilicon/sec2/sec_main.c | 36 +++++++----------------- 1 file changed, 10 insertions(+), 26 deletions(-) diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c index 58f726ba022f..a568d5363c1e 100644 --- a/drivers/crypto/hisilicon/sec2/sec_main.c +++ b/drivers/crypto/hisilicon/sec2/sec_main.c @@ -255,33 +255,19 @@ static const struct pci_device_id sec_dev_ids[] = { }; MODULE_DEVICE_TABLE(pci, sec_dev_ids); -static u8 sec_get_endian(struct hisi_qm *qm) +static void sec_set_endian(struct hisi_qm *qm) { u32 reg; - /* - * As for VF, it is a wrong way to get endian setting by - * reading a register of the engine - */ - if (qm->pdev->is_virtfn) { - dev_err_ratelimited(&qm->pdev->dev, - "cannot access a register in VF!\n"); - return SEC_LE; - } - reg = readl_relaxed(qm->io_base + SEC_ENGINE_PF_CFG_OFF + - SEC_ACC_COMMON_REG_OFF + SEC_CONTROL_REG); - - /* BD little endian mode */ - if (!(reg & BIT(0))) - return SEC_LE; + reg = readl_relaxed(SEC_ADDR(qm, SEC_CONTROL_REG)); + reg &= ~(BIT(1) | BIT(0)); + if (IS_ENABLED(CONFIG_64BIT)) + reg |= BIT(1); - /* BD 32-bits big endian mode */ - else if (!(reg & BIT(1))) - return SEC_32BE; + if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) + reg |= BIT(0); - /* BD 64-bits big endian mode */ - else - return SEC_64BE; + writel_relaxed(reg, SEC_ADDR(qm, SEC_CONTROL_REG)); } 
static int sec_engine_init(struct hisi_qm *qm) @@ -331,9 +317,7 @@ static int sec_engine_init(struct hisi_qm *qm) SEC_ADDR(qm, SEC_BD_ERR_CHK_EN_REG3)); /* config endian */ - reg = readl_relaxed(SEC_ADDR(qm, SEC_CONTROL_REG)); - reg |= sec_get_endian(qm); - writel_relaxed(reg, SEC_ADDR(qm, SEC_CONTROL_REG)); + sec_set_endian(qm); return 0; } @@ -813,7 +797,7 @@ static int sec_qm_pre_init(struct hisi_qm *qm, struct pci_dev *pdev) { int ret; - qm->algs = "sec\ncipher\ndigest\naead\n"; + qm->algs = "cipher\ndigest\naead\n"; qm->uacce_mode = uacce_mode; qm->pdev = pdev; ret = hisi_qm_pre_init(qm, pf_q_num, SEC_PF_DEF_Q_BASE); -- 2.34.1
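The read-modify-write that sec_set_endian() performs on the control register can be looked at in isolation. The following is a minimal userspace sketch, not driver code: `sec_endian_bits()` and its two boolean parameters are hypothetical stand-ins for the `IS_ENABLED(CONFIG_64BIT)` and `IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)` checks, and the register is modeled as a plain `uint32_t` instead of an MMIO access.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/*
 * Hypothetical helper mirroring the read-modify-write in the patch:
 * clear both endian bits, then set bit 1 for a 64-bit system and
 * bit 0 for a big-endian CPU. All other bits are preserved.
 */
static uint32_t sec_endian_bits(uint32_t reg, bool is_64bit, bool is_big_endian)
{
	reg &= ~(BIT(1) | BIT(0));
	if (is_64bit)
		reg |= BIT(1);
	if (is_big_endian)
		reg |= BIT(0);
	return reg;
}
```

Unlike the removed `reg |= sec_get_endian(qm)` path, the two bits are cleared before being set, so a stale value left in the register cannot survive the update.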

[PATCH OLK-5.10 0/2] Not clear ATA_PFLAG_EH_PENDING and not thaw the port twice in ata_eh_reset()
by Xingui Yang 18 Sep '23

18 Sep '23

Clear port pending interrupts before reset, as per AHCI specifications (Szuying). Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() to allow EH to continue on with other actions recorded with error interrupts triggered before EH completes. And an additional fix to avoid thawing a port twice in EH (Niklas).

Niklas Cassel (2):
  ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
  ata: libata-eh: do not thaw the port twice in ata_eh_reset()

 drivers/ata/libata-eh.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

--
2.17.1

[PATCH openEuler-1.0-LTS v1 0/3] Fix booting failure on arm64
by Wei Li 18 Sep '23

18 Sep '23

The arm64 test machine failed to boot when using hulk_defconfig after d63c76835476 ("arm64: efi: Execute runtime services from a dedicated stack"). Revert it first for the weekly release.

Wei Li (3):
  Revert "arm64: efi: Make efi_rt_lock a raw_spinlock"
  Revert "arm64: efi: Recover from synchronous exceptions occurring in firmware"
  Revert "arm64: efi: Execute runtime services from a dedicated stack"

 arch/arm64/include/asm/efi.h            | 12 ------
 arch/arm64/kernel/efi-rt-wrapper.S      | 39 ++-----------------
 arch/arm64/kernel/efi.c                 | 50 -------------------------
 arch/arm64/mm/fault.c                   |  4 --
 drivers/firmware/efi/runtime-wrappers.c |  1 -
 5 files changed, 3 insertions(+), 103 deletions(-)

--
2.25.1

[PATCH OLK-5.10] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
by Zhihao Cheng 18 Sep '23

18 Sep '23

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I828EV
CVE: NA

Reference: https://lore.kernel.org/linux-ext4/20230911155822.kbg2xayc7ys24kay@quack3/T…

--------------------------------

JBD2 makes sure journal data has landed on the fs device by calling sync_blockdev(). However, another process can intercept the EIO information from the bdev's mapping, which lets journal recovery succeed even though EIO occurred while data was being written back to the fs device.

We found this problem in our product, where iscsi + multipath is chosen as the block device for ext4. An unstable network may trigger kpartx to rescan partitions in the device mapper layer. The detailed process is as follows:

    mount          kpartx          irq
jbd2_journal_recover
 do_one_pass
  memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
  mark_buffer_dirty // mark bh dirty
               vfs_read
                generic_file_read_iter // dio
                 filemap_write_and_wait_range
                  __filemap_fdatawrite_range
                   do_writepages
                    block_write_full_folio
                     submit_bh_wbc
                          >>  EIO occurs in disk  <<
                               end_buffer_async_write
                                mark_buffer_write_io_error
                                 mapping_set_error
                                  set_bit(AS_EIO, &mapping->flags) // set!
                  filemap_check_errors
                   test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
 err2 = sync_blockdev
  filemap_write_and_wait
   filemap_check_errors
    test_and_clear_bit(AS_EIO, &mapping->flags) // false
 err2 = 0

The filesystem is mounted successfully even though data from the journal failed to be written to disk, and ext4/ocfs2 could become corrupted. Fix it by comparing the wb_err state of the fs block device before and after recovery. A reproducer is available at [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888 Cc: stable(a)vger.kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com> --- fs/jbd2/recovery.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index 9e4537349380..361c37ef1f8a 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -292,6 +292,8 @@ int jbd2_journal_recover(journal_t *journal) journal_superblock_t * sb; struct recovery_info info; + errseq_t wb_err; + struct address_space *mapping; memset(&info, 0, sizeof(info)); sb = journal->j_superblock; @@ -309,6 +311,8 @@ int jbd2_journal_recover(journal_t *journal) return 0; } + mapping = journal->j_fs_dev->bd_inode->i_mapping; + errseq_check_and_advance(&mapping->wb_err, &wb_err); err = do_one_pass(journal, &info, PASS_SCAN); if (!err) err = do_one_pass(journal, &info, PASS_REVOKE); @@ -327,6 +331,9 @@ int jbd2_journal_recover(journal_t *journal) jbd2_journal_clear_revoke(journal); err2 = sync_blockdev(journal->j_fs_dev); + if (!err) + err = err2; + err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err); if (!err) err = err2; /* Make sure all replayed data is on permanent storage */ -- 2.31.1
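The difference between the flag-based check that loses the error and the sample-before/compare-after approach the patch adopts can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct toy_mapping` and the `toy_*` helpers are hypothetical, and the counter used here is deliberately simpler than the kernel's `errseq_t`. It shows only why a private cursor per observer cannot be consumed by another reader.

```c
#include <stdbool.h>

/* Toy model of a mapping: one shared EIO-style flag plus an error counter. */
struct toy_mapping {
	bool eio_flag;        /* consumed by whoever checks first */
	unsigned int err_seq; /* bumped on every writeback error */
};

/* Record a writeback error in both mechanisms. */
static void toy_record_error(struct toy_mapping *m)
{
	m->eio_flag = true;
	m->err_seq++;
}

/* Flag-style check: reports the error once, then forgets it. */
static bool toy_flag_check(struct toy_mapping *m)
{
	bool seen = m->eio_flag;

	m->eio_flag = false;
	return seen;
}

/* Sample-style check: each observer compares against its own cursor. */
static bool toy_seq_check_and_advance(const struct toy_mapping *m,
				      unsigned int *cursor)
{
	bool changed = (*cursor != m->err_seq);

	*cursor = m->err_seq;
	return changed;
}
```

In this model, a second reader calling `toy_flag_check()` first hides the error from the journal code, while a cursor sampled before recovery still reports the change afterwards.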

[PATCH v2 openEuler-23.09] ima: fix parser strategy unable to manually import kernel
by Zhou Shuiqing 18 Sep '23

18 Sep '23

euleros inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I822F4 CVE: NA ------------------------------------------------- This patch is to fix parser strategy unable to manually import kernel v2: -fix code indentation Signed-off-by: Zhou Shuiqing <zhoushuiqing2(a)huawei.com> Reviewed-by: Huaxin Lu <luhuaxin1(a)huawei.com> --- security/integrity/ima/ima_policy.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index 81a727a3f..80bf1dc80 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -1376,7 +1376,7 @@ static bool ima_validate_rule(struct ima_rule_entry *entry) entry->flags & (IMA_DIGSIG_REQUIRED | IMA_MODSIG_ALLOWED | #ifdef CONFIG_IMA_DIGEST_LIST IMA_CHECK_BLACKLIST | IMA_VALIDATE_ALGOS | - IMA_META_IMMUTABLE_REQUIRED | IMA_PARSER)) + IMA_META_IMMUTABLE_REQUIRED)) #else IMA_CHECK_BLACKLIST | IMA_VALIDATE_ALGOS)) #endif @@ -1416,7 +1416,8 @@ static bool ima_validate_rule(struct ima_rule_entry *entry) IMA_FGROUP | IMA_DIGSIG_REQUIRED | IMA_PERMIT_DIRECTIO | IMA_VALIDATE_ALGOS | #ifdef CONFIG_IMA_DIGEST_LIST - IMA_VERITY_REQUIRED | IMA_META_IMMUTABLE_REQUIRED)) + IMA_VERITY_REQUIRED | + IMA_META_IMMUTABLE_REQUIRED | IMA_PARSER)) #else IMA_VERITY_REQUIRED)) #endif -- 2.33.0

[PATCH openEuler-23.09] ima: modify the CONFIG configuration of x86_64
by Zhou Shuiqing 18 Sep '23

18 Sep '23

euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8264X
CVE: NA

-------------------------------------------------

This patch is to modify the CONFIG configuration of x86_64.

Signed-off-by: Zhou Shuiqing <zhoushuiqing2(a)huawei.com>
---
 arch/x86/configs/openeuler_defconfig | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig
index 5f4d70de3..5bdd03c9f 100644
--- a/arch/x86/configs/openeuler_defconfig
+++ b/arch/x86/configs/openeuler_defconfig
@@ -4368,18 +4368,18 @@ CONFIG_TCG_TIS_SPI=y
 # CONFIG_TCG_TIS_SPI_CR50 is not set
 CONFIG_TCG_TIS_I2C=m
 CONFIG_TCG_TIS_I2C_CR50=m
-CONFIG_TCG_TIS_I2C_ATMEL=y
-CONFIG_TCG_TIS_I2C_INFINEON=y
-CONFIG_TCG_TIS_I2C_NUVOTON=y
-CONFIG_TCG_NSC=y
-CONFIG_TCG_ATMEL=y
-CONFIG_TCG_INFINEON=y
+CONFIG_TCG_TIS_I2C_ATMEL=m
+CONFIG_TCG_TIS_I2C_INFINEON=m
+CONFIG_TCG_TIS_I2C_NUVOTON=m
+CONFIG_TCG_NSC=m
+CONFIG_TCG_ATMEL=m
+CONFIG_TCG_INFINEON=m
 CONFIG_TCG_XEN=m
 CONFIG_TCG_CRB=y
 CONFIG_TCG_VTPM_PROXY=m
-CONFIG_TCG_TIS_ST33ZP24=y
-CONFIG_TCG_TIS_ST33ZP24_I2C=y
-CONFIG_TCG_TIS_ST33ZP24_SPI=y
+CONFIG_TCG_TIS_ST33ZP24=m
+CONFIG_TCG_TIS_ST33ZP24_I2C=m
+CONFIG_TCG_TIS_ST33ZP24_SPI=m
 CONFIG_TELCLOCK=m
 CONFIG_XILLYBUS_CLASS=m
 CONFIG_XILLYBUS=m
--
2.33.0

[PATCH openEuler-23.09] livepatch: Enable livepatch configs in openeuler_defconfig
by Zheng Yejian 18 Sep '23

18 Sep '23

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I826DM CVE: NA -------------------------------- Enable the same livepatch configures for x86_64 and arm64 as that in openEuler-22.03-LTS. Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com> --- arch/arm64/configs/openeuler_defconfig | 12 ++++++++++++ arch/x86/configs/openeuler_defconfig | 9 +++++++++ 2 files changed, 21 insertions(+) diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 6c6cd4701f92..7e95287b4716 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -332,6 +332,18 @@ CONFIG_ARCH_XGENE=y # CONFIG_ARCH_ZYNQMP is not set # end of Platform selection +CONFIG_HAVE_LIVEPATCH_WO_FTRACE=y + +# +# Enable Livepatch +# +CONFIG_LIVEPATCH=y +CONFIG_LIVEPATCH_WO_FTRACE=y +CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY=y +# CONFIG_LIVEPATCH_STACK is not set +CONFIG_LIVEPATCH_RESTRICT_KPROBE=y +# end of Enable Livepatch + # # Kernel Features # diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig index 5f4d70de32f9..0f30eb56c31b 100644 --- a/arch/x86/configs/openeuler_defconfig +++ b/arch/x86/configs/openeuler_defconfig @@ -502,8 +502,17 @@ CONFIG_LEGACY_VSYSCALL_XONLY=y CONFIG_MODIFY_LDT_SYSCALL=y # CONFIG_STRICT_SIGALTSTACK_SIZE is not set CONFIG_HAVE_LIVEPATCH_WO_FTRACE=y + +# +# Enable Livepatch +# CONFIG_LIVEPATCH=y +# CONFIG_LIVEPATCH_FTRACE is not set CONFIG_LIVEPATCH_WO_FTRACE=y +CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY=y +# CONFIG_LIVEPATCH_STACK is not set +CONFIG_LIVEPATCH_RESTRICT_KPROBE=y +# end of Enable Livepatch # end of Processor type and features CONFIG_FUNCTION_PADDING_CFI=11 -- 2.25.1

[PATCH OLK-5.10] zram: correctly handle all next_arg() cases
by Jinjiang Tu 18 Sep '23

18 Sep '23

From: Sergey Senozhatsky <senozhatsky(a)chromium.org>

mainline inclusion
from mainline-v6.3-rc1
commit df32de1433412621b92daf1b3369ac053214031e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I822Z8
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

-------------------------------------------

When supplied buffer does not have assignment sign next_arg() sets `val` pointer to NULL, so we cannot dereference it. Add a NULL pointer test to handle `param` case, in addition to `*val` test, which handles cases when param has no value assigned to it: `param=`.

Link: https://lkml.kernel.org/r/20230103030119.1496358-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
---
 drivers/block/zram/zram_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e332b4d55359..955f0c4d358f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1123,7 +1123,7 @@ static ssize_t recomp_algorithm_store(struct device *dev,
 	while (*args) {
 		args = next_arg(args, &param, &val);
 
-		if (!*val)
+		if (!val || !*val)
 			return -EINVAL;
 
 		if (!strcmp(param, "algo")) {
@@ -1800,7 +1800,7 @@ static ssize_t recompress_store(struct device *dev,
 	while (*args) {
 		args = next_arg(args, &param, &val);
 
-		if (!*val)
+		if (!val || !*val)
 			return -EINVAL;
 
 		if (!strcmp(param, "type")) {
--
2.25.1
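The two cases the commit message distinguishes, a bare `param` (NULL value pointer) and `param=` (empty value string), can be reproduced in userspace. The sketch below is an assumption-laden illustration: `toy_next_arg()` is a hypothetical, much simplified stand-in for the kernel's next_arg() (no quoting, single spaces only), and `toy_parse()` only mirrors the validation step of the patched loops.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for next_arg(): splits one
 * space-separated token into param and val. val is NULL when the
 * token has no '=' at all, and "" when it ends in '='.
 * Returns a pointer to the rest of the string.
 */
static char *toy_next_arg(char *args, char **param, char **val)
{
	char *end = strchr(args, ' ');
	char *eq;

	if (end)
		*end++ = '\0';
	else
		end = args + strlen(args);

	*param = args;
	eq = strchr(args, '=');
	if (eq) {
		*eq = '\0';
		*val = eq + 1;
	} else {
		*val = NULL;
	}
	return end;
}

/* Validation as in the patched loops: reject both "param" and "param=". */
static int toy_parse(char *args)
{
	char *param, *val;

	while (*args) {
		args = toy_next_arg(args, &param, &val);
		if (!val || !*val)
			return -EINVAL;
	}
	return 0;
}
```

With only the `!*val` test, the bare `param` case would dereference a NULL pointer. Checking `!val` first makes both malformed inputs return -EINVAL.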

[PATCH openEuler-22.03-LTS] nvme-pci: fix mempool alloc size
by Yong Hu 18 Sep '23

18 Sep '23

From: Keith Busch <kbusch(a)kernel.org> stable inclusion from stable-v5.10.163 commit dfb6d54893d544151e7f480bc44cfe7823f5ad23 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7PZZC Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=… -------------------------------- [ Upstream commit c89a529e823d51dd23c7ec0c047c7a454a428541 ] Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries. The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool. While unlikely to occur (you'd need a 4MB in exactly 127 phys segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence. Cc: Jens Axboe <axboe(a)kernel.dk> Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations") Signed-off-by: Keith Busch <kbusch(a)kernel.org> Reviewed-by: Jens Axboe <axboe(a)kernel.dk> Reviewed-by: Kanchan Joshi <joshi.k(a)samsung.com> Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com> Signed-off-by: Christoph Hellwig <hch(a)lst.de> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Yong Hu <yong.hu(a)windriver.com> --- drivers/nvme/host/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index bbf6ce4b82ac..e805a9813628 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -371,8 +371,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, */ static int nvme_pci_npages_prp(void) { - unsigned nprps = DIV_ROUND_UP(NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE, - NVME_CTRL_PAGE_SIZE); + unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE; + unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE); 
return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); } -- 2.34.1
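The unit mismatch is easier to see with numbers plugged in. The sketch below evaluates the pre-patch and post-patch expressions side by side. The three `TOY_*` constants are assumed values chosen for illustration (4096 KiB maximum transfer, 4096-byte controller page, 4096-byte host page) and are not taken from the patch. The real constants live in the driver and architecture headers.

```c
/* Assumed values for illustration only. */
#define TOY_MAX_KB_SZ       4096u   /* assumed max transfer size, in KiB */
#define TOY_CTRL_PAGE_SIZE  4096u   /* assumed controller page size, in bytes */
#define TOY_PAGE_SIZE       4096u   /* assumed host page size, in bytes */

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Pre-patch expression: adds KiB to bytes, so nprps is far too small. */
static unsigned int npages_prp_old(void)
{
	unsigned int nprps = DIV_ROUND_UP(TOY_MAX_KB_SZ + TOY_CTRL_PAGE_SIZE,
					  TOY_CTRL_PAGE_SIZE);

	return DIV_ROUND_UP(8 * nprps, TOY_PAGE_SIZE - 8);
}

/* Post-patch expression: converts KiB to bytes before dividing. */
static unsigned int npages_prp_new(void)
{
	unsigned int max_bytes = (TOY_MAX_KB_SZ * 1024) + TOY_CTRL_PAGE_SIZE;
	unsigned int nprps = DIV_ROUND_UP(max_bytes, TOY_CTRL_PAGE_SIZE);

	return DIV_ROUND_UP(8 * nprps, TOY_PAGE_SIZE - 8);
}
```

Under these assumed constants the old expression counts only 2 PRP entries and reserves a single list page, while the corrected expression counts 1025 entries and therefore reserves more than one page.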

[PATCH openEuler-22.03-LTS-SP2] nvme-pci: fix timeout request state check
by Yong Hu 18 Sep '23

18 Sep '23

From: Keith Busch <kbusch(a)kernel.org> stable inclusion from stable-v5.10.166 commit 5f10f7efe0fc97c0ee2112a1032914f6fb2f940c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7R4BC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 1c5842085851f786eba24a39ecd02650ad892064 ] Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT. This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides. Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq") Signed-off-by: Keith Busch <kbusch(a)kernel.org> Signed-off-by: Christoph Hellwig <hch(a)lst.de> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Yong Hu <yong.hu(a)windriver.com> --- drivers/nvme/host/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index ac5745d8dd2b..f1ac50c7a1d6 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1292,7 +1292,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved) else nvme_poll_irqdisable(nvmeq); - if (blk_mq_request_completed(req)) { + if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) { dev_warn(dev->ctrl.device, "I/O %d QID %d timeout, completion polled\n", req->tag, nvmeq->qid); -- 2.34.1
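The change in the condition can be summarized with a small truth table in code. This is a toy sketch: `enum toy_rq_state` and the two predicates are hypothetical names modeled on the three states the commit message mentions, not the block layer's own definitions.

```c
#include <stdbool.h>

/* Toy request states, named after the states the commit message mentions. */
enum toy_rq_state { TOY_RQ_IDLE, TOY_RQ_IN_FLIGHT, TOY_RQ_COMPLETE };

/* Pre-patch test: only a request still marked complete counts as polled. */
static bool polled_old(enum toy_rq_state s)
{
	return s == TOY_RQ_COMPLETE;
}

/* Post-patch test: anything no longer in flight counts as polled. */
static bool polled_new(enum toy_rq_state s)
{
	return s != TOY_RQ_IN_FLIGHT;
}
```

The two predicates differ only for the idle state, which is the case the commit message describes: a request that polling has already completed and moved on from.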

[PATCH openEuler-22.03-LTS] nvme-pci: fix timeout request state check
by Yong Hu 18 Sep '23

18 Sep '23

From: Keith Busch <kbusch(a)kernel.org> stable inclusion from stable-v5.10.166 commit 5f10f7efe0fc97c0ee2112a1032914f6fb2f940c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7R4BC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 1c5842085851f786eba24a39ecd02650ad892064 ] Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT. This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides. Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq") Signed-off-by: Keith Busch <kbusch(a)kernel.org> Signed-off-by: Christoph Hellwig <hch(a)lst.de> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Yong Hu <yong.hu(a)windriver.com> --- drivers/nvme/host/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index fbbbfdea076a..bbf6ce4b82ac 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1291,7 +1291,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved) else nvme_poll_irqdisable(nvmeq); - if (blk_mq_request_completed(req)) { + if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) { dev_warn(dev->ctrl.device, "I/O %d QID %d timeout, completion polled\n", req->tag, nvmeq->qid); -- 2.34.1

[PATCH openEuler-22.03-LTS-SP1] netfilter: nftables: exthdr: fix 4-byte stack OOB write
by Guo Mengqi 18 Sep '23

18 Sep '23

From: Florian Westphal <fw(a)strlen.de> mainline inclusion from mainline-v6.6-rc1 commit fd94d9dadee58e09b49075240fe83423eb1dcd36 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I80I0G CVE: CVE-2023-4881 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- If priv->len is a multiple of 4, then dst[len / 4] can write past the destination array which leads to stack corruption. This construct is necessary to clean the remainder of the register in case ->len is NOT a multiple of the register size, so make it conditional just like nft_payload.c does. The bug was added in 4.1 cycle and then copied/inherited when tcp/sctp and ip option support was added. Bug reported by Zero Day Initiative project (ZDI-CAN-21950, ZDI-CAN-21951, ZDI-CAN-21961). Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing") Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching") Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks") Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options") Signed-off-by: Florian Westphal <fw(a)strlen.de> Conflicts: net/netfilter/nft_exthdr.c Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com> --- net/netfilter/nft_exthdr.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c index 670dd146fb2b..ca268293cfa1 100644 --- a/net/netfilter/nft_exthdr.c +++ b/net/netfilter/nft_exthdr.c @@ -33,6 +33,14 @@ static unsigned int optlen(const u8 *opt, unsigned int offset) return opt[offset + 1]; } +static int nft_skb_copy_to_reg(const struct sk_buff *skb, int offset, u32 *dest, unsigned int len) +{ + if (len % NFT_REG32_SIZE) + dest[len / NFT_REG32_SIZE] = 0; + + return skb_copy_bits(skb, offset, dest, len); +} + static void nft_exthdr_ipv6_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct 
nft_pktinfo *pkt) @@ -54,8 +62,7 @@ static void nft_exthdr_ipv6_eval(const struct nft_expr *expr, } offset += priv->offset; - dest[priv->len / NFT_REG32_SIZE] = 0; - if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0) + if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0) goto err; return; err: @@ -151,8 +158,7 @@ static void nft_exthdr_ipv4_eval(const struct nft_expr *expr, } offset += priv->offset; - dest[priv->len / NFT_REG32_SIZE] = 0; - if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0) + if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0) goto err; return; err: @@ -208,7 +214,8 @@ static void nft_exthdr_tcp_eval(const struct nft_expr *expr, if (priv->flags & NFT_EXTHDR_F_PRESENT) { *dest = 1; } else { - dest[priv->len / NFT_REG32_SIZE] = 0; + if (priv->len % NFT_REG32_SIZE) + dest[priv->len / NFT_REG32_SIZE] = 0; memcpy(dest, opt + offset, priv->len); } -- 2.17.1
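The out-of-bounds index is easy to reproduce outside the kernel. The following is a minimal userspace sketch of the conditional zeroing, under stated assumptions: `toy_copy_to_reg()`, its `dest_words` parameter, and the explicit length check are hypothetical additions for illustration, and `memcpy()` stands in for `skb_copy_bits()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_REG32_SIZE sizeof(uint32_t)

/*
 * Copy len bytes into a register array of dest_words 32-bit words.
 * The trailing word is zeroed only when len is not a multiple of the
 * register size, so the index len / 4 always stays inside the array.
 * Returns -1 if the copy would not fit.
 */
static int toy_copy_to_reg(uint32_t *dest, size_t dest_words,
			   const uint8_t *src, size_t len)
{
	if (len > dest_words * TOY_REG32_SIZE)
		return -1;
	if (len % TOY_REG32_SIZE)
		dest[len / TOY_REG32_SIZE] = 0;
	memcpy(dest, src, len);
	return 0;
}
```

With a 4-word destination and `len == 16`, an unconditional `dest[len / 4] = 0` would write to `dest[4]`, one word past the end. With the condition in place, only partially filled trailing words are cleared.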

[PATCH openEuler-23.09] ima: fix parser strategy unable to manually import kernel
by Zhou Shuiqing 18 Sep '23

18 Sep '23

From: zhoushuiqing <zhoushuiqing2(a)huawei.com> euleros inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I822F4 CVE: NA ------------------------------------------------- This patch is to fix parser strategy unable to manually import kernel Signed-off-by: Zhou Shuiqing <zhoushuiqing2(a)huawei.com> Reviewed-by: Huaxin Lu <luhuaxin1(a)huawei.com> --- security/integrity/ima/ima_policy.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index 81a727a3f..ca87605e5 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -1376,7 +1376,7 @@ static bool ima_validate_rule(struct ima_rule_entry *entry) entry->flags & (IMA_DIGSIG_REQUIRED | IMA_MODSIG_ALLOWED | #ifdef CONFIG_IMA_DIGEST_LIST IMA_CHECK_BLACKLIST | IMA_VALIDATE_ALGOS | - IMA_META_IMMUTABLE_REQUIRED | IMA_PARSER)) + IMA_META_IMMUTABLE_REQUIRED)) #else IMA_CHECK_BLACKLIST | IMA_VALIDATE_ALGOS)) #endif @@ -1416,7 +1416,8 @@ static bool ima_validate_rule(struct ima_rule_entry *entry) IMA_FGROUP | IMA_DIGSIG_REQUIRED | IMA_PERMIT_DIRECTIO | IMA_VALIDATE_ALGOS | #ifdef CONFIG_IMA_DIGEST_LIST - IMA_VERITY_REQUIRED | IMA_META_IMMUTABLE_REQUIRED)) + IMA_VERITY_REQUIRED | + IMA_META_IMMUTABLE_REQUIRED | IMA_PARSER)) #else IMA_VERITY_REQUIRED)) #endif -- 2.33.0

[PATCH OLK-5.10] livepatch/core: Fix possible issue that old function is not checked
by Zheng Yejian 18 Sep '23

18 Sep '23

hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZH67 CVE: NA -------------------------------- After patch being enabled, the first few instructions would be modified to jump to the new function, then callers of old function would jump to new function but always through the old function. Therefore when enabling a new patch or disable a patch on the old function, we should always consider that old function is running. Otherwise, there may be situations where old functions are being modified before jumping to new function and cause issues. Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com> --- arch/arm/kernel/livepatch.c | 24 +++++++++++++++++++++--- arch/arm64/kernel/livepatch.c | 24 +++++++++++++++++++++--- arch/powerpc/kernel/livepatch_32.c | 24 +++++++++++++++++++++--- arch/x86/kernel/livepatch.c | 25 +++++++++++++++++++++---- 4 files changed, 84 insertions(+), 13 deletions(-) diff --git a/arch/arm/kernel/livepatch.c b/arch/arm/kernel/livepatch.c index b4d26474ba33..b1711d947dfe 100644 --- a/arch/arm/kernel/livepatch.c +++ b/arch/arm/kernel/livepatch.c @@ -134,12 +134,17 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, struct klp_object *obj; struct klp_func_node *func_node; struct klp_func *func; - unsigned long func_addr, func_size; + unsigned long func_addr = 0; + unsigned long func_size; struct klp_func_list *pcheck = NULL; for (obj = patch->objs; obj->funcs; obj++) { for (func = obj->funcs; func->old_name; func++) { + unsigned long old_func = (unsigned long)func->old_func; + if (enable) { + bool need_check_old = false; + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* @@ -153,7 +158,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, * No patched on this function * [ the origin one ] */ - func_addr = (unsigned long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { /* @@ -184,6 +189,13 @@ static int 
klp_check_activeness_func(struct klp_patch *patch, int enable, func->old_name, func->force); if (ret) return ret; + need_check_old = (func_addr != old_func); + } + if (need_check_old) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, func->force); + if (ret) + return ret; } } else { /* @@ -203,7 +215,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, * the stack. */ if (list_is_singular(&func_node->func_stack)) { - func_addr = (unsigned long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { struct klp_func *prev; @@ -219,6 +231,12 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func->old_name, 0); if (ret) return ret; + if (func_addr != old_func) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, 0); + if (ret) + return ret; + } #endif func_addr = (unsigned long)func->new_func; func_size = func->new_size; diff --git a/arch/arm64/kernel/livepatch.c b/arch/arm64/kernel/livepatch.c index 6b5bcb491125..5b0171254820 100644 --- a/arch/arm64/kernel/livepatch.c +++ b/arch/arm64/kernel/livepatch.c @@ -126,13 +126,18 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, int ret; struct klp_object *obj; struct klp_func *func; - unsigned long func_addr, func_size; + unsigned long func_addr = 0; + unsigned long func_size; struct klp_func_node *func_node; struct klp_func_list *pcheck = NULL; for (obj = patch->objs; obj->funcs; obj++) { for (func = obj->funcs; func->old_name; func++) { + unsigned long old_func = (unsigned long)func->old_func; + if (enable) { + bool need_check_old = false; + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* @@ -142,7 +147,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func_node = klp_find_func_node(func->old_func); if (!func_node || list_empty(&func_node->func_stack)) { - func_addr = (unsigned 
long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { /* @@ -173,6 +178,13 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func->old_name, func->force); if (ret) return ret; + need_check_old = (func_addr != old_func); + } + if (need_check_old) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, func->force); + if (ret) + return ret; } } else { /* @@ -193,7 +205,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, * the stack. */ if (list_is_singular(&func_node->func_stack)) { - func_addr = (unsigned long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { struct klp_func *prev; @@ -209,6 +221,12 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func->old_name, 0); if (ret) return ret; + if (func_addr != old_func) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, 0); + if (ret) + return ret; + } #endif func_addr = (unsigned long)func->new_func; diff --git a/arch/powerpc/kernel/livepatch_32.c b/arch/powerpc/kernel/livepatch_32.c index 7b4ed23bf2ca..3fe4f3c5790b 100644 --- a/arch/powerpc/kernel/livepatch_32.c +++ b/arch/powerpc/kernel/livepatch_32.c @@ -123,13 +123,18 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, int ret; struct klp_object *obj; struct klp_func *func; - unsigned long func_addr, func_size; + unsigned long func_addr = 0; + unsigned long func_size; struct klp_func_node *func_node; struct klp_func_list *pcheck = NULL; for (obj = patch->objs; obj->funcs; obj++) { for (func = obj->funcs; func->old_name; func++) { + unsigned long old_func = (unsigned long)func->old_func; + if (enable) { + bool need_check_old = false; + if (func->patched || func->force == KLP_ENFORCEMENT) continue; /* @@ -143,7 +148,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, * No patched on this function * [ the 
origin one ] */ - func_addr = (unsigned long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { /* @@ -174,6 +179,13 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func->old_name, func->force); if (ret) return ret; + need_check_old = (func_addr != old_func); + } + if (need_check_old) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, func->force); + if (ret) + return ret; } } else { /* @@ -193,7 +205,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, * the stack. */ if (list_is_singular(&func_node->func_stack)) { - func_addr = (unsigned long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { struct klp_func *prev; @@ -208,6 +220,12 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func_size, func->old_name, 0); if (ret) return ret; + if (func_addr != old_func) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, 0); + if (ret) + return ret; + } #endif func_addr = (unsigned long)func->new_func; func_size = func->new_size; diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c index 0241e560bd2e..43404fc1fdbb 100644 --- a/arch/x86/kernel/livepatch.c +++ b/arch/x86/kernel/livepatch.c @@ -120,16 +120,20 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, int ret; struct klp_object *obj; struct klp_func *func; - unsigned long func_addr, func_size; + unsigned long func_addr = 0; + unsigned long func_size; struct klp_func_node *func_node = NULL; struct klp_func_list *pcheck = NULL; for (obj = patch->objs; obj->funcs; obj++) { for (func = obj->funcs; func->old_name; func++) { - func_node = klp_find_func_node(func->old_func); + unsigned long old_func = (unsigned long)func->old_func; + func_node = klp_find_func_node(func->old_func); /* Check func address in stack */ if (enable) { + bool need_check_old = false; + if 
(func->patched || func->force == KLP_ENFORCEMENT) continue; /* @@ -138,7 +142,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, */ if (!func_node || list_empty(&func_node->func_stack)) { - func_addr = (unsigned long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { /* @@ -169,6 +173,13 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func->old_name, func->force); if (ret) return ret; + need_check_old = (func_addr != old_func); + } + if (need_check_old) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, func->force); + if (ret) + return ret; } } else { /* @@ -186,7 +197,7 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, * the stack. */ if (list_is_singular(&func_node->func_stack)) { - func_addr = (unsigned long)func->old_func; + func_addr = old_func; func_size = func->old_size; } else { struct klp_func *prev; @@ -201,6 +212,12 @@ static int klp_check_activeness_func(struct klp_patch *patch, int enable, func_size, func->old_name, 0); if (ret) return ret; + if (func_addr != old_func) { + ret = add_func_to_list(check_funcs, &pcheck, old_func, + func->old_size, func->old_name, 0); + if (ret) + return ret; + } #endif func_addr = (unsigned long)func->new_func; -- 2.25.1

[PATCH openEuler-23.09] mm: gmem: Use find_vma_intersection to find overlap vma
by Wupeng Ma 18 Sep '23

From: Ma Wupeng <mawupeng1(a)huawei.com>

euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7WLVX

---------------------------------------------

Use find_vma_intersection instead of find_vma to find overlapping vma.

Fixes: 848492f233ce ("mm: gmem: Introduce vm_object for gmem")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
---
 mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index eb24efdba25d..2aef07b8a85e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2701,7 +2701,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 	struct vm_area_struct *vma;
 
 	if (gmem_is_enabled()) {
-		vma = find_vma(mm, start);
+		vma = find_vma_intersection(mm, start, start + len);
 		if (!vma)
 			return 0;
 		if (vma_is_peer_shared(vma)) {
-- 
2.25.1
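The difference between the two lookups can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: `struct toy_vma`, `toy_find_vma()` and `toy_find_vma_intersection()` are hypothetical stand-ins that only mirror the documented semantics (find_vma returns the first area with addr < vm_end, which may lie entirely above addr; find_vma_intersection returns an area only if it overlaps the requested range).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the half-open range [vm_start, vm_end). */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * find_vma()-like lookup: return the first area whose end lies above addr.
 * The result may start above addr, so it does not have to contain addr.
 * The array is assumed to be sorted by address and non-overlapping.
 */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas, size_t n,
					  unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/*
 * find_vma_intersection()-like lookup: return the first area that overlaps
 * [start, end), or NULL when no area overlaps that range.
 */
static const struct toy_vma *toy_find_vma_intersection(const struct toy_vma *vmas,
							size_t n,
							unsigned long start,
							unsigned long end)
{
	const struct toy_vma *vma;

	assert(start < end);
	vma = toy_find_vma(vmas, n, start);
	if (vma && end <= vma->vm_start)
		return NULL;	/* nearest area begins at or after the range end */
	return vma;
}
```

With areas at [0x1000, 0x2000) and [0x8000, 0x9000), a lookup for the range [0x3000, 0x4000) shows the gap: the find_vma-like helper returns the second area even though it does not overlap the range, while the intersection helper returns NULL.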

[PATCH openEuler-23.09] ima: fix the PGP certificate failure to load into the kernel
by Zhou Shuiqing 15 Sep '23

euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I81RYU
CVE: NA

-------------------------------------------------

This patch fixes the failure to load PGP certificates into the kernel.
PGP certificates are used to verify the IMA digest list.

Signed-off-by: Zhou Shuiqing <zhoushuiqing2(a)huawei.com>
---
 certs/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/certs/Makefile b/certs/Makefile
index ab6da6f4e..49f8101cc 100644
--- a/certs/Makefile
+++ b/certs/Makefile
@@ -67,7 +67,7 @@ $(obj)/system_certificates.o: $(obj)/signing_key.x509
 
 ifdef CONFIG_PGP_PRELOAD_PUBLIC_KEYS
 ifeq ($(shell ls $(srctree)/certs/pubring.gpg 2> /dev/null), $(srctree)/certs/pubring.gpg)
-system_certificates.o += -DHAVE_PUBRING_GPG
+AFLAGS_system_certificates.o += -DHAVE_PUBRING_GPG
 $(obj)/system_certificates.o: $(srctree)/certs/pubring.gpg
 endif
 endif
-- 
2.33.0

[PATCH openEuler-23.09] ima: fix the PGP certificate failure to load into the kernel
by Zhou Shuiqing 15 Sep '23

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I81RYU
CVE: NA

-------------------------------------------------

This patch fixes the failure to load PGP certificates into the kernel.
PGP certificates are used to verify the IMA digest list.

Signed-off-by: Zhou Shuiqing <zhoushuiqing2(a)huawei.com>
---
 certs/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/certs/Makefile b/certs/Makefile
index ab6da6f4e..49f8101cc 100644
--- a/certs/Makefile
+++ b/certs/Makefile
@@ -67,7 +67,7 @@ $(obj)/system_certificates.o: $(obj)/signing_key.x509
 
 ifdef CONFIG_PGP_PRELOAD_PUBLIC_KEYS
 ifeq ($(shell ls $(srctree)/certs/pubring.gpg 2> /dev/null), $(srctree)/certs/pubring.gpg)
-system_certificates.o += -DHAVE_PUBRING_GPG
+AFLAGS_system_certificates.o += -DHAVE_PUBRING_GPG
 $(obj)/system_certificates.o: $(srctree)/certs/pubring.gpg
 endif
 endif
-- 
2.33.0

[PATCH openEuler-1.0-LTS] sched/qos: Fix warning in CPU hotplug scenarios
by Xia Fukun 15 Sep '23

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZMCB
CVE: NA

--------------------------------

CPU hotplug callbacks race against distribute_cfs_runtime(). When the
QOS_SCHED feature is enabled, there may be situations where
cfs_rq->runtime_remaining == 1 and the cfs_rq is QOS_THROTTLED.

Turn off the Qos_throttle when the CPU is offline.
No longer allocate time to cfs_rq in this scenario to fix the warning.

Fixes: fbea24f5894e ("sched/qos: Don't unthrottle cfs_rq when cfs_rq is throttled by qos")
Signed-off-by: Xia Fukun <xiafukun(a)huawei.com>
---
 kernel/sched/fair.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e9afb1e6ca4c..1c78e2f29901 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4783,6 +4783,19 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b, u64 remaining)
 		if (!cfs_rq_throttled(cfs_rq))
 			goto next;
 
+		/*
+		 * CPU hotplug callbacks race against distribute_cfs_runtime()
+		 * when the QOS_SCHED feature is enabled, there may be
+		 * situations where the runtime_remaining > 0.
+		 * Qos_sched does not care whether the cfs_rq has time left,
+		 * so no longer allocate time to cfs_rq in this scenario.
+		 */
+#ifdef CONFIG_QOS_SCHED
+		if (cfs_rq->throttled == QOS_THROTTLED &&
+		    cfs_rq->runtime_remaining > 0)
+			goto next;
+#endif
+
 		/* By the above check, this should never be true */
 		SCHED_WARN_ON(cfs_rq->runtime_remaining > 0);
 
@@ -7754,6 +7767,10 @@ static bool check_qos_cfs_rq(struct cfs_rq *cfs_rq)
 	if (unlikely(cfs_rq && cfs_rq->tg->qos_level < 0 &&
 		!sched_idle_cpu(smp_processor_id()) &&
 		cfs_rq->h_nr_running == cfs_rq->idle_h_nr_running)) {
+
+		if (!rq_of(cfs_rq)->online)
+			return false;
+
 		throttle_qos_cfs_rq(cfs_rq);
 		return true;
 	}
-- 
2.34.1

[PATCH OLK-5.10] ata: libahci: clear pending interrupt status
by Xingui Yang 15 Sep '23

From: Szuying Chen <chensiying21(a)gmail.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I81M63
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/?…

------------------------------------------------------------

When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn results in the HBA
inability to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.

According to AHCI 1.3.1 specifications section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.

To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 - section 6.2.2.2
specification.

Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Xingui Yang <yangxingui(a)huawei.com>
---
 drivers/ata/libahci.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 4514f3f28..160400a2a 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1205,6 +1205,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
 	return sprintf(buf, "%d\n", emp->blink_policy);
 }
 
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+	struct ahci_host_priv *hpriv = ap->host->private_data;
+	void __iomem *port_mmio = ahci_port_base(ap);
+	u32 tmp;
+
+	/* clear SError */
+	tmp = readl(port_mmio + PORT_SCR_ERR);
+	dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+	writel(tmp, port_mmio + PORT_SCR_ERR);
+
+	/* clear port IRQ */
+	tmp = readl(port_mmio + PORT_IRQ_STAT);
+	dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+	if (tmp)
+		writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+	writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
 static void ahci_port_init(struct device *dev, struct ata_port *ap,
 			   int port_no, void __iomem *mmio,
 			   void __iomem *port_mmio)
@@ -1219,18 +1239,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
 	if (rc)
 		dev_warn(dev, "%s (%d)\n", emsg, rc);
 
-	/* clear SError */
-	tmp = readl(port_mmio + PORT_SCR_ERR);
-	VPRINTK("PORT_SCR_ERR 0x%x\n", tmp);
-	writel(tmp, port_mmio + PORT_SCR_ERR);
-
-	/* clear port IRQ */
-	tmp = readl(port_mmio + PORT_IRQ_STAT);
-	VPRINTK("PORT_IRQ_STAT 0x%x\n", tmp);
-	if (tmp)
-		writel(tmp, port_mmio + PORT_IRQ_STAT);
-
-	writel(1 << port_no, mmio + HOST_IRQ_STAT);
+	ahci_port_clear_pending_irq(ap);
 
 	/* mark esata ports */
 	tmp = readl(port_mmio + PORT_CMD);
@@ -1560,6 +1569,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
 	tf.command = ATA_BUSY;
 	ata_tf_to_fis(&tf, 0, 0, d2h_fis);
 
+	ahci_port_clear_pending_irq(ap);
+
 	rc = sata_link_hardreset(link, timing, deadline, online,
 				 ahci_check_ready);
 
-- 
2.17.1
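The read-then-write-back pattern in ahci_port_clear_pending_irq() relies on the status registers being write-1-to-clear. The following is a small userspace model of that pattern, under stated assumptions: `struct toy_w1c_reg`, `toy_readl()`, `toy_writel()` and `struct toy_port` are hypothetical names that only imitate the register behaviour and are not the libahci API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear (W1C) status register. */
struct toy_w1c_reg {
	uint32_t bits;
};

static uint32_t toy_readl(const struct toy_w1c_reg *reg)
{
	return reg->bits;
}

/* Writing a 1 clears that bit; writing a 0 leaves the bit unchanged. */
static void toy_writel(uint32_t val, struct toy_w1c_reg *reg)
{
	reg->bits &= ~val;
}

struct toy_port {
	struct toy_w1c_reg serr;		/* stands in for PORT_SCR_ERR */
	struct toy_w1c_reg irq_stat;		/* stands in for PORT_IRQ_STAT */
	struct toy_w1c_reg *host_irq_stat;	/* stands in for HOST_IRQ_STAT */
	unsigned int port_no;
};

/* Same order as the patch: SError, then port IRQ status, then the host bit. */
static void toy_port_clear_pending_irq(struct toy_port *port)
{
	uint32_t tmp;

	assert(port->port_no < 32);

	/* Writing back what was read clears exactly the bits that were set. */
	tmp = toy_readl(&port->serr);
	toy_writel(tmp, &port->serr);

	tmp = toy_readl(&port->irq_stat);
	if (tmp)
		toy_writel(tmp, &port->irq_stat);

	/* Only this port's bit in the shared host register is cleared. */
	toy_writel(UINT32_C(1) << port->port_no, port->host_irq_stat);
}
```

In this model, a stale error bit left in the port status register is gone after the call, while pending bits of other ports in the shared host register are untouched.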

[PATCH OLK-5.10] net: sched: sch_qfq: Fix UAF in qfq_dequeue()
by Liu Jian 15 Sep '23

From: valis <sec(a)valis.email> mainline inclusion from mainline-v6.6-rc1 commit 8fc134fee27f2263988ae38920bc03da416b03d8 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I80USB CVE: CVE-2023-4921 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… --------------------------- When the plug qdisc is used as a class of the qfq qdisc it could trigger a UAF. This issue can be reproduced with following commands: tc qdisc add dev lo root handle 1: qfq tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512 tc qdisc add dev lo parent 1:1 handle 2: plug tc filter add dev lo parent 1: basic classid 1:1 ping -c1 127.0.0.1 and boom: [ 285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0 [ 285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144 [ 285.355903] [ 285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4 [ 285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 285.358376] Call Trace: [ 285.358773] <IRQ> [ 285.359109] dump_stack_lvl+0x44/0x60 [ 285.359708] print_address_description.constprop.0+0x2c/0x3c0 [ 285.360611] kasan_report+0x10c/0x120 [ 285.361195] ? qfq_dequeue+0xa7/0x7f0 [ 285.361780] qfq_dequeue+0xa7/0x7f0 [ 285.362342] __qdisc_run+0xf1/0x970 [ 285.362903] net_tx_action+0x28e/0x460 [ 285.363502] __do_softirq+0x11b/0x3de [ 285.364097] do_softirq.part.0+0x72/0x90 [ 285.364721] </IRQ> [ 285.365072] <TASK> [ 285.365422] __local_bh_enable_ip+0x77/0x90 [ 285.366079] __dev_queue_xmit+0x95f/0x1550 [ 285.366732] ? __pfx_csum_and_copy_from_iter+0x10/0x10 [ 285.367526] ? __pfx___dev_queue_xmit+0x10/0x10 [ 285.368259] ? __build_skb_around+0x129/0x190 [ 285.368960] ? ip_generic_getfrag+0x12c/0x170 [ 285.369653] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 285.370390] ? csum_partial+0x8/0x20 [ 285.370961] ? raw_getfrag+0xe5/0x140 [ 285.371559] ip_finish_output2+0x539/0xa40 [ 285.372222] ? 
__pfx_ip_finish_output2+0x10/0x10 [ 285.372954] ip_output+0x113/0x1e0 [ 285.373512] ? __pfx_ip_output+0x10/0x10 [ 285.374130] ? icmp_out_count+0x49/0x60 [ 285.374739] ? __pfx_ip_finish_output+0x10/0x10 [ 285.375457] ip_push_pending_frames+0xf3/0x100 [ 285.376173] raw_sendmsg+0xef5/0x12d0 [ 285.376760] ? do_syscall_64+0x40/0x90 [ 285.377359] ? __static_call_text_end+0x136578/0x136578 [ 285.378173] ? do_syscall_64+0x40/0x90 [ 285.378772] ? kasan_enable_current+0x11/0x20 [ 285.379469] ? __pfx_raw_sendmsg+0x10/0x10 [ 285.380137] ? __sock_create+0x13e/0x270 [ 285.380673] ? __sys_socket+0xf3/0x180 [ 285.381174] ? __x64_sys_socket+0x3d/0x50 [ 285.381725] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.382425] ? __rcu_read_unlock+0x48/0x70 [ 285.382975] ? ip4_datagram_release_cb+0xd8/0x380 [ 285.383608] ? __pfx_ip4_datagram_release_cb+0x10/0x10 [ 285.384295] ? preempt_count_sub+0x14/0xc0 [ 285.384844] ? __list_del_entry_valid+0x76/0x140 [ 285.385467] ? _raw_spin_lock_bh+0x87/0xe0 [ 285.386014] ? __pfx__raw_spin_lock_bh+0x10/0x10 [ 285.386645] ? release_sock+0xa0/0xd0 [ 285.387148] ? preempt_count_sub+0x14/0xc0 [ 285.387712] ? freeze_secondary_cpus+0x348/0x3c0 [ 285.388341] ? aa_sk_perm+0x177/0x390 [ 285.388856] ? __pfx_aa_sk_perm+0x10/0x10 [ 285.389441] ? check_stack_object+0x22/0x70 [ 285.390032] ? inet_send_prepare+0x2f/0x120 [ 285.390603] ? __pfx_inet_sendmsg+0x10/0x10 [ 285.391172] sock_sendmsg+0xcc/0xe0 [ 285.391667] __sys_sendto+0x190/0x230 [ 285.392168] ? __pfx___sys_sendto+0x10/0x10 [ 285.392727] ? kvm_clock_get_cycles+0x14/0x30 [ 285.393328] ? set_normalized_timespec64+0x57/0x70 [ 285.393980] ? _raw_spin_unlock_irq+0x1b/0x40 [ 285.394578] ? __x64_sys_clock_gettime+0x11c/0x160 [ 285.395225] ? __pfx___x64_sys_clock_gettime+0x10/0x10 [ 285.395908] ? _copy_to_user+0x3e/0x60 [ 285.396432] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.397086] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.397734] ? 
do_syscall_64+0x71/0x90 [ 285.398258] __x64_sys_sendto+0x74/0x90 [ 285.398786] do_syscall_64+0x64/0x90 [ 285.399273] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.399949] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.400605] ? do_syscall_64+0x71/0x90 [ 285.401124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.401807] RIP: 0033:0x495726 [ 285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09 [ 285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726 [ 285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000 [ 285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c [ 285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634 [ 285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000 [ 285.410403] </TASK> [ 285.410704] [ 285.410929] Allocated by task 144: [ 285.411402] kasan_save_stack+0x1e/0x40 [ 285.411926] kasan_set_track+0x21/0x30 [ 285.412442] __kasan_slab_alloc+0x55/0x70 [ 285.412973] kmem_cache_alloc_node+0x187/0x3d0 [ 285.413567] __alloc_skb+0x1b4/0x230 [ 285.414060] __ip_append_data+0x17f7/0x1b60 [ 285.414633] ip_append_data+0x97/0xf0 [ 285.415144] raw_sendmsg+0x5a8/0x12d0 [ 285.415640] sock_sendmsg+0xcc/0xe0 [ 285.416117] __sys_sendto+0x190/0x230 [ 285.416626] __x64_sys_sendto+0x74/0x90 [ 285.417145] do_syscall_64+0x64/0x90 [ 285.417624] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.418306] [ 285.418531] Freed by task 144: [ 285.418960] kasan_save_stack+0x1e/0x40 [ 285.419469] kasan_set_track+0x21/0x30 [ 285.419988] kasan_save_free_info+0x27/0x40 [ 285.420556] ____kasan_slab_free+0x109/0x1a0 [ 285.421146] kmem_cache_free+0x1c2/0x450 [ 285.421680] __netif_receive_skb_core+0x2ce/0x1870 [ 285.422333] __netif_receive_skb_one_core+0x97/0x140 [ 285.423003] process_backlog+0x100/0x2f0 [ 
285.423537] __napi_poll+0x5c/0x2d0 [ 285.424023] net_rx_action+0x2be/0x560 [ 285.424510] __do_softirq+0x11b/0x3de [ 285.425034] [ 285.425254] The buggy address belongs to the object at ffff8880bad31280 [ 285.425254] which belongs to the cache skbuff_head_cache of size 224 [ 285.426993] The buggy address is located 40 bytes inside of [ 285.426993] freed 224-byte region [ffff8880bad31280, ffff8880bad31360) [ 285.428572] [ 285.428798] The buggy address belongs to the physical page: [ 285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31 [ 285.430758] flags: 0x100000000000200(slab|node=0|zone=1) [ 285.431447] page_type: 0xffffffff() [ 285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000 [ 285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 285.433562] page dumped because: kasan: bad access detected [ 285.434144] [ 285.434320] Memory state around the buggy address: [ 285.434828] ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.435580] ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 285.436777] ^ [ 285.437106] ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 285.437616] ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.438126] ================================================================== [ 285.438662] Disabling lock debugging due to kernel taint Fix this by: 1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a function compatible with non-work-conserving qdiscs 2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq. 
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: valis <sec(a)valis.email>
Signed-off-by: valis <sec(a)valis.email>
Signed-off-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Liu Jian <liujian56(a)huawei.com>
---
 net/sched/sch_plug.c | 2 +-
 net/sched/sch_qfq.c | 22 +++++++++++++++++-----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index cbc2ebca4548..339990bb5981 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -210,7 +210,7 @@ static struct Qdisc_ops plug_qdisc_ops __read_mostly = {
 	.priv_size = sizeof(struct plug_sched_data),
 	.enqueue = plug_enqueue,
 	.dequeue = plug_dequeue,
-	.peek = qdisc_peek_head,
+	.peek = qdisc_peek_dequeued,
 	.init = plug_init,
 	.change = plug_change,
 	.reset = qdisc_reset_queue,
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 1aa9e71a1d76..9447f486141d 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -976,10 +976,13 @@ static void qfq_update_eligible(struct qfq_sched *q)
 }
 
 /* Dequeue head packet of the head class in the DRR queue of the aggregate. */
-static void agg_dequeue(struct qfq_aggregate *agg,
-			struct qfq_class *cl, unsigned int len)
+static struct sk_buff *agg_dequeue(struct qfq_aggregate *agg,
+				   struct qfq_class *cl, unsigned int len)
 {
-	qdisc_dequeue_peeked(cl->qdisc);
+	struct sk_buff *skb = qdisc_dequeue_peeked(cl->qdisc);
+
+	if (!skb)
+		return NULL;
 
 	cl->deficit -= (int) len;
 
@@ -989,6 +992,8 @@ static void agg_dequeue(struct qfq_aggregate *agg,
 		cl->deficit += agg->lmax;
 		list_move_tail(&cl->alist, &agg->active);
 	}
+
+	return skb;
 }
 
 static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg,
@@ -1134,11 +1139,18 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	if (!skb)
 		return NULL;
 
-	qdisc_qstats_backlog_dec(sch, skb);
 	sch->q.qlen--;
+
+	skb = agg_dequeue(in_serv_agg, cl, len);
+
+	if (!skb) {
+		sch->q.qlen++;
+		return NULL;
+	}
+
+	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
 
-	agg_dequeue(in_serv_agg, cl, len);
 	/* If lmax is lowered, through qfq_change_class, for a class
 	 * owning pending packets with larger size than the new value
 	 * of lmax, then the following condition may hold.
-- 
2.34.1
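The two-part fix described in the commit message can be illustrated with a toy queue. This is a simplified userspace sketch, assuming a child queue that may refuse to release packets it still holds (as a plugged qdisc does); `struct toy_qdisc` and the `toy_*` helpers are hypothetical and only imitate the roles of qdisc_peek_head(), qdisc_peek_dequeued() and qdisc_dequeue_peeked().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 4

struct toy_pkt {
	int id;
};

/* Toy non-work-conserving queue: it can hold packets yet refuse to release them. */
struct toy_qdisc {
	struct toy_pkt *slots[TOY_SLOTS];
	size_t head, tail;
	int plugged;		/* nonzero: dequeue returns NULL even when not empty */
	struct toy_pkt *stash;	/* packet already taken out by a peek */
};

static void toy_enqueue(struct toy_qdisc *q, struct toy_pkt *p)
{
	assert(q->tail < TOY_SLOTS);
	q->slots[q->tail++] = p;
}

static struct toy_pkt *toy_dequeue(struct toy_qdisc *q)
{
	if (q->plugged || q->head == q->tail)
		return NULL;
	return q->slots[q->head++];
}

/* Peek-head style: looks at the head without asking whether dequeue would succeed. */
static struct toy_pkt *toy_peek_head(const struct toy_qdisc *q)
{
	return q->head == q->tail ? NULL : q->slots[q->head];
}

/* Peek-dequeued style: really dequeues and stashes, so a later dequeue cannot disagree. */
static struct toy_pkt *toy_peek_dequeued(struct toy_qdisc *q)
{
	if (!q->stash)
		q->stash = toy_dequeue(q);
	return q->stash;
}

/* Dequeue-peeked style: hand out the stash if present, else dequeue; may return NULL. */
static struct toy_pkt *toy_dequeue_peeked(struct toy_qdisc *q)
{
	struct toy_pkt *p = q->stash;

	if (p) {
		q->stash = NULL;
		return p;
	}
	return toy_dequeue(q);
}
```

While the toy queue is plugged, the peek-head helper still reports a packet but the following dequeue returns NULL, so a caller that ignores that return value keeps using a packet it never owned. The peek-dequeued helper reports NULL in the same state, and the caller-side NULL check covers any remaining mismatch.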

[PATCH openEuler-1.0-LTS] net: sched: sch_qfq: Fix UAF in qfq_dequeue()
by Liu Jian 15 Sep '23

From: valis <sec(a)valis.email> mainline inclusion from mainline-v6.6-rc1 commit 8fc134fee27f2263988ae38920bc03da416b03d8 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I80USB CVE: CVE-2023-4921 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… --------------------------- When the plug qdisc is used as a class of the qfq qdisc it could trigger a UAF. This issue can be reproduced with following commands: tc qdisc add dev lo root handle 1: qfq tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512 tc qdisc add dev lo parent 1:1 handle 2: plug tc filter add dev lo parent 1: basic classid 1:1 ping -c1 127.0.0.1 and boom: [ 285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0 [ 285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144 [ 285.355903] [ 285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4 [ 285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 285.358376] Call Trace: [ 285.358773] <IRQ> [ 285.359109] dump_stack_lvl+0x44/0x60 [ 285.359708] print_address_description.constprop.0+0x2c/0x3c0 [ 285.360611] kasan_report+0x10c/0x120 [ 285.361195] ? qfq_dequeue+0xa7/0x7f0 [ 285.361780] qfq_dequeue+0xa7/0x7f0 [ 285.362342] __qdisc_run+0xf1/0x970 [ 285.362903] net_tx_action+0x28e/0x460 [ 285.363502] __do_softirq+0x11b/0x3de [ 285.364097] do_softirq.part.0+0x72/0x90 [ 285.364721] </IRQ> [ 285.365072] <TASK> [ 285.365422] __local_bh_enable_ip+0x77/0x90 [ 285.366079] __dev_queue_xmit+0x95f/0x1550 [ 285.366732] ? __pfx_csum_and_copy_from_iter+0x10/0x10 [ 285.367526] ? __pfx___dev_queue_xmit+0x10/0x10 [ 285.368259] ? __build_skb_around+0x129/0x190 [ 285.368960] ? ip_generic_getfrag+0x12c/0x170 [ 285.369653] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 285.370390] ? csum_partial+0x8/0x20 [ 285.370961] ? raw_getfrag+0xe5/0x140 [ 285.371559] ip_finish_output2+0x539/0xa40 [ 285.372222] ? 
__pfx_ip_finish_output2+0x10/0x10 [ 285.372954] ip_output+0x113/0x1e0 [ 285.373512] ? __pfx_ip_output+0x10/0x10 [ 285.374130] ? icmp_out_count+0x49/0x60 [ 285.374739] ? __pfx_ip_finish_output+0x10/0x10 [ 285.375457] ip_push_pending_frames+0xf3/0x100 [ 285.376173] raw_sendmsg+0xef5/0x12d0 [ 285.376760] ? do_syscall_64+0x40/0x90 [ 285.377359] ? __static_call_text_end+0x136578/0x136578 [ 285.378173] ? do_syscall_64+0x40/0x90 [ 285.378772] ? kasan_enable_current+0x11/0x20 [ 285.379469] ? __pfx_raw_sendmsg+0x10/0x10 [ 285.380137] ? __sock_create+0x13e/0x270 [ 285.380673] ? __sys_socket+0xf3/0x180 [ 285.381174] ? __x64_sys_socket+0x3d/0x50 [ 285.381725] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.382425] ? __rcu_read_unlock+0x48/0x70 [ 285.382975] ? ip4_datagram_release_cb+0xd8/0x380 [ 285.383608] ? __pfx_ip4_datagram_release_cb+0x10/0x10 [ 285.384295] ? preempt_count_sub+0x14/0xc0 [ 285.384844] ? __list_del_entry_valid+0x76/0x140 [ 285.385467] ? _raw_spin_lock_bh+0x87/0xe0 [ 285.386014] ? __pfx__raw_spin_lock_bh+0x10/0x10 [ 285.386645] ? release_sock+0xa0/0xd0 [ 285.387148] ? preempt_count_sub+0x14/0xc0 [ 285.387712] ? freeze_secondary_cpus+0x348/0x3c0 [ 285.388341] ? aa_sk_perm+0x177/0x390 [ 285.388856] ? __pfx_aa_sk_perm+0x10/0x10 [ 285.389441] ? check_stack_object+0x22/0x70 [ 285.390032] ? inet_send_prepare+0x2f/0x120 [ 285.390603] ? __pfx_inet_sendmsg+0x10/0x10 [ 285.391172] sock_sendmsg+0xcc/0xe0 [ 285.391667] __sys_sendto+0x190/0x230 [ 285.392168] ? __pfx___sys_sendto+0x10/0x10 [ 285.392727] ? kvm_clock_get_cycles+0x14/0x30 [ 285.393328] ? set_normalized_timespec64+0x57/0x70 [ 285.393980] ? _raw_spin_unlock_irq+0x1b/0x40 [ 285.394578] ? __x64_sys_clock_gettime+0x11c/0x160 [ 285.395225] ? __pfx___x64_sys_clock_gettime+0x10/0x10 [ 285.395908] ? _copy_to_user+0x3e/0x60 [ 285.396432] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.397086] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.397734] ? 
do_syscall_64+0x71/0x90 [ 285.398258] __x64_sys_sendto+0x74/0x90 [ 285.398786] do_syscall_64+0x64/0x90 [ 285.399273] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.399949] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.400605] ? do_syscall_64+0x71/0x90 [ 285.401124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.401807] RIP: 0033:0x495726 [ 285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09 [ 285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726 [ 285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000 [ 285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c [ 285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634 [ 285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000 [ 285.410403] </TASK> [ 285.410704] [ 285.410929] Allocated by task 144: [ 285.411402] kasan_save_stack+0x1e/0x40 [ 285.411926] kasan_set_track+0x21/0x30 [ 285.412442] __kasan_slab_alloc+0x55/0x70 [ 285.412973] kmem_cache_alloc_node+0x187/0x3d0 [ 285.413567] __alloc_skb+0x1b4/0x230 [ 285.414060] __ip_append_data+0x17f7/0x1b60 [ 285.414633] ip_append_data+0x97/0xf0 [ 285.415144] raw_sendmsg+0x5a8/0x12d0 [ 285.415640] sock_sendmsg+0xcc/0xe0 [ 285.416117] __sys_sendto+0x190/0x230 [ 285.416626] __x64_sys_sendto+0x74/0x90 [ 285.417145] do_syscall_64+0x64/0x90 [ 285.417624] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.418306] [ 285.418531] Freed by task 144: [ 285.418960] kasan_save_stack+0x1e/0x40 [ 285.419469] kasan_set_track+0x21/0x30 [ 285.419988] kasan_save_free_info+0x27/0x40 [ 285.420556] ____kasan_slab_free+0x109/0x1a0 [ 285.421146] kmem_cache_free+0x1c2/0x450 [ 285.421680] __netif_receive_skb_core+0x2ce/0x1870 [ 285.422333] __netif_receive_skb_one_core+0x97/0x140 [ 285.423003] process_backlog+0x100/0x2f0 [ 
285.423537] __napi_poll+0x5c/0x2d0 [ 285.424023] net_rx_action+0x2be/0x560 [ 285.424510] __do_softirq+0x11b/0x3de [ 285.425034] [ 285.425254] The buggy address belongs to the object at ffff8880bad31280 [ 285.425254] which belongs to the cache skbuff_head_cache of size 224 [ 285.426993] The buggy address is located 40 bytes inside of [ 285.426993] freed 224-byte region [ffff8880bad31280, ffff8880bad31360) [ 285.428572] [ 285.428798] The buggy address belongs to the physical page: [ 285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31 [ 285.430758] flags: 0x100000000000200(slab|node=0|zone=1) [ 285.431447] page_type: 0xffffffff() [ 285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000 [ 285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 285.433562] page dumped because: kasan: bad access detected [ 285.434144] [ 285.434320] Memory state around the buggy address: [ 285.434828] ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.435580] ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 285.436777] ^ [ 285.437106] ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 285.437616] ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.438126] ================================================================== [ 285.438662] Disabling lock debugging due to kernel taint Fix this by: 1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a function compatible with non-work-conserving qdiscs 2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq. 
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: valis <sec(a)valis.email>
Signed-off-by: valis <sec(a)valis.email>
Signed-off-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Liu Jian <liujian56(a)huawei.com>
---
 net/sched/sch_plug.c |  2 +-
 net/sched/sch_qfq.c  | 22 +++++++++++++++++-----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index 5619d2eb17b6..4ddb4af61d10 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -214,7 +214,7 @@ static struct Qdisc_ops plug_qdisc_ops __read_mostly = {
 	.priv_size = sizeof(struct plug_sched_data),
 	.enqueue = plug_enqueue,
 	.dequeue = plug_dequeue,
-	.peek = qdisc_peek_head,
+	.peek = qdisc_peek_dequeued,
 	.init = plug_init,
 	.change = plug_change,
 	.reset = qdisc_reset_queue,
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index a08579a5f75e..5b7b149a2b9f 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -989,10 +989,13 @@ static void qfq_update_eligible(struct qfq_sched *q)
 }
 
 /* Dequeue head packet of the head class in the DRR queue of the aggregate. */
-static void agg_dequeue(struct qfq_aggregate *agg,
-			struct qfq_class *cl, unsigned int len)
+static struct sk_buff *agg_dequeue(struct qfq_aggregate *agg,
+				   struct qfq_class *cl, unsigned int len)
 {
-	qdisc_dequeue_peeked(cl->qdisc);
+	struct sk_buff *skb = qdisc_dequeue_peeked(cl->qdisc);
+
+	if (!skb)
+		return NULL;
 
 	cl->deficit -= (int) len;
 
@@ -1002,6 +1005,8 @@ static void agg_dequeue(struct qfq_aggregate *agg,
 		cl->deficit += agg->lmax;
 		list_move_tail(&cl->alist, &agg->active);
 	}
+
+	return skb;
 }
 
 static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg,
@@ -1147,11 +1152,18 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	if (!skb)
 		return NULL;
 
-	qdisc_qstats_backlog_dec(sch, skb);
 	sch->q.qlen--;
+
+	skb = agg_dequeue(in_serv_agg, cl, len);
+
+	if (!skb) {
+		sch->q.qlen++;
+		return NULL;
+	}
+
+	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
 
-	agg_dequeue(in_serv_agg, cl, len);
 	/* If lmax is lowered, through qfq_change_class, for a class
 	 * owning pending packets with larger size than the new value
 	 * of lmax, then the following condition may hold.
-- 
2.34.1
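
The second half of the fix (check what the child qdisc actually returned before touching the parent's counters) can be illustrated outside the kernel. The following is only a minimal userspace sketch: `struct pkt`, `struct child_q`, `struct parent_q` and both functions are hypothetical stand-ins, not the real `sk_buff`/`Qdisc` API, and the bookkeeping is simplified compared with `qfq_dequeue()` (the patch decrements `qlen` first and restores it on failure).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real sk_buff/Qdisc. */
struct pkt {
	unsigned int len;
};

struct child_q {
	struct pkt *peeked;	/* packet handed out by an earlier peek, if any */
};

struct parent_q {
	unsigned int qlen;	/* packets accounted to the parent */
	unsigned int backlog;	/* bytes accounted to the parent */
	int deficit;		/* DRR-style deficit counter */
};

/* Return the previously peeked packet, or NULL if the child holds none. */
static struct pkt *child_dequeue_peeked(struct child_q *c)
{
	struct pkt *p = c->peeked;

	c->peeked = NULL;
	return p;
}

/* Dequeue from the child; update parent accounting only on success. */
static struct pkt *parent_dequeue(struct parent_q *q, struct child_q *c)
{
	struct pkt *p = child_dequeue_peeked(c);

	if (!p)
		return NULL;	/* nothing came out: leave the counters alone */

	q->qlen--;
	q->backlog -= p->len;
	q->deficit -= (int)p->len;
	return p;
}
```

Calling `parent_dequeue()` on a child whose `peeked` pointer is NULL returns NULL and leaves `qlen`, `backlog` and `deficit` unchanged, which is the property the patch adds to sch_qfq.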

[PATCH openEuler-23.09] mm: gmem: Check overflow for prefetch/eagerfree
by Wupeng Ma 15 Sep '23

From: Ma Wupeng <mawupeng1(a)huawei.com>

euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7WLVX

---------------------------------------------

Add overflow check for gmem prefetch/eagerfree.

Fixes: 3e01aec2b2e8 ("mm: gmem: Introduce hmadvise")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
---
 mm/gmem.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/mm/gmem.c b/mm/gmem.c
index a710869d04a9..90a5b5fda284 100644
--- a/mm/gmem.c
+++ b/mm/gmem.c
@@ -622,12 +622,25 @@ static int hmadvise_do_prefetch(gm_dev_t *dev, unsigned long addr, size_t size)
 	struct prefetch_data *data;
 	struct vm_area_struct *vma;
 	int res = GM_RET_SUCCESS;
+	unsigned long old_start;
+
+	/* overflow */
+	if (check_add_overflow(addr, size, &end))
+		return -EINVAL;
+
+	old_start = end;
 
 	/* Align addr by rounding outward to make page cover addr. */
-	end = round_up(addr + size, page_size);
+	end = round_up(end, page_size);
 	start = round_down(addr, page_size);
 	size = end - start;
 
+	if (!end && old_start)
+		return -EINVAL;
+
+	if (size == 0)
+		return 0;
+
 	mmap_read_lock(current->mm);
 	vma = find_vma(current->mm, start);
 	if (!vma || start < vma->vm_start || end > vma->vm_end) {
@@ -675,19 +688,30 @@ static int hmadvise_do_eagerfree(unsigned long addr, size_t size)
 		.size = page_size,
 		.copy = false,
 	};
+	unsigned long old_start;
 	vm_object_t *obj;
 
+	/* overflow */
+	if (check_add_overflow(addr, size, &end))
+		return -EINVAL;
+
+	old_start = addr;
+
 	/* Align addr by rounding inward to avoid excessive page release. */
-	end = round_down(addr + size, page_size);
+	end = round_down(end, page_size);
 	start = round_up(addr, page_size);
 	if (start >= end)
 		return ret;
 
+	/* Check to see whether len was rounded up from small -ve to zero */
+	if (old_start && !start)
+		return -EINVAL;
+
 	mmap_read_lock(current->mm);
 	do {
 		vma = find_vma(current->mm, start);
 		if (!vma || !vma_is_peer_shared(vma)) {
-			pr_err("gmem: not peer-shared vma, skip dontneed\n");
+			pr_info_ratelimited("gmem: not peer-shared vma, skip dontneed\n");
 			continue;
 		}
 		obj = vma->vm_obj;
-- 
2.25.1
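
The two failure modes the prefetch path guards against (the sum `addr + size` wrapping, and a non-zero end wrapping to zero when rounded up to a page boundary) can be reproduced in userspace. This is a hedged sketch rather than the gmem code: it uses the GCC/Clang builtin `__builtin_add_overflow()` in place of the kernel's `check_add_overflow()`, assumes an LP64 target where `size_t` and `unsigned long` have the same width, and `SKETCH_PAGE_SIZE`, `round_down_ul()`, `round_up_ul()` and `range_outward()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Round x down to a multiple of a; a must be a power of two. */
static unsigned long round_down_ul(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

/* Round x up to a multiple of a; wraps to 0 near the top of the range. */
static unsigned long round_up_ul(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Compute the page-aligned range that covers [addr, addr + size).
 * Returns false if addr + size wraps, or if rounding the end up wraps to 0.
 */
static bool range_outward(unsigned long addr, size_t size,
			  unsigned long *start, unsigned long *end)
{
	unsigned long raw_end;

	if (__builtin_add_overflow(addr, (unsigned long)size, &raw_end))
		return false;

	*start = round_down_ul(addr, SKETCH_PAGE_SIZE);
	*end = round_up_ul(raw_end, SKETCH_PAGE_SIZE);

	/* A non-zero end that rounds up to zero has wrapped past the top. */
	if (*end == 0 && raw_end != 0)
		return false;

	return true;
}
```

For example, `range_outward(4097, 10, &s, &e)` yields the range 4096..8192, while an `addr` within a page of the top of the address space is rejected by the second check even though the plain addition did not overflow.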

[PATCH openEuler-23.09] kernel: Fix compile failure out of the srctree
by Wang Wensheng 14 Sep '23

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I81E6X

---------------------------------------------

Build the kernel Image out of the srctree with the following command:

 $ mkdir build
 $ cd build
 $ make -C ../ O=`pwd` openeuler_defconfig
 $ make Image -j64

would get the following error message:

 ../Makefile:1315: Makefile.oever: No such file or directory
 make: *** No rule to make target 'Makefile.oever'. Stop.

Add a directory path for the included file to fix this.

Fixes: 0177b043eb4e ("kernel: add OPENEULER_VERSION_CODE to version.h")
Signed-off-by: Wang Wensheng <wangwensheng4(a)huawei.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 5b2b3ad42a2d..5061718274f0 100644
--- a/Makefile
+++ b/Makefile
@@ -1312,7 +1312,7 @@ uapi-asm-generic:
 # ---------------------------------------------------------------------------
 # openEuler version variables
 
-include Makefile.oever
+include $(srctree)/Makefile.oever
 
 # KERNELRELEASE can change from a few different places, meaning version.h
 # needs to be updated, so this check is forced on all builds
-- 
2.17.1

[PATCH OLK-5.10 0/2] Fix the two problems when using binutil 2.41.
by Hongchen Zhang 14 Sep '23

LoongArch: Fix the write_fcsr() macro
LoongArch: Fix module relocation error with binutils 2.41

 arch/loongarch/Makefile                | 2 ++
 arch/loongarch/include/asm/loongarch.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

-- 
2.33.0

[PATCH OLK-5.10] zram: do not waste zram_table_entry flags bits
by Jinjiang Tu 14 Sep '23

From: Sergey Senozhatsky <senozhatsky(a)chromium.org>

mainline inclusion
from mainline-v6.1-rc1
commit f635725c3905e755a8c3e2dc8cab7fcd0d38977f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7TWVA
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

-------------------------------------------

zram_table_entry::flags stores object size in the lower bits and zram
pageflags in the upper bits. However, for some reason, we use 24 lower
bits, while maximum zram object size is PAGE_SIZE, which requires
PAGE_SHIFT bits (up to 16 on arm64). This wastes 24 - PAGE_SHIFT bits
that we can use for additional zram pageflags instead.

Also add a BUILD_BUG_ON() to alert us should we run out of bits in
zram_table_entry::flags.

Link: https://lkml.kernel.org/r/20220912152744.527438-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Reviewed-by: Brian Geffon <bgeffon(a)google.com>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Cc: Nitin Gupta <ngupta(a)vflare.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
 drivers/block/zram/zram_drv.c |  2 ++
 drivers/block/zram/zram_drv.h | 15 +++++++--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 24a5f892d64d..e332b4d55359 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2591,6 +2591,8 @@ static int __init zram_init(void)
 {
 	int ret;
 
+	BUILD_BUG_ON(__NR_ZRAM_PAGEFLAGS > BITS_PER_LONG);
+
 	ret = cpuhp_setup_state_multi(CPUHP_ZCOMP_PREPARE, "block/zram:prepare",
 				      zcomp_cpu_up_prepare, zcomp_cpu_dead);
 	if (ret < 0)
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b5b94e6b6ec8..eb13d0299f89 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -30,16 +30,15 @@
 
 
 /*
- * The lower ZRAM_FLAG_SHIFT bits of table.flags is for
- * object size (excluding header), the higher bits is for
- * zram_pageflags.
+ * ZRAM is mainly used for memory efficiency so we want to keep memory
+ * footprint small and thus squeeze size and zram pageflags into a flags
+ * member. The lower ZRAM_FLAG_SHIFT bits is for object size (excluding
+ * header), which cannot be larger than PAGE_SIZE (requiring PAGE_SHIFT
+ * bits), the higher bits are for zram_pageflags.
  *
- * zram is mainly used for memory efficiency so we want to keep memory
- * footprint small so we can squeeze size and flags into a field.
- * The lower ZRAM_FLAG_SHIFT bits is for object size (excluding header),
- * the higher bits is for zram_pageflags.
+ * We use BUILD_BUG_ON() to make sure that zram pageflags don't overflow.
  */
-#define ZRAM_FLAG_SHIFT 24
+#define ZRAM_FLAG_SHIFT (PAGE_SHIFT + 1)
 
 /* Only 2 bits are allowed for comp priority index */
 #define ZRAM_COMP_PRIORITY_MASK 0x3
-- 
2.25.1
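
The layout described in the commit message (object size in the low bits of one word, page flags above it, and a compile-time check that the flags still fit) can be sketched in plain C. Everything here is illustrative: the `SK_` names and helpers are hypothetical, a 4 KiB page (`SK_PAGE_SHIFT` of 12) is assumed, and C11 `_Static_assert` stands in for the kernel's `BUILD_BUG_ON()`. The shift is `PAGE_SHIFT + 1`, as in the patch, because a size equal to PAGE_SIZE itself needs one bit more than PAGE_SHIFT.

```c
#include <stddef.h>

#define SK_PAGE_SHIFT	 12				/* assumed 4 KiB pages */
#define SK_PAGE_SIZE	 (1UL << SK_PAGE_SHIFT)
#define SK_FLAG_SHIFT	 (SK_PAGE_SHIFT + 1)		/* sizes 0..PAGE_SIZE inclusive */
#define SK_BITS_PER_LONG (sizeof(unsigned long) * 8)

/* Hypothetical page flags, numbered from the first bit above the size field. */
enum sk_pageflags {
	SK_SAME = SK_FLAG_SHIFT,
	SK_WB,
	SK_HUGE,
	SK_NR_PAGEFLAGS,
};

/* Compile-time guard in the spirit of the BUILD_BUG_ON() added by the patch. */
_Static_assert(SK_NR_PAGEFLAGS <= SK_BITS_PER_LONG,
	       "page flags do not fit in the flags word");

/* Read the object size stored in the low SK_FLAG_SHIFT bits. */
static unsigned long sk_get_size(unsigned long word)
{
	return word & ((1UL << SK_FLAG_SHIFT) - 1);
}

/* Replace the stored size while keeping the flag bits above it. */
static unsigned long sk_set_size(unsigned long word, unsigned long size)
{
	unsigned long flags = word >> SK_FLAG_SHIFT;

	return (flags << SK_FLAG_SHIFT) | size;
}

/* Set one page flag bit. */
static unsigned long sk_set_flag(unsigned long word, enum sk_pageflags f)
{
	return word | (1UL << f);
}

/* Test one page flag bit. */
static int sk_test_flag(unsigned long word, enum sk_pageflags f)
{
	return (word & (1UL << f)) != 0;
}
```

With a fixed shift of 24, the first flag would sit at bit 24 regardless of page size; deriving the shift from the page size frees the bits in between for additional flags, and the static assertion fails the build if the enum ever grows past the width of the word.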

[PATCH OLK-5.10 0/2] Fix the two problems when using binutil 2.41.
by Hongchen Zhang 13 Sep '23

LoongArch: Fix the write_fcsr() macro
LoongArch: Fix module relocation error with binutils 2.41

 arch/loongarch/Makefile                | 2 ++
 arch/loongarch/include/asm/loongarch.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

-- 
2.33.0

[PATCH OLK-5.10] drm: add inspur drm driver support
by Hongchen Zhang 13 Sep '23

LoongArch inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I80YFC

------------------------------------------

add drm support for Inspur BMC.

Signed-off-by: Hongchen Zhang <zhanghongchen(a)loongson.cn>
---
 arch/loongarch/configs/loongson3_defconfig |   4 +-
 drivers/gpu/drm/Kconfig                    |   2 +
 drivers/gpu/drm/Makefile                   |   1 +
 drivers/gpu/drm/inspur/Kconfig             |  11 +
 drivers/gpu/drm/inspur/Makefile            |   5 +
 drivers/gpu/drm/inspur/inspur_cursor.c     |  58 +++
 drivers/gpu/drm/inspur/inspur_drm_de.c     | 513 +++++++++++++++++++++
 drivers/gpu/drm/inspur/inspur_drm_drv.c    | 456 ++++++++++++++++++
 drivers/gpu/drm/inspur/inspur_drm_drv.h    | 116 +++++
 drivers/gpu/drm/inspur/inspur_drm_regs.h   | 223 +++++++++
 drivers/gpu/drm/inspur/inspur_drm_vdac.c   | 117 +++++
 drivers/gpu/drm/inspur/inspur_ttm.c        |  36 ++
 12 files changed, 1539 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/inspur/Kconfig
 create mode 100644 drivers/gpu/drm/inspur/Makefile
 create mode 100644 drivers/gpu/drm/inspur/inspur_cursor.c
 create mode 100644 drivers/gpu/drm/inspur/inspur_drm_de.c
 create mode 100644 drivers/gpu/drm/inspur/inspur_drm_drv.c
 create mode 100644 drivers/gpu/drm/inspur/inspur_drm_drv.h
 create mode 100644 drivers/gpu/drm/inspur/inspur_drm_regs.h
 create mode 100644 drivers/gpu/drm/inspur/inspur_drm_vdac.c
 create mode 100644 drivers/gpu/drm/inspur/inspur_ttm.c

diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig
index 6e0adea947f5..ec53e95bf30d 100644
--- a/arch/loongarch/configs/loongson3_defconfig
+++ b/arch/loongarch/configs/loongson3_defconfig
@@ -386,7 +386,6 @@ CONFIG_IP6_NF_SECURITY=m
 CONFIG_IP6_NF_NAT=m
 CONFIG_IP6_NF_TARGET_MASQUERADE=m
 CONFIG_IP6_NF_TARGET_NPT=m
-CONFIG_DECNET_NF_GRABULATOR=m
 CONFIG_NF_TABLES_BRIDGE=m
 CONFIG_NFT_BRIDGE_META=m
 CONFIG_NFT_BRIDGE_REJECT=m
@@ -458,8 +457,6 @@ CONFIG_NET_DSA_TAG_SJA1105=m
 CONFIG_NET_DSA_TAG_TRAILER=m
 CONFIG_VLAN_8021Q_GVRP=y
 CONFIG_VLAN_8021Q_MVRP=y
-CONFIG_DECNET=m
-CONFIG_DECNET_ROUTER=y
 CONFIG_LLC2=m
 CONFIG_ATALK=m
 CONFIG_DEV_APPLETALK=m
@@ -1504,6 +1501,7 @@ CONFIG_DRM_NOUVEAU=m
 CONFIG_DRM_VKMS=m
 CONFIG_DRM_UDL=m
 CONFIG_DRM_AST=y
+CONFIG_DRM_INSPUR=m
 CONFIG_DRM_MGAG200=m
 CONFIG_DRM_QXL=m
 CONFIG_DRM_BOCHS=m
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index b37e6660dd4e..f6dcb60be551 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -315,6 +315,8 @@ source "drivers/gpu/drm/ast/Kconfig"
 
 source "drivers/gpu/drm/loongson/Kconfig"
 
+source "drivers/gpu/drm/inspur/Kconfig"
+
 source "drivers/gpu/drm/mgag200/Kconfig"
 
 source "drivers/gpu/drm/armada/Kconfig"
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index e9dd6847c9fa..e806bda8650a 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -125,3 +125,4 @@ obj-$(CONFIG_DRM_ASPEED_GFX) += aspeed/
 obj-$(CONFIG_DRM_MCDE) += mcde/
 obj-$(CONFIG_DRM_TIDSS) += tidss/
 obj-y += xlnx/
+obj-$(CONFIG_DRM_INSPUR) += inspur/
diff --git a/drivers/gpu/drm/inspur/Kconfig b/drivers/gpu/drm/inspur/Kconfig
new file mode 100644
index 000000000000..7c9ab5ad77ab
--- /dev/null
+++ b/drivers/gpu/drm/inspur/Kconfig
@@ -0,0 +1,11 @@
+config DRM_INSPUR
+	tristate "DRM Support for Inspur BMC"
+	depends on DRM && PCI && MMU
+	select DRM_KMS_HELPER
+	select DRM_VRAM_HELPER
+
+	help
+	  Choose this option if you have a Inspur soc chipset.
+	  If M is selected the module will be called inspur-drm.
+	  IF you use gnome3, please set "WaylandEnable=false" in
+	  "vim /etc/gdm3/custom.conf" and reboot.
diff --git a/drivers/gpu/drm/inspur/Makefile b/drivers/gpu/drm/inspur/Makefile new file mode 100644 index 000000000000..31a5bfe79214 --- /dev/null +++ b/drivers/gpu/drm/inspur/Makefile @@ -0,0 +1,5 @@ + +inspur-drm-y := inspur_drm_drv.o inspur_drm_de.o \ + inspur_drm_vdac.o inspur_ttm.o inspur_cursor.o + +obj-$(CONFIG_DRM_INSPUR) += inspur-drm.o diff --git a/drivers/gpu/drm/inspur/inspur_cursor.c b/drivers/gpu/drm/inspur/inspur_cursor.c new file mode 100644 index 000000000000..e84136cbf4f7 --- /dev/null +++ b/drivers/gpu/drm/inspur/inspur_cursor.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <linux/pci.h> +#include "inspur_drm_drv.h" +#include "inspur_drm_regs.h" + +void colorcur2monocur(void *data, void *out) +{ + unsigned int *col = (unsigned int *)data; + unsigned char *mono = (unsigned char *)out; + unsigned char pixel = 0; + char bit_values; + int i; + + for (i = 0; i < 64 * 64; i++) { + if (*col >> 24 < 0xe0) { + bit_values = 0; + } else { + int val = *col & 0xff; + + if (val < 0x80) + bit_values = 1; + else + bit_values = 2; + } + col++; + /* Copy bits into cursor byte */ + switch (i & 3) { + case 0: + pixel = bit_values; + break; + + case 1: + pixel |= bit_values << 2; + break; + + case 2: + pixel |= bit_values << 4; + break; + + case 3: + pixel |= bit_values << 6; + *mono = pixel; + mono++; + pixel = 0; + break; + } + } +} + +#define HW_FLAG_OFFSET 0x01ffff00 +#define HW_FLAG_ENABLE 0x1bd40750 +unsigned char getKVMHWCursorSetting(struct inspur_drm_private *priv) +{ + unsigned int value = *(unsigned int *)(priv->fb_map + HW_FLAG_OFFSET); + + DRM_DEBUG_KMS("HW_FLAG = %x\n", value); + return 0; +} diff --git a/drivers/gpu/drm/inspur/inspur_drm_de.c b/drivers/gpu/drm/inspur/inspur_drm_de.c new file mode 100644 index 000000000000..de31bb79129b --- /dev/null +++ b/drivers/gpu/drm/inspur/inspur_drm_de.c @@ -0,0 +1,513 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* INSPUR SoC drm driver + * + * Based on the smi drm driver. 
+ * + * Copyright (c) 2020 SMI Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#include <drm/drm_atomic_helper.h> +#include <drm/drm_plane_helper.h> +#include <drm/drm_probe_helper.h> +#include <drm/drm_fourcc.h> + +#include "inspur_drm_drv.h" +#include "inspur_drm_regs.h" + +struct inspur_dislay_pll_config { + unsigned long hdisplay; + unsigned long vdisplay; + u32 pll1_config_value; + u32 pll2_config_value; +}; + +static const struct inspur_dislay_pll_config inspur_pll_table[] = { + {640, 480, CRT_PLL1_NS_25MHZ, CRT_PLL2_NS_25MHZ}, + {800, 600, CRT_PLL1_NS_40MHZ, CRT_PLL2_NS_40MHZ}, + {1024, 768, CRT_PLL1_NS_65MHZ, CRT_PLL2_NS_65MHZ}, + {1280, 800, CRT_PLL1_NS_83MHZ, CRT_PLL2_NS_83MHZ}, + {1280, 1024, CRT_PLL1_NS_108MHZ, CRT_PLL2_NS_108MHZ}, + {1440, 900, CRT_PLL1_NS_106MHZ, CRT_PLL2_NS_106MHZ}, + {1680, 1050, CRT_PLL1_NS_146MHZ, CRT_PLL2_NS_146MHZ}, + {1920, 1080, CRT_PLL1_NS_148MHZ, CRT_PLL2_NS_148MHZ}, + {1920, 1200, CRT_PLL1_NS_193MHZ, CRT_PLL2_NS_193MHZ}, +}; + +#define PADDING(align, data) (((data) + (align) - 1) & (~((align) - 1))) + +static int inspur_plane_atomic_check(struct drm_plane *plane, + struct drm_plane_state *state) +{ + struct drm_framebuffer *fb = state->fb; + struct drm_crtc *crtc = state->crtc; + struct drm_crtc_state *crtc_state; + u32 src_w = state->src_w >> 16; + u32 src_h = state->src_h >> 16; + + if (!crtc || !fb) + return 0; + + crtc_state = drm_atomic_get_crtc_state(state->state, crtc); + if (IS_ERR(crtc_state)) + return PTR_ERR(crtc_state); + + if (src_w != state->crtc_w || src_h != state->crtc_h) { + DRM_DEBUG_ATOMIC("scale not support\n"); + return -EINVAL; + } + + if (state->crtc_x < 0 || state->crtc_y < 0) { + DRM_DEBUG_ATOMIC("crtc_x/y of drm_plane state is invalid\n"); + return -EINVAL; + } + + if 
(!crtc_state->enable) + return 0; + + if (state->crtc_x + state->crtc_w > + crtc_state->adjusted_mode.hdisplay || + state->crtc_y + state->crtc_h > + crtc_state->adjusted_mode.vdisplay) { + DRM_DEBUG_ATOMIC("visible portion of plane is invalid\n"); + return -EINVAL; + } + + if (state->fb->pitches[0] % 128 != 0) { + DRM_DEBUG_ATOMIC("wrong stride with 128-byte aligned\n"); + return -EINVAL; + } + + return 0; +} + +static void inspur_plane_atomic_update(struct drm_plane *plane, + struct drm_plane_state *old_state) +{ + struct drm_plane_state *state = plane->state; + u32 reg; + int ret; + s64 gpu_addr = 0; + unsigned int line_l; + struct inspur_drm_private *priv = plane->dev->dev_private; + struct drm_gem_vram_object *gbo; + + if (!state->fb) + return; + + gbo = drm_gem_vram_of_gem(state->fb->obj[0]); + + ret = drm_gem_vram_pin(gbo, DRM_GEM_VRAM_PL_FLAG_VRAM); + if (ret) { + DRM_ERROR("failed to pin bo: %d", ret); + return; + } + gpu_addr = drm_gem_vram_offset(gbo); + if (gpu_addr < 0) { + drm_gem_vram_unpin(gbo); + return; + } + + writel(gpu_addr, priv->mmio + INSPUR_CRT_FB_ADDRESS); + + reg = state->fb->width * (state->fb->format->cpp[0]); + + line_l = state->fb->pitches[0]; + writel(INSPUR_FIELD(INSPUR_CRT_FB_WIDTH_WIDTH, reg) | + INSPUR_FIELD(INSPUR_CRT_FB_WIDTH_OFFS, line_l), + priv->mmio + INSPUR_CRT_FB_WIDTH); + + /* SET PIXEL FORMAT */ + reg = readl(priv->mmio + INSPUR_CRT_DISP_CTL); + reg &= ~INSPUR_CRT_DISP_CTL_FORMAT_MASK; + reg |= INSPUR_FIELD(INSPUR_CRT_DISP_CTL_FORMAT, + state->fb->format->cpp[0] * 8 / 16); + writel(reg, priv->mmio + INSPUR_CRT_DISP_CTL); +} + +static const u32 channel_formats1[] = { + DRM_FORMAT_RGB565, DRM_FORMAT_BGR565, DRM_FORMAT_RGB888, + DRM_FORMAT_BGR888, DRM_FORMAT_XRGB8888, DRM_FORMAT_XBGR8888, + DRM_FORMAT_RGBA8888, DRM_FORMAT_BGRA8888, DRM_FORMAT_ARGB8888, + DRM_FORMAT_ABGR8888 +}; + +static struct drm_plane_funcs inspur_plane_funcs = { + .update_plane = drm_atomic_helper_update_plane, + .disable_plane = 
drm_atomic_helper_disable_plane, + .destroy = drm_plane_cleanup, + .reset = drm_atomic_helper_plane_reset, + .atomic_duplicate_state = drm_atomic_helper_plane_duplicate_state, + .atomic_destroy_state = drm_atomic_helper_plane_destroy_state, +}; + +static const struct drm_plane_helper_funcs inspur_plane_helper_funcs = { + .atomic_check = inspur_plane_atomic_check, + .atomic_update = inspur_plane_atomic_update, +}; + +static struct drm_plane *inspur_plane_init(struct inspur_drm_private *priv) +{ + struct drm_device *dev = priv->dev; + struct drm_plane *plane; + int ret = 0; + + plane = devm_kzalloc(dev->dev, sizeof(*plane), GFP_KERNEL); + if (!plane) { + DRM_ERROR("failed to alloc memory when init plane\n"); + return ERR_PTR(-ENOMEM); + } + ret = drm_universal_plane_init(dev, plane, 1, &inspur_plane_funcs, + channel_formats1, + ARRAY_SIZE(channel_formats1), + NULL, + DRM_PLANE_TYPE_PRIMARY, + NULL); + if (ret) { + DRM_ERROR("failed to init plane: %d\n", ret); + return ERR_PTR(ret); + } + + drm_plane_helper_add(plane, &inspur_plane_helper_funcs); + return plane; +} + +static void inspur_crtc_dpms(struct drm_crtc *crtc, int dpms) +{ + struct inspur_drm_private *priv = crtc->dev->dev_private; + unsigned int reg; + + reg = readl(priv->mmio + INSPUR_CRT_DISP_CTL); + reg &= ~INSPUR_CRT_DISP_CTL_DPMS_MASK; + reg |= INSPUR_FIELD(INSPUR_CRT_DISP_CTL_DPMS, dpms); + reg &= ~INSPUR_CRT_DISP_CTL_TIMING_MASK; + if (dpms == INSPUR_CRT_DPMS_ON) + reg |= INSPUR_CRT_DISP_CTL_TIMING(1); + writel(reg, priv->mmio + INSPUR_CRT_DISP_CTL); +} + + +static void inspur_crtc_atomic_enable(struct drm_crtc *crtc, +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 11, 0) + struct drm_atomic_state *state) +#else + struct drm_crtc_state *old_state) +#endif +{ + unsigned int reg; + struct inspur_drm_private *priv = crtc->dev->dev_private; + + inspur_set_power_mode(priv, INSPUR_PW_MODE_CTL_MODE_MODE0); + + /* Enable display power gate & LOCALMEM power gate*/ + reg = readl(priv->mmio + INSPUR_CURRENT_GATE); 
+ reg &= ~INSPUR_CURR_GATE_LOCALMEM_MASK; + reg &= ~INSPUR_CURR_GATE_DISPLAY_MASK; + reg |= INSPUR_CURR_GATE_LOCALMEM(1); + reg |= INSPUR_CURR_GATE_DISPLAY(1); + inspur_set_current_gate(priv, reg); + inspur_crtc_dpms(crtc, INSPUR_CRT_DPMS_ON); +} + +static void inspur_crtc_atomic_disable(struct drm_crtc *crtc, +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 11, 0) + struct drm_atomic_state *state) +#else + struct drm_crtc_state *old_state) +#endif +{ + unsigned int reg; + struct inspur_drm_private *priv = crtc->dev->dev_private; + + inspur_crtc_dpms(crtc, INSPUR_CRT_DPMS_OFF); + + inspur_set_power_mode(priv, INSPUR_PW_MODE_CTL_MODE_SLEEP); + + /* Enable display power gate & LOCALMEM power gate*/ + reg = readl(priv->mmio + INSPUR_CURRENT_GATE); + reg &= ~INSPUR_CURR_GATE_LOCALMEM_MASK; + reg &= ~INSPUR_CURR_GATE_DISPLAY_MASK; + reg |= INSPUR_CURR_GATE_LOCALMEM(0); + reg |= INSPUR_CURR_GATE_DISPLAY(0); + inspur_set_current_gate(priv, reg); +} + +static enum drm_mode_status +inspur_crtc_mode_valid(struct drm_crtc *crtc, + const struct drm_display_mode *mode) +{ + int i = 0; + int vrefresh = drm_mode_vrefresh(mode); + + if (vrefresh < 59 || vrefresh > 61) + return MODE_NOCLOCK; + + for (i = 0; i < ARRAY_SIZE(inspur_pll_table); i++) { + if (inspur_pll_table[i].hdisplay == mode->hdisplay && + inspur_pll_table[i].vdisplay == mode->vdisplay) + return MODE_OK; + } + + return MODE_BAD; +} + +static void set_vclock_inspur(struct drm_device *dev, unsigned long pll) +{ + u32 val; + struct inspur_drm_private *priv = dev->dev_private; + + val = readl(priv->mmio + CRT_PLL1_NS); + val &= ~(CRT_PLL1_NS_OUTER_BYPASS(1)); + writel(val, priv->mmio + CRT_PLL1_NS); + + val = CRT_PLL1_NS_INTER_BYPASS(1) | CRT_PLL1_NS_POWERON(1); + writel(val, priv->mmio + CRT_PLL1_NS); + + writel(pll, priv->mmio + CRT_PLL1_NS); + + usleep_range(1000, 2000); + + val = pll & ~(CRT_PLL1_NS_POWERON(1)); + writel(val, priv->mmio + CRT_PLL1_NS); + + usleep_range(1000, 2000); + + val &= 
~(CRT_PLL1_NS_INTER_BYPASS(1)); + writel(val, priv->mmio + CRT_PLL1_NS); + + usleep_range(1000, 2000); + + val |= CRT_PLL1_NS_OUTER_BYPASS(1); + writel(val, priv->mmio + CRT_PLL1_NS); +} + +static void get_pll_config(unsigned long x, unsigned long y, + u32 *pll1, u32 *pll2) +{ + int i; + int count = ARRAY_SIZE(inspur_pll_table); + + for (i = 0; i < count; i++) { + if (inspur_pll_table[i].hdisplay == x && + inspur_pll_table[i].vdisplay == y) { + *pll1 = inspur_pll_table[i].pll1_config_value; + *pll2 = inspur_pll_table[i].pll2_config_value; + return; + } + } + + /* if found none, we use default value */ + *pll1 = CRT_PLL1_NS_25MHZ; + *pll2 = CRT_PLL2_NS_25MHZ; +} + +/* + * This function takes care the extra registers and bit fields required to + * setup a mode in board. + * Explanation about Display Control register: + * FPGA only supports 7 predefined pixel clocks, and clock select is + * in bit 4:0 of new register 0x802a8. + */ +static unsigned int display_ctrl_adjust(struct drm_device *dev, + struct drm_display_mode *mode, + unsigned int ctrl) +{ + unsigned long x, y; + u32 pll1; /* bit[31:0] of PLL */ + u32 pll2; /* bit[63:32] of PLL */ + struct inspur_drm_private *priv = dev->dev_private; + + x = mode->hdisplay; + y = mode->vdisplay; + + get_pll_config(x, y, &pll1, &pll2); + writel(pll2, priv->mmio + CRT_PLL2_NS); + set_vclock_inspur(dev, pll1); + + /* + * inspur has to set up the top-left and bottom-right + * registers as well. + * Note that normal chip only use those two register for + * auto-centering mode. + */ + writel(INSPUR_FIELD(INSPUR_CRT_AUTO_CENTERING_TL_TOP, 0) | + INSPUR_FIELD(INSPUR_CRT_AUTO_CENTERING_TL_LEFT, 0), + priv->mmio + INSPUR_CRT_AUTO_CENTERING_TL); + + writel(INSPUR_FIELD(INSPUR_CRT_AUTO_CENTERING_BR_BOTTOM, y - 1) | + INSPUR_FIELD(INSPUR_CRT_AUTO_CENTERING_BR_RIGHT, x - 1), + priv->mmio + INSPUR_CRT_AUTO_CENTERING_BR); + + /* + * Assume common fields in ctrl have been properly set before + * calling this function. 
+ * This function only sets the extra fields in ctrl. + */ + + /* Set bit 25 of display controller: Select CRT or VGA clock */ + ctrl &= ~INSPUR_CRT_DISP_CTL_CRTSELECT_MASK; + ctrl &= ~INSPUR_CRT_DISP_CTL_CLOCK_PHASE_MASK; + + ctrl |= INSPUR_CRT_DISP_CTL_CRTSELECT(INSPUR_CRTSELECT_CRT); + + /* clock_phase_polarity is 0 */ + ctrl |= INSPUR_CRT_DISP_CTL_CLOCK_PHASE(0); + + writel(ctrl, priv->mmio + INSPUR_CRT_DISP_CTL); + + return ctrl; +} + +static void inspur_crtc_mode_set_nofb(struct drm_crtc *crtc) +{ + unsigned int val; + struct drm_display_mode *mode = &crtc->state->mode; + struct drm_device *dev = crtc->dev; + struct inspur_drm_private *priv = dev->dev_private; + int width = mode->hsync_end - mode->hsync_start; + int height = mode->vsync_end - mode->vsync_start; + + //writel(format_pll_reg(), priv->mmio + INSPUR_CRT_PLL_CTRL); + writel(INSPUR_FIELD(INSPUR_CRT_HORZ_TOTAL_TOTAL, mode->htotal - 1) | + INSPUR_FIELD(INSPUR_CRT_HORZ_TOTAL_DISP_END, mode->hdisplay - 1), + priv->mmio + INSPUR_CRT_HORZ_TOTAL); + + writel(INSPUR_FIELD(INSPUR_CRT_HORZ_SYNC_WIDTH, width) | + INSPUR_FIELD(INSPUR_CRT_HORZ_SYNC_START, mode->hsync_start - 1), + priv->mmio + INSPUR_CRT_HORZ_SYNC); + + writel(INSPUR_FIELD(INSPUR_CRT_VERT_TOTAL_TOTAL, mode->vtotal - 1) | + INSPUR_FIELD(INSPUR_CRT_VERT_TOTAL_DISP_END, mode->vdisplay - 1), + priv->mmio + INSPUR_CRT_VERT_TOTAL); + + writel(INSPUR_FIELD(INSPUR_CRT_VERT_SYNC_HEIGHT, height) | + INSPUR_FIELD(INSPUR_CRT_VERT_SYNC_START, mode->vsync_start - 1), + priv->mmio + INSPUR_CRT_VERT_SYNC); + + val = INSPUR_FIELD(INSPUR_CRT_DISP_CTL_VSYNC_PHASE, 0); + val |= INSPUR_FIELD(INSPUR_CRT_DISP_CTL_HSYNC_PHASE, 0); + val |= INSPUR_CRT_DISP_CTL_TIMING(1); + val |= INSPUR_CRT_DISP_CTL_PLANE(1); + + display_ctrl_adjust(dev, mode, val); +} + +static void inspur_crtc_atomic_begin(struct drm_crtc *crtc, +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 11, 0) + struct drm_atomic_state *state) +#else + struct drm_crtc_state *old_state) +#endif +{ + unsigned int 
reg; + struct drm_device *dev = crtc->dev; + struct inspur_drm_private *priv = dev->dev_private; + + inspur_set_power_mode(priv, INSPUR_PW_MODE_CTL_MODE_MODE0); + + /* Enable display power gate & LOCALMEM power gate*/ + reg = readl(priv->mmio + INSPUR_CURRENT_GATE); + reg &= ~INSPUR_CURR_GATE_DISPLAY_MASK; + reg &= ~INSPUR_CURR_GATE_LOCALMEM_MASK; + reg |= INSPUR_CURR_GATE_DISPLAY(1); + reg |= INSPUR_CURR_GATE_LOCALMEM(1); + inspur_set_current_gate(priv, reg); + + /* We can add more initialization as needed. */ +} + +static void inspur_crtc_atomic_flush(struct drm_crtc *crtc, +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 11, 0) + struct drm_atomic_state *state) +#else + struct drm_crtc_state *old_state) +#endif +{ + unsigned long flags; + + spin_lock_irqsave(&crtc->dev->event_lock, flags); + if (crtc->state->event) + drm_crtc_send_vblank_event(crtc, crtc->state->event); + crtc->state->event = NULL; + spin_unlock_irqrestore(&crtc->dev->event_lock, flags); +} + +static int inspur_crtc_enable_vblank(struct drm_crtc *crtc) +{ + struct inspur_drm_private *priv = crtc->dev->dev_private; + + writel(INSPUR_RAW_INTERRUPT_EN_VBLANK(1), + priv->mmio + INSPUR_RAW_INTERRUPT_EN); + + return 0; +} + +static void inspur_crtc_disable_vblank(struct drm_crtc *crtc) +{ + struct inspur_drm_private *priv = crtc->dev->dev_private; + + writel(INSPUR_RAW_INTERRUPT_EN_VBLANK(0), + priv->mmio + INSPUR_RAW_INTERRUPT_EN); +} + +static const struct drm_crtc_funcs inspur_crtc_funcs = { + .page_flip = drm_atomic_helper_page_flip, + .set_config = drm_atomic_helper_set_config, + .destroy = drm_crtc_cleanup, + .reset = drm_atomic_helper_crtc_reset, + .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state, + .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state, + .enable_vblank = inspur_crtc_enable_vblank, + .disable_vblank = inspur_crtc_disable_vblank, + +}; + +static const struct drm_crtc_helper_funcs inspur_crtc_helper_funcs = { + .mode_set_nofb = inspur_crtc_mode_set_nofb, + 
.atomic_begin = inspur_crtc_atomic_begin, + .atomic_flush = inspur_crtc_atomic_flush, + .atomic_enable = inspur_crtc_atomic_enable, + .atomic_disable = inspur_crtc_atomic_disable, + .mode_valid = inspur_crtc_mode_valid, +}; + +int inspur_de_init(struct inspur_drm_private *priv) +{ + struct drm_device *dev = priv->dev; + struct drm_crtc *crtc; + struct drm_plane *plane; + int ret; + + plane = inspur_plane_init(priv); + if (IS_ERR(plane)) { + DRM_ERROR("failed to create plane: %ld\n", PTR_ERR(plane)); + return PTR_ERR(plane); + } + + crtc = devm_kzalloc(dev->dev, sizeof(*crtc), GFP_KERNEL); + if (!crtc) { + DRM_ERROR("failed to alloc memory when init crtc\n"); + return -ENOMEM; + } + + ret = drm_crtc_init_with_planes(dev, crtc, plane, + NULL, &inspur_crtc_funcs, NULL); + if (ret) { + DRM_ERROR("failed to init crtc: %d\n", ret); + return ret; + } + + ret = drm_mode_crtc_set_gamma_size(crtc, 256); + if (ret) { + DRM_ERROR("failed to set gamma size: %d\n", ret); + return ret; + } + drm_crtc_helper_add(crtc, &inspur_crtc_helper_funcs); + + return 0; +} diff --git a/drivers/gpu/drm/inspur/inspur_drm_drv.c b/drivers/gpu/drm/inspur/inspur_drm_drv.c new file mode 100644 index 000000000000..d7026e1df167 --- /dev/null +++ b/drivers/gpu/drm/inspur/inspur_drm_drv.c @@ -0,0 +1,456 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* INSPUR SoC drm driver + * + * Based on the smi drm driver. + * + * Copyright (c) 2020 SMI Limited. + * + * Author: + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + */ + +#include <linux/console.h> +#include <linux/module.h> + +#include <drm/drm_atomic_helper.h> +#include <drm/drm_crtc_helper.h> +#include <drm/drm_probe_helper.h> + +#include "inspur_drm_drv.h" +#include "inspur_drm_regs.h" + +#define MEM_SIZE_RESERVE4KVM 0x200000 + + +DEFINE_DRM_GEM_FOPS(inspur_fops); +irqreturn_t inspur_drm_interrupt(int irq, void *arg) +{ + struct drm_device *dev = (struct drm_device *)arg; + struct inspur_drm_private *priv = + (struct inspur_drm_private *)dev->dev_private; + u32 status; + + status = readl(priv->mmio + INSPUR_RAW_INTERRUPT); + + if (status & INSPUR_RAW_INTERRUPT_VBLANK(1)) { + writel(INSPUR_RAW_INTERRUPT_VBLANK(1), + priv->mmio + INSPUR_RAW_INTERRUPT); + drm_handle_vblank(dev, 0); + } + + return IRQ_HANDLED; +} + + + +static struct drm_driver inspur_driver = { + .driver_features = DRIVER_GEM | DRIVER_MODESET | + DRIVER_ATOMIC | DRIVER_HAVE_IRQ, + + .fops = &inspur_fops, + .name = "inspur", + .date = "20230425", + .desc = "inspur drm driver", + .major = 2, + .minor = 2, + //.gem_free_object_unlocked = inspur_gem_free_object, + .dumb_create = inspur_dumb_create, +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 14, 0) + .dumb_map_offset = drm_gem_ttm_dumb_map_offset, +#else + .dumb_map_offset = drm_gem_vram_driver_dumb_mmap_offset, +#endif +}; + +static void inspur_remove_framebuffers(struct pci_dev *pdev) +{ + struct apertures_struct *ap; + + ap = alloc_apertures(1); + if (!ap) + return; + + ap->ranges[0].base = pci_resource_start(pdev, 0); + ap->ranges[0].size = pci_resource_len(pdev, 0); + +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 15, 0) + drm_aperture_remove_conflicting_pci_framebuffers(pdev, &inspur_driver); +#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5, 14, 0) + drm_aperture_remove_conflicting_pci_framebuffers(pdev, "inspurdrmfb"); +#else + drm_fb_helper_remove_conflicting_pci_framebuffers(pdev, "inspurdrmfb"); +#endif + + kfree(ap); +} + +static int __maybe_unused inspur_pm_suspend(struct device *dev) +{ + struct 
pci_dev *pdev = to_pci_dev(dev); + struct drm_device *drm_dev = pci_get_drvdata(pdev); + struct inspur_drm_private *priv = drm_dev->dev_private; + + drm_kms_helper_poll_disable(drm_dev); + priv->suspend_state = drm_atomic_helper_suspend(drm_dev); + if (IS_ERR(priv->suspend_state)) { + DRM_ERROR("drm_atomic_helper_suspend failed: %ld\n", + PTR_ERR(priv->suspend_state)); + drm_kms_helper_poll_enable(drm_dev); + return PTR_ERR(priv->suspend_state); + } + + return 0; +} + +static int __maybe_unused inspur_pm_resume(struct device *dev) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct drm_device *drm_dev = pci_get_drvdata(pdev); + struct inspur_drm_private *priv = drm_dev->dev_private; + + drm_atomic_helper_resume(drm_dev, priv->suspend_state); + drm_kms_helper_poll_enable(drm_dev); + + return 0; +} + +static const struct dev_pm_ops inspur_pm_ops = { + SET_SYSTEM_SLEEP_PM_OPS(inspur_pm_suspend, + inspur_pm_resume) +}; + +static int inspur_kms_init(struct inspur_drm_private *priv) +{ + int ret; + + drm_mode_config_init(priv->dev); + priv->mode_config_initialized = true; + + priv->dev->mode_config.min_width = 0; + priv->dev->mode_config.min_height = 0; + priv->dev->mode_config.max_width = 1920; + priv->dev->mode_config.max_height = 1200; + + priv->dev->mode_config.fb_base = priv->fb_base; + priv->dev->mode_config.preferred_depth = 32; + priv->dev->mode_config.prefer_shadow = 1; + + if (getKVMHWCursorSetting(priv)) { + priv->dev->mode_config.cursor_width = 64; + priv->dev->mode_config.cursor_height = 64; + } + + priv->dev->mode_config.funcs = (void *)&inspur_mode_funcs; + + ret = inspur_de_init(priv); + if (ret) { + DRM_ERROR("failed to init de: %d\n", ret); + return ret; + } + + ret = inspur_vdac_init(priv); + if (ret) { + DRM_ERROR("failed to init vdac: %d\n", ret); + return ret; + } + + return 0; +} + +static void inspur_kms_fini(struct inspur_drm_private *priv) +{ + if (priv->mode_config_initialized) { + drm_mode_config_cleanup(priv->dev); + 
priv->mode_config_initialized = false; + } +} + +/* + * It can operate in one of three modes: 0, 1 or Sleep. + */ +void inspur_set_power_mode(struct inspur_drm_private *priv, + unsigned int power_mode) +{ + unsigned int control_value = 0; + void __iomem *mmio = priv->mmio; + unsigned int input = 1; + + if (power_mode > INSPUR_PW_MODE_CTL_MODE_SLEEP) + return; + + if (power_mode == INSPUR_PW_MODE_CTL_MODE_SLEEP) + input = 0; + + control_value = readl(mmio + INSPUR_POWER_MODE_CTRL); + control_value &= ~(INSPUR_PW_MODE_CTL_MODE_MASK | + INSPUR_PW_MODE_CTL_OSC_INPUT_MASK); + control_value |= INSPUR_FIELD(INSPUR_PW_MODE_CTL_MODE, power_mode); + control_value |= INSPUR_FIELD(INSPUR_PW_MODE_CTL_OSC_INPUT, input); + writel(control_value, mmio + INSPUR_POWER_MODE_CTRL); +} + +void inspur_set_current_gate(struct inspur_drm_private *priv, unsigned int gate) +{ + unsigned int gate_reg; + unsigned int mode; + void __iomem *mmio = priv->mmio; + + /* Get current power mode. */ + mode = (readl(mmio + INSPUR_POWER_MODE_CTRL) & + INSPUR_PW_MODE_CTL_MODE_MASK) >> INSPUR_PW_MODE_CTL_MODE_SHIFT; + + switch (mode) { + case INSPUR_PW_MODE_CTL_MODE_MODE0: + gate_reg = INSPUR_MODE0_GATE; + break; + + case INSPUR_PW_MODE_CTL_MODE_MODE1: + gate_reg = INSPUR_MODE1_GATE; + break; + + default: + gate_reg = INSPUR_MODE0_GATE; + break; + } + writel(gate, mmio + gate_reg); +} + +static void inspur_hw_config(struct inspur_drm_private *priv) +{ + unsigned int reg; + + /* On hardware reset, power mode 0 is default. */ + inspur_set_power_mode(priv, INSPUR_PW_MODE_CTL_MODE_MODE0); + + /* Enable display power gate & LOCALMEM power gate*/ + reg = readl(priv->mmio + INSPUR_CURRENT_GATE); + reg &= ~INSPUR_CURR_GATE_DISPLAY_MASK; + reg &= ~INSPUR_CURR_GATE_LOCALMEM_MASK; + reg |= INSPUR_CURR_GATE_DISPLAY(1); + reg |= INSPUR_CURR_GATE_LOCALMEM(1); + + inspur_set_current_gate(priv, reg); + + /* + * Reset the memory controller. 
If the memory controller + * is not reset in chip,the system might hang when sw accesses + * the memory.The memory should be resetted after + * changing the MXCLK. + */ + reg = readl(priv->mmio + INSPUR_MISC_CTRL); + reg &= ~INSPUR_MSCCTL_LOCALMEM_RESET_MASK; + reg |= INSPUR_MSCCTL_LOCALMEM_RESET(0); + writel(reg, priv->mmio + INSPUR_MISC_CTRL); + + reg &= ~INSPUR_MSCCTL_LOCALMEM_RESET_MASK; + reg |= INSPUR_MSCCTL_LOCALMEM_RESET(1); + + writel(reg, priv->mmio + INSPUR_MISC_CTRL); +} + +static int inspur_hw_map(struct inspur_drm_private *priv) +{ + struct drm_device *dev = priv->dev; + struct pci_dev *pdev = to_pci_dev(dev->dev); + resource_size_t addr, size, ioaddr, iosize; + + ioaddr = pci_resource_start(pdev, 1); + iosize = pci_resource_len(pdev, 1); + priv->mmio = devm_ioremap(dev->dev, ioaddr, iosize); + if (!priv->mmio) { + DRM_ERROR("Cannot map mmio region\n"); + return -ENOMEM; + } + + addr = pci_resource_start(pdev, 0); + size = pci_resource_len(pdev, 0); + priv->fb_map = devm_ioremap(dev->dev, addr, size); + if (!priv->fb_map) { + DRM_ERROR("Cannot map framebuffer\n"); + return -ENOMEM; + } + priv->fb_base = addr; + priv->fb_size = size - MEM_SIZE_RESERVE4KVM; + + return 0; +} + +static void inspur_hw_unmap(struct inspur_drm_private *priv) +{ + struct drm_device *dev = priv->dev; + + if (priv->mmio) { + devm_iounmap(dev->dev, priv->mmio); + priv->mmio = NULL; + } + if (priv->fb_map) { + devm_iounmap(dev->dev, priv->fb_map); + priv->fb_map = NULL; + } +} + +static int inspur_hw_init(struct inspur_drm_private *priv) +{ + int ret; + + ret = inspur_hw_map(priv); + if (ret) + return ret; + + inspur_hw_config(priv); + + return 0; +} + +void inspur_unload(struct drm_device *dev) +{ + struct inspur_drm_private *priv = dev->dev_private; + struct pci_dev *pdev = to_pci_dev(dev->dev); + + drm_atomic_helper_shutdown(dev); + + free_irq(pdev->irq, dev); + + inspur_kms_fini(priv); + inspur_hw_unmap(priv); + pci_disable_msi(to_pci_dev(dev->dev)); + dev->dev_private = 
NULL; +} + +int inspur_load(struct drm_device *dev, unsigned long flags) +{ + struct inspur_drm_private *priv; + struct pci_dev *pdev = to_pci_dev(dev->dev); + int ret; + + priv = devm_kzalloc(dev->dev, sizeof(*priv), GFP_KERNEL); + if (!priv) { + DRM_ERROR("no memory to allocate for inspur_drm_private\n"); + return -ENOMEM; + } + dev->dev_private = priv; + priv->dev = dev; + + ret = inspur_hw_init(priv); + if (ret) + goto err; + + ret = drmm_vram_helper_init(dev, pci_resource_start(pdev, 0), priv->fb_size); + if (ret) { + drm_err(dev, "Error initializing VRAM MM; %d\n", ret); + goto err; + } + ret = inspur_kms_init(priv); + if (ret) + goto err; + + + /* reset all the states of crtc/plane/encoder/connector */ + drm_mode_config_reset(dev); + + if (getKVMHWCursorSetting(priv)) { +#if 0 + inspur_bo_create(dev, PAGE_ALIGN(1024), 0, 0, &priv->cursor.cursor_1); + inspur_bo_create(dev, PAGE_ALIGN(1024), 0, 0, &priv->cursor.cursor_2); + if (!priv->cursor.cursor_1 || !priv->cursor.cursor_2) { + priv->cursor.cursor_1 = NULL; + priv->cursor.cursor_2 = NULL; + DRM_ERROR("Could not allocate space for cursors. 
Not doing hardware cursors.\n"); + } +#endif + } + + return 0; + +err: + inspur_unload(dev); + DRM_ERROR("failed to initialize drm driver: %d\n", ret); + return ret; +} + +static int inspur_pci_probe(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + int ret = 0; + struct inspur_drm_private *priv; + struct drm_device *dev; + + inspur_remove_framebuffers(pdev); + + dev = drm_dev_alloc(&inspur_driver, &pdev->dev); + if (IS_ERR(dev)) { + DRM_ERROR("failed to allocate drm_device\n"); + return PTR_ERR(dev); + } + + pci_set_drvdata(pdev, dev); + ret = pci_enable_device(pdev); + if (ret) { + drm_err(dev, "failed to enable pci device: %d\n", ret); + return ret; + } + ret = inspur_load(dev, ent->driver_data); + if (ret) + goto err_return; + + ret = drm_dev_register(dev, ent->driver_data); + if (ret) + goto err_inspur_driver_unload; + + drm_fbdev_generic_setup(dev, dev->mode_config.preferred_depth); + + return 0; +err_inspur_driver_unload: + inspur_unload(dev); +err_return: + return ret; +} + +static void inspur_pci_remove(struct pci_dev *pdev) +{ + struct drm_device *dev = pci_get_drvdata(pdev); + + drm_put_dev(dev); + pci_disable_device(pdev); +} + +static void inspur_pci_shutdown(struct pci_dev *pdev) +{ + inspur_pci_remove(pdev); +} + +static struct pci_device_id inspur_pci_table[] = { + {0x1bd4, 0x0750, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, + {0,} +}; + +static struct pci_driver inspur_pci_driver = { + .name = "inspur-drm", + .id_table = inspur_pci_table, + .probe = inspur_pci_probe, + .remove = inspur_pci_remove, + .shutdown = inspur_pci_shutdown, + .driver.pm = &inspur_pm_ops, +}; + +static int __init inspur_init(void) +{ + return pci_register_driver(&inspur_pci_driver); +} + +static void __exit inspur_exit(void) +{ + return pci_unregister_driver(&inspur_pci_driver); +} + +module_init(inspur_init); +module_exit(inspur_exit); + +MODULE_DEVICE_TABLE(pci, inspur_pci_table); +MODULE_AUTHOR(""); +MODULE_DESCRIPTION("DRM Driver for INSPUR"); +MODULE_LICENSE("GPL 
v2"); diff --git a/drivers/gpu/drm/inspur/inspur_drm_drv.h b/drivers/gpu/drm/inspur/inspur_drm_drv.h new file mode 100644 index 000000000000..b1a20f1b7df2 --- /dev/null +++ b/drivers/gpu/drm/inspur/inspur_drm_drv.h @@ -0,0 +1,116 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* INSPUR SoC drm driver + * + * Based on the smi drm driver. + * + * Copyright (c) 2020 SMI Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#ifndef INSPUR_DRM_DRV_H +#define INSPUR_DRM_DRV_H + +#include <linux/version.h> +#include <drm/drm_atomic.h> +#include <drm/drm_fb_helper.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_vram_helper.h> +#include <linux/pci.h> +#include <drm/drm_vblank.h> +#include <drm/drm_drv.h> + +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 14, 0) +#include <drm/drm_aperture.h> +#endif + +#include <linux/delay.h> +#include <drm/drm_gem_framebuffer_helper.h> +struct drm_device; +struct drm_gem_object; + +#define inspur_framebuffer drm_framebuffer +#define BPP16_RED 0x0000f800 +#define BPP16_GREEN 0x000007e0 +#define BPP16_BLUE 0x0000001f +#define BPP16_WHITE 0x0000ffff +#define BPP16_GRAY 0x00008410 +#define BPP16_YELLOW 0x0000ffe0 +#define BPP16_CYAN 0x000007ff +#define BPP16_PINK 0x0000f81f +#define BPP16_BLACK 0x00000000 +struct inspur_fbdev { + struct drm_fb_helper helper; + struct inspur_framebuffer *fb; + int size; +}; + +struct inspur_cursor { + struct drm_gem_vram_object *gbo[2]; + unsigned int next_index; +}; + +struct inspur_drm_private { + /* hw */ + void __iomem *mmio; + void __iomem *fb_map; + unsigned long fb_base; + unsigned long fb_size; + + /* drm */ + struct drm_device *dev; + bool mode_config_initialized; + struct drm_atomic_state *suspend_state; + + /* fbdev */ + struct inspur_fbdev *fbdev; + + /* hw cursor */ + 
struct inspur_cursor cursor; +}; + +#define to_inspur_framebuffer(x) container_of(x, struct inspur_framebuffer, fb) + + +void inspur_set_power_mode(struct inspur_drm_private *priv, + unsigned int power_mode); +void inspur_set_current_gate(struct inspur_drm_private *priv, + unsigned int gate); +int inspur_load(struct drm_device *dev, unsigned long flags); +void inspur_unload(struct drm_device *dev); + +int inspur_de_init(struct inspur_drm_private *priv); +int inspur_vdac_init(struct inspur_drm_private *priv); +int inspur_fbdev_init(struct inspur_drm_private *priv); +void inspur_fbdev_fini(struct inspur_drm_private *priv); + +int inspur_gem_create(struct drm_device *dev, u32 size, bool iskernel, struct drm_gem_object **obj); +struct inspur_framebuffer * +inspur_framebuffer_init(struct drm_device *dev, + const struct drm_mode_fb_cmd2 *mode_cmd, + struct drm_gem_object *obj); + +int inspur_mm_init(struct inspur_drm_private *inspur); +void inspur_mm_fini(struct inspur_drm_private *inspur); +int inspur_dumb_create(struct drm_file *file, struct drm_device *dev, + struct drm_mode_create_dumb *args); + +extern const struct drm_mode_config_funcs inspur_mode_funcs; + +/* inspur_drm_cursor.c */ +int inspur_cursor_init(struct inspur_drm_private *priv); +void inspur_cursor_fini(struct inspur_drm_private *priv); +int inspur_crtc_cursor_set(struct drm_crtc *crtc, + struct drm_file *file_priv, + uint32_t handle, uint32_t width, + uint32_t height); +int inspur_crtc_cursor_move(struct drm_crtc *crtc, int x, int y); +unsigned char getKVMHWCursorSetting(struct inspur_drm_private *priv); +void colorcur2monocur(void *data, void *out); + + +#endif diff --git a/drivers/gpu/drm/inspur/inspur_drm_regs.h b/drivers/gpu/drm/inspur/inspur_drm_regs.h new file mode 100644 index 000000000000..a28dfd1285d7 --- /dev/null +++ b/drivers/gpu/drm/inspur/inspur_drm_regs.h @@ -0,0 +1,223 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* INSPUR SoC drm driver + * + * Based on the smi drm driver. 
+ * + * Copyright (c) 2020 SMI Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#ifndef INSPUR_DRM_HW_H +#define INSPUR_DRM_HW_H + +/* register definition */ +#define INSPUR_MISC_CTRL 0x4 + +#define INSPUR_MSCCTL_LOCALMEM_RESET(x) ((x) << 6) +#define INSPUR_MSCCTL_LOCALMEM_RESET_MASK 0x40 + +#define INSPUR_CURRENT_GATE 0x000040 +#define INSPUR_CURR_GATE_DISPLAY(x) ((x) << 2) +#define INSPUR_CURR_GATE_DISPLAY_MASK 0x4 + +#define INSPUR_CURR_GATE_LOCALMEM(x) ((x) << 1) +#define INSPUR_CURR_GATE_LOCALMEM_MASK 0x2 + +#define INSPUR_MODE0_GATE 0x000044 +#define INSPUR_MODE1_GATE 0x000048 +#define INSPUR_POWER_MODE_CTRL 0x00004C + +#define INSPUR_PW_MODE_CTL_OSC_INPUT(x) ((x) << 3) +#define INSPUR_PW_MODE_CTL_OSC_INPUT_MASK 0x8 + +#define INSPUR_PW_MODE_CTL_MODE(x) ((x) << 0) +#define INSPUR_PW_MODE_CTL_MODE_MASK 0x03 +#define INSPUR_PW_MODE_CTL_MODE_SHIFT 0 + +#define INSPUR_PW_MODE_CTL_MODE_MODE0 0 +#define INSPUR_PW_MODE_CTL_MODE_MODE1 1 +#define INSPUR_PW_MODE_CTL_MODE_SLEEP 2 + +//#define INSPUR_CRT_PLL_CTRL 0x000060 + +#define INSPUR_PLL_CTRL_BYPASS(x) ((x) << 18) +#define INSPUR_PLL_CTRL_BYPASS_MASK 0x40000 + +#define INSPUR_PLL_CTRL_POWER(x) ((x) << 17) +#define INSPUR_PLL_CTRL_POWER_MASK 0x20000 + +#define INSPUR_PLL_CTRL_INPUT(x) ((x) << 16) +#define INSPUR_PLL_CTRL_INPUT_MASK 0x10000 + +#define INSPUR_PLL_CTRL_POD(x) ((x) << 14) +#define INSPUR_PLL_CTRL_POD_MASK 0xC000 + +#define INSPUR_PLL_CTRL_OD(x) ((x) << 12) +#define INSPUR_PLL_CTRL_OD_MASK 0x3000 + +#define INSPUR_PLL_CTRL_N(x) ((x) << 8) +#define INSPUR_PLL_CTRL_N_MASK 0xF00 + +#define INSPUR_PLL_CTRL_M(x) ((x) << 0) +#define INSPUR_PLL_CTRL_M_MASK 0xFF + +#define INSPUR_CRT_DISP_CTL 0x80200 + + +#define INSPUR_CRT_DISP_CTL_DPMS(x) ((x) << 30) +#define 
INSPUR_CRT_DISP_CTL_DPMS_MASK 0xc0000000 + +#define INSPUR_CRT_DPMS_ON 0 +#define INSPUR_CRT_DPMS_OFF 3 + + +#define INSPUR_CRT_DISP_CTL_CRTSELECT(x) ((x) << 25) +#define INSPUR_CRT_DISP_CTL_CRTSELECT_MASK 0x2000000 + +#define INSPUR_CRTSELECT_CRT 1 + +#define INSPUR_CRT_DISP_CTL_CLOCK_PHASE(x) ((x) << 14) +#define INSPUR_CRT_DISP_CTL_CLOCK_PHASE_MASK 0x4000 + +#define INSPUR_CRT_DISP_CTL_VSYNC_PHASE(x) ((x) << 13) +#define INSPUR_CRT_DISP_CTL_VSYNC_PHASE_MASK 0x2000 + +#define INSPUR_CRT_DISP_CTL_HSYNC_PHASE(x) ((x) << 12) +#define INSPUR_CRT_DISP_CTL_HSYNC_PHASE_MASK 0x1000 + +#define INSPUR_CRT_DISP_CTL_TIMING(x) ((x) << 8) +#define INSPUR_CRT_DISP_CTL_TIMING_MASK 0x100 + +#define INSPUR_CRT_DISP_CTL_PLANE(x) ((x) << 2) +#define INSPUR_CRT_DISP_CTL_PLANE_MASK 4 + +#define INSPUR_CRT_DISP_CTL_FORMAT(x) ((x) << 0) +#define INSPUR_CRT_DISP_CTL_FORMAT_MASK 0x03 + +#define INSPUR_CRT_FB_ADDRESS 0x080204 + +#define INSPUR_CRT_FB_WIDTH 0x080208 +#define INSPUR_CRT_FB_WIDTH_WIDTH(x) ((x) << 16) +#define INSPUR_CRT_FB_WIDTH_WIDTH_MASK 0x3FFF0000 +#define INSPUR_CRT_FB_WIDTH_OFFS(x) ((x) << 0) +#define INSPUR_CRT_FB_WIDTH_OFFS_MASK 0x3FFF + +#define INSPUR_CRT_HORZ_TOTAL 0x08020C +#define INSPUR_CRT_HORZ_TOTAL_TOTAL(x) ((x) << 16) +#define INSPUR_CRT_HORZ_TOTAL_TOTAL_MASK 0xFFF0000 + +#define INSPUR_CRT_HORZ_TOTAL_DISP_END(x) ((x) << 0) +#define INSPUR_CRT_HORZ_TOTAL_DISP_END_MASK 0xFFF + +#define INSPUR_CRT_HORZ_SYNC 0x080210 +#define INSPUR_CRT_HORZ_SYNC_WIDTH(x) ((x) << 16) +#define INSPUR_CRT_HORZ_SYNC_WIDTH_MASK 0xFF0000 + +#define INSPUR_CRT_HORZ_SYNC_START(x) ((x) << 0) +#define INSPUR_CRT_HORZ_SYNC_START_MASK 0xFFF + +#define INSPUR_CRT_VERT_TOTAL 0x080214 +#define INSPUR_CRT_VERT_TOTAL_TOTAL(x) ((x) << 16) +#define INSPUR_CRT_VERT_TOTAL_TOTAL_MASK 0x7FFF0000 + +#define INSPUR_CRT_VERT_TOTAL_DISP_END(x) ((x) << 0) +#define INSPUR_CRT_VERT_TOTAL_DISP_END_MASK 0x7FF + +#define INSPUR_CRT_VERT_SYNC 0x080218 +#define INSPUR_CRT_VERT_SYNC_HEIGHT(x) ((x) << 16) +#define 
INSPUR_CRT_VERT_SYNC_HEIGHT_MASK 0x3F0000 + +#define INSPUR_CRT_VERT_SYNC_START(x) ((x) << 0) +#define INSPUR_CRT_VERT_SYNC_START_MASK 0x7FF + +/* Hardware Cursor */ +#define INSPUR_HWC_ADDRESS 0x080230 +#define INSPUR_HWC_ADDRESS_ENABLE(x) ((x) << 31) +#define INSPUR_HWC_ADDRESS_ENABLE_MASK 0x80000000 +#define INSPUR_HWC_ADDRESS_ADDRESS(x) ((x) << 0) +#define INSPUR_HWC_ADDRESS_ADDRESS_MASK 0xFFFFFFF + +#define INSPUR_HWC_LOCATION 0x080234 +#define INSPUR_HWC_LOCATION_TOP(x) ((x) << 27) +#define INSPUR_HWC_LOCATION_TOP_MASK 0x8000000 +#define INSPUR_HWC_LOCATION_Y(x) ((x) << 16) +#define INSPUR_HWC_LOCATION_Y_MASK 0x7FF0000 +#define INSPUR_HWC_LOCATION_LEFT(x) ((x) << 11) +#define INSPUR_HWC_LOCATION_LEFT_MASK 0x800 +#define INSPUR_HWC_LOCATION_X(x) ((x) << 0) +#define INSPUR_HWC_LOCATION_X_MASK 0x7FF + +#define INSPUR_HWC_COLOR_12 0x080238 +#define INSPUR_HWC_COLOR_12_2_RGB(x) ((x) << 16) +#define INSPUR_HWC_COLOR_12_2_RGB_MASK 0xFFFF0000 +#define INSPUR_HWC_COLOR_12_1_RGB(x) ((x) << 0) +#define INSPUR_HWC_COLOR_12_1_RGB_MASK 0xFFFF + +#define INSPUR_HWC_COLOR_3 0x08023C +#define INSPUR_HWC_COLOR_3_RGB(x) ((x) << 0) +#define INSPUR_HWC_COLOR_3_RGB_MASK 0xFFFF + +/* Auto Centering */ +#define INSPUR_CRT_AUTO_CENTERING_TL 0x080280 +#define INSPUR_CRT_AUTO_CENTERING_TL_TOP(x) ((x) << 16) +#define INSPUR_CRT_AUTO_CENTERING_TL_TOP_MASK 0x7FF0000 + +#define INSPUR_CRT_AUTO_CENTERING_TL_LEFT(x) ((x) << 0) +#define INSPUR_CRT_AUTO_CENTERING_TL_LEFT_MASK 0x7FF + +#define INSPUR_CRT_AUTO_CENTERING_BR 0x080284 +#define INSPUR_CRT_AUTO_CENTERING_BR_BOTTOM(x) ((x) << 16) +#define INSPUR_CRT_AUTO_CENTERING_BR_BOTTOM_MASK 0x7FF0000 + +#define INSPUR_CRT_AUTO_CENTERING_BR_RIGHT(x) ((x) << 0) +#define INSPUR_CRT_AUTO_CENTERING_BR_RIGHT_MASK 0x7FF + +/* register to control panel output */ +#define INSPUR_DISPLAY_CONTROL_HISILE 0x80288 +#define INSPUR_DISPLAY_CONTROL_FPVDDEN(x) ((x) << 0) +#define INSPUR_DISPLAY_CONTROL_PANELDATE(x) ((x) << 1) +#define 
INSPUR_DISPLAY_CONTROL_FPEN(x) ((x) << 2) +#define INSPUR_DISPLAY_CONTROL_VBIASEN(x) ((x) << 3) + +#define INSPUR_RAW_INTERRUPT 0x80290 +#define INSPUR_RAW_INTERRUPT_VBLANK(x) ((x) << 2) +#define INSPUR_RAW_INTERRUPT_VBLANK_MASK 0x4 + +#define INSPUR_RAW_INTERRUPT_EN 0x80298 +#define INSPUR_RAW_INTERRUPT_EN_VBLANK(x) ((x) << 2) +#define INSPUR_RAW_INTERRUPT_EN_VBLANK_MASK 0x4 + +/* register and values for PLL control */ +#define CRT_PLL1_NS 0x802a8 +#define CRT_PLL1_NS_OUTER_BYPASS(x) ((x) << 30) +#define CRT_PLL1_NS_INTER_BYPASS(x) ((x) << 29) +#define CRT_PLL1_NS_POWERON(x) ((x) << 24) + +#define CRT_PLL1_NS_25MHZ 0x00006691 //640x480 +#define CRT_PLL1_NS_40MHZ 0x00004580 //800x600 +#define CRT_PLL1_NS_65MHZ 0x00002568 //1024x768 +#define CRT_PLL1_NS_83MHZ 0x000027bb //1280x800 +#define CRT_PLL1_NS_106MHZ 0x000027ef //1440x900 +#define CRT_PLL1_NS_108MHZ 0x000027f2 //1280x1024 +#define CRT_PLL1_NS_146MHZ 0x00001575 //1680x1050 +#define CRT_PLL1_NS_148MHZ 0x0000145f //1920x1080 +#define CRT_PLL1_NS_193MHZ 0x000018f7 //1920x1200 + +#define CRT_PLL2_NS 0x802ac +#define CRT_PLL2_NS_25MHZ 0x0 +#define CRT_PLL2_NS_40MHZ 0x0 +#define CRT_PLL2_NS_65MHZ 0x0 +#define CRT_PLL2_NS_83MHZ 0x0 +#define CRT_PLL2_NS_106MHZ 0x0 +#define CRT_PLL2_NS_108MHZ 0x0 +#define CRT_PLL2_NS_146MHZ 0x0 +#define CRT_PLL2_NS_148MHZ 0x0 +#define CRT_PLL2_NS_193MHZ 0x0 + +#define INSPUR_FIELD(field, value) (field(value) & field##_MASK) +#endif diff --git a/drivers/gpu/drm/inspur/inspur_drm_vdac.c b/drivers/gpu/drm/inspur/inspur_drm_vdac.c new file mode 100644 index 000000000000..20e22ef02546 --- /dev/null +++ b/drivers/gpu/drm/inspur/inspur_drm_vdac.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* INSPUR SoC drm driver + * + * Based on the smi drm driver. + * + * Copyright (c) 2020 SMI Limited. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#include <drm/drm_atomic_helper.h> +#include <drm/drm_probe_helper.h> + +#include "inspur_drm_drv.h" +#include "inspur_drm_regs.h" + +static int inspur_connector_get_modes(struct drm_connector *connector) +{ + int count; + + count = drm_add_modes_noedid(connector, + connector->dev->mode_config.max_width, + connector->dev->mode_config.max_height); + drm_set_preferred_mode(connector, 1024, 768); + return count; +} + +static int inspur_connector_mode_valid(struct drm_connector *connector, + struct drm_display_mode *mode) +{ + return MODE_OK; +} + +static const struct drm_connector_helper_funcs + inspur_connector_helper_funcs = { + .get_modes = inspur_connector_get_modes, + .mode_valid = inspur_connector_mode_valid, +}; + +static const struct drm_connector_funcs inspur_connector_funcs = { + .fill_modes = drm_helper_probe_single_connector_modes, + .destroy = drm_connector_cleanup, + .reset = drm_atomic_helper_connector_reset, + .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state, + .atomic_destroy_state = drm_atomic_helper_connector_destroy_state, +}; + +static void inspur_encoder_mode_set(struct drm_encoder *encoder, + struct drm_display_mode *mode, + struct drm_display_mode *adj_mode) +{ + u32 reg; + struct drm_device *dev = encoder->dev; + struct inspur_drm_private *priv = dev->dev_private; + + reg = readl(priv->mmio + INSPUR_DISPLAY_CONTROL_HISILE); + reg |= INSPUR_DISPLAY_CONTROL_FPVDDEN(1); + reg |= INSPUR_DISPLAY_CONTROL_PANELDATE(1); + reg |= INSPUR_DISPLAY_CONTROL_FPEN(1); + reg |= INSPUR_DISPLAY_CONTROL_VBIASEN(1); + writel(reg, priv->mmio + INSPUR_DISPLAY_CONTROL_HISILE); +} + +static const struct drm_encoder_helper_funcs inspur_encoder_helper_funcs = { + .mode_set = 
inspur_encoder_mode_set, +}; + +static const struct drm_encoder_funcs inspur_encoder_funcs = { + .destroy = drm_encoder_cleanup, +}; + +int inspur_vdac_init(struct inspur_drm_private *priv) +{ + struct drm_device *dev = priv->dev; + struct drm_encoder *encoder; + struct drm_connector *connector; + int ret; + + encoder = devm_kzalloc(dev->dev, sizeof(*encoder), GFP_KERNEL); + if (!encoder) { + DRM_ERROR("failed to alloc memory when init encoder\n"); + return -ENOMEM; + } + + encoder->possible_crtcs = 0x1; + ret = drm_encoder_init(dev, encoder, &inspur_encoder_funcs, + DRM_MODE_ENCODER_DAC, NULL); + if (ret) { + DRM_ERROR("failed to init encoder: %d\n", ret); + return ret; + } + + drm_encoder_helper_add(encoder, &inspur_encoder_helper_funcs); + + connector = devm_kzalloc(dev->dev, sizeof(*connector), GFP_KERNEL); + if (!connector) { + DRM_ERROR("failed to alloc memory when init connector\n"); + return -ENOMEM; + } + + ret = drm_connector_init(dev, connector, + &inspur_connector_funcs, + DRM_MODE_CONNECTOR_VGA); + if (ret) { + DRM_ERROR("failed to init connector: %d\n", ret); + return ret; + } + drm_connector_helper_add(connector, &inspur_connector_helper_funcs); + + drm_connector_register(connector); + drm_connector_attach_encoder(connector, encoder); + return 0; +} diff --git a/drivers/gpu/drm/inspur/inspur_ttm.c b/drivers/gpu/drm/inspur/inspur_ttm.c new file mode 100644 index 000000000000..5757120597e9 --- /dev/null +++ b/drivers/gpu/drm/inspur/inspur_ttm.c @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* INSPUR SoC drm driver + * + * Based on the smi drm driver. + * + * Copyright (c) 2020 SMI Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + */ + +#include <drm/drm_atomic_helper.h> + +#include "inspur_drm_drv.h" + + +int inspur_dumb_create(struct drm_file *file, struct drm_device *dev, + struct drm_mode_create_dumb *args) +{ + + return drm_gem_vram_fill_create_dumb(file, dev, 0, 128, args); +} + + + + + +const struct drm_mode_config_funcs inspur_mode_funcs = { + .atomic_check = drm_atomic_helper_check, + .atomic_commit = drm_atomic_helper_commit, + .fb_create = drm_gem_fb_create, + .mode_valid = drm_vram_helper_mode_valid, +}; -- 2.33.0
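
The register updates in this driver (for example in inspur_set_power_mode()) follow a read-modify-write pattern built on the INSPUR_FIELD() macro from inspur_drm_regs.h. The following is a minimal userspace sketch of that pattern, not the driver code itself: it assumes a plain variable stands in for the MMIO register that the driver accesses with readl()/writel(), and the names fake_reg and set_power_mode() are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Field definitions copied from inspur_drm_regs.h in the patch above. */
#define INSPUR_PW_MODE_CTL_OSC_INPUT(x)		((x) << 3)
#define INSPUR_PW_MODE_CTL_OSC_INPUT_MASK	0x8
#define INSPUR_PW_MODE_CTL_MODE(x)		((x) << 0)
#define INSPUR_PW_MODE_CTL_MODE_MASK		0x03
#define INSPUR_PW_MODE_CTL_MODE_SLEEP		2

/* Shift the value into place, then clamp it to the field's mask. */
#define INSPUR_FIELD(field, value) (field(value) & field##_MASK)

/* Hypothetical stand-in for the power mode control register. */
uint32_t fake_reg = 0xffffffff;

/* Illustrative analogue of inspur_set_power_mode(), without MMIO. */
void set_power_mode(unsigned int power_mode)
{
	unsigned int input = 1;
	uint32_t v;

	/* Reject modes beyond Sleep, as the driver does. */
	if (power_mode > INSPUR_PW_MODE_CTL_MODE_SLEEP)
		return;

	/* Sleep mode turns the oscillator input off. */
	if (power_mode == INSPUR_PW_MODE_CTL_MODE_SLEEP)
		input = 0;

	v = fake_reg;				/* read */
	v &= ~(INSPUR_PW_MODE_CTL_MODE_MASK |	/* clear both fields */
	       INSPUR_PW_MODE_CTL_OSC_INPUT_MASK);
	v |= INSPUR_FIELD(INSPUR_PW_MODE_CTL_MODE, power_mode);
	v |= INSPUR_FIELD(INSPUR_PW_MODE_CTL_OSC_INPUT, input);
	fake_reg = v;				/* write back */
}
```

Clearing both masks before OR-ing in the new values is what keeps stale bits of the old mode from leaking into the new one; bits outside the two fields are left as they were.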


[PATCH OLK-5.10] LoongArch: Fix module relocation error with binutils 2.41
by Hongchen Zhang 13 Sep '23


From: Huacai Chen <chenhuacai(a)loongson.cn>

stable inclusion
from stable-v6.5-rc4
commit 03c53eb90c0c61885b2175adf8675fb56df7f8db
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I80YEI
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…

---------------------------

Binutils 2.41 enables linker relaxation by default, but the kernel
module loader doesn't support that, so just disable it. Otherwise we get
an error like this when loading modules:

"Unknown relocation type 102"

As an alternative, we could add linker relaxation support to the kernel
module loader. But that is a relatively large amount of complexity that
may or may not bring a similar gain, and we don't really want to include
this linker pass in the kernel.

Reviewed-by: WANG Xuerui <git(a)xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
 arch/loongarch/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index 345dc10576d4..a0f194da592b 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -55,6 +55,8 @@ LDFLAGS_vmlinux += -G0 -static -n -nostdlib
 ifdef CONFIG_AS_HAS_EXPLICIT_RELOCS
 cflags-y += -mexplicit-relocs
 KBUILD_CFLAGS_KERNEL += -mdirect-extern-access
+KBUILD_AFLAGS_MODULE += $(call cc-option,-mno-relax) $(call cc-option,-Wa$(comma)-mno-relax)
+KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-relax) $(call cc-option,-Wa$(comma)-mno-relax)
 else
 cflags-y += $(call cc-option,-mno-explicit-relocs)
 KBUILD_AFLAGS_KERNEL += -Wa,-mla-global-with-pcrel
--
2.33.0


[PATCH OLK-5.10 1/2] LoongArch: Fix the write_fcsr() macro
by Hongchen Zhang 13 Sep '23


From: Qi Hu <huqi(a)loongson.cn>

linux-next inclusion
from next-20230616
commit 346dc929623cef70ff7832a4fa0ffd1b696e312a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I80YEI
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/…

---------------------------

The "write_fcsr()" macro uses the wrong positions for val and dest in
asm. Fix it!

Reported-by: Miao HAO <haomiao19(a)mails.ucas.ac.cn>
Signed-off-by: Qi Hu <huqi(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
 arch/loongarch/include/asm/loongarch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 33a8fa446ba9..0b8c1bde008f 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -1521,7 +1521,7 @@ __BUILD_CSR_OP(tlbidx)
 #define write_fcsr(dest, val) \
 do { \
 	__asm__ __volatile__( \
-	" movgr2fcsr %0, "__stringify(dest)" \n" \
+	" movgr2fcsr "__stringify(dest)", %0 \n" \
 	: : "r" (val)); \
 } while (0)

--
2.33.0
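
The one-line fix above only swaps the order of two operands inside an asm template string. As a rough illustration, the sketch below rebuilds both template strings in userspace with the same stringify technique, so the difference can be inspected without a LoongArch toolchain. It makes several assumptions: DEMO_STRINGIFY, DEMO_FCSR0, old_template() and new_template() are hypothetical names, and the token fcsr0 is a placeholder rather than the kernel's real register definition.

```c
/* Two-level stringify, so macro arguments are expanded before quoting. */
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1(x)

/* Hypothetical placeholder for the FCSR register token. */
#define DEMO_FCSR0 fcsr0

/* Template before the fix: the value operand (%0) comes first. */
#define OLD_TEMPLATE(dest) " movgr2fcsr %0, " DEMO_STRINGIFY(dest) " \n"

/* Template after the fix: the FCSR destination comes first. */
#define NEW_TEMPLATE(dest) " movgr2fcsr " DEMO_STRINGIFY(dest) ", %0 \n"

/* Return the assembled strings so they can be printed or compared. */
const char *old_template(void)
{
	return OLD_TEMPLATE(DEMO_FCSR0);
}

const char *new_template(void)
{
	return NEW_TEMPLATE(DEMO_FCSR0);
}
```

Because adjacent string literals are concatenated at compile time, the register name ends up embedded in the template, while %0 is the placeholder the compiler later replaces with the general-purpose register holding val.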


[PATCH openEuler-1.0-LTS] netfilter: nftables: exthdr: fix 4-byte stack OOB write
by Zhengchao Shao 13 Sep '23


From: Florian Westphal <fw(a)strlen.de>

mainline inclusion
from mainline-v6.6-rc1
commit fd94d9dadee58e09b49075240fe83423eb1dcd36
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I80I0G
CVE: CVE-2023-4881

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

If priv->len is a multiple of 4, then dst[len / 4] can write past
the destination array, which leads to stack corruption.

This construct is necessary to clean the remainder of the register
in case ->len is NOT a multiple of the register size, so make it
conditional just like nft_payload.c does.

The bug was added in the 4.1 cycle and then copied/inherited when
tcp/sctp and ip option support was added.

Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
ZDI-CAN-21951, ZDI-CAN-21961).

Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching")
Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options")
Signed-off-by: Florian Westphal <fw(a)strlen.de>

Conflicts:
	net/netfilter/nft_exthdr.c

Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/netfilter/nft_exthdr.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index 93fee4106019..07dd5a723d79 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -36,6 +36,14 @@ static unsigned int optlen(const u8 *opt, unsigned int offset)
 	return opt[offset + 1];
 }

+static int nft_skb_copy_to_reg(const struct sk_buff *skb, int offset, u32 *dest, unsigned int len)
+{
+	if (len % NFT_REG32_SIZE)
+		dest[len / NFT_REG32_SIZE] = 0;
+
+	return skb_copy_bits(skb, offset, dest, len);
+}
+
 static void nft_exthdr_ipv6_eval(const struct nft_expr *expr,
 				 struct nft_regs *regs,
 				 const struct nft_pktinfo *pkt)
@@ -57,8 +65,7 @@ static void nft_exthdr_ipv6_eval(const struct nft_expr *expr,
 	}
 	offset += priv->offset;

-	dest[priv->len / NFT_REG32_SIZE] = 0;
-	if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0)
+	if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0)
 		goto err;
 	return;
 err:
@@ -114,7 +121,8 @@ static void nft_exthdr_tcp_eval(const struct nft_expr *expr,
 	if (priv->flags & NFT_EXTHDR_F_PRESENT) {
 		*dest = 1;
 	} else {
-		dest[priv->len / NFT_REG32_SIZE] = 0;
+		if (priv->len % NFT_REG32_SIZE)
+			dest[priv->len / NFT_REG32_SIZE] = 0;
 		memcpy(dest, opt + offset, priv->len);
 	}

--
2.34.1
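
The conditional clear introduced by this patch can be tried out in userspace. The sketch below is only an analogue of nft_skb_copy_to_reg(): it assumes memcpy() in place of skb_copy_bits(), a byte buffer in place of the skb, and the hypothetical names REG32_SIZE and copy_to_reg(); it is not the kernel helper itself.

```c
#include <stdint.h>
#include <string.h>

/* Register word size in bytes; mirrors the "multiple of 4" in the commit message. */
#define REG32_SIZE 4

/*
 * Copy len bytes from src into an array of 32-bit registers.
 * The trailing word is zeroed only when len leaves a partial word,
 * so a length that is a whole number of words never touches
 * dest[len / REG32_SIZE], which would be one word past the data.
 */
int copy_to_reg(const uint8_t *src, uint32_t *dest, unsigned int len)
{
	if (len % REG32_SIZE)
		dest[len / REG32_SIZE] = 0;

	memcpy(dest, src, len);
	return 0;
}
```

With len = 6 the second word is cleared first, so its two unused bytes read back as zero after the copy. With len = 8 the clear is skipped, and whatever lies behind the second word is left alone.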


[PATCH OLK-5.10] netfilter: nftables: exthdr: fix 4-byte stack OOB write
by Zhengchao Shao 13 Sep '23


From: Florian Westphal <fw(a)strlen.de>

mainline inclusion
from mainline-v6.6-rc1
commit fd94d9dadee58e09b49075240fe83423eb1dcd36
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I80I0G
CVE: CVE-2023-4881

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

If priv->len is a multiple of 4, then dst[len / 4] can write past
the destination array which leads to stack corruption.

This construct is necessary to clean the remainder of the register
in case ->len is NOT a multiple of the register size, so make it
conditional just like nft_payload.c does.

The bug was added in 4.1 cycle and then copied/inherited when
tcp/sctp and ip option support was added.

Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
ZDI-CAN-21951, ZDI-CAN-21961).

Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching")
Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options")
Signed-off-by: Florian Westphal <fw(a)strlen.de>

Conflicts:
	net/netfilter/nft_exthdr.c

Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/netfilter/nft_exthdr.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index 670dd146fb2b..ca268293cfa1 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -33,6 +33,14 @@ static unsigned int optlen(const u8 *opt, unsigned int offset)
 		return opt[offset + 1];
 }
 
+static int nft_skb_copy_to_reg(const struct sk_buff *skb, int offset, u32 *dest, unsigned int len)
+{
+	if (len % NFT_REG32_SIZE)
+		dest[len / NFT_REG32_SIZE] = 0;
+
+	return skb_copy_bits(skb, offset, dest, len);
+}
+
 static void nft_exthdr_ipv6_eval(const struct nft_expr *expr,
 				 struct nft_regs *regs,
 				 const struct nft_pktinfo *pkt)
@@ -54,8 +62,7 @@ static void nft_exthdr_ipv6_eval(const struct nft_expr *expr,
 	}
 	offset += priv->offset;
 
-	dest[priv->len / NFT_REG32_SIZE] = 0;
-	if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0)
+	if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0)
 		goto err;
 	return;
 err:
@@ -151,8 +158,7 @@ static void nft_exthdr_ipv4_eval(const struct nft_expr *expr,
 	}
 	offset += priv->offset;
 
-	dest[priv->len / NFT_REG32_SIZE] = 0;
-	if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0)
+	if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0)
 		goto err;
 	return;
 err:
@@ -208,7 +214,8 @@ static void nft_exthdr_tcp_eval(const struct nft_expr *expr,
 		if (priv->flags & NFT_EXTHDR_F_PRESENT) {
 			*dest = 1;
 		} else {
-			dest[priv->len / NFT_REG32_SIZE] = 0;
+			if (priv->len % NFT_REG32_SIZE)
+				dest[priv->len / NFT_REG32_SIZE] = 0;
 			memcpy(dest, opt + offset, priv->len);
 		}
 
-- 
2.34.1
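
For readers who want to see the off-by-one outside the kernel, here is a minimal user-space sketch of the same pattern. It is not part of the patch: REG32_SIZE stands in for NFT_REG32_SIZE, copy_to_reg() is a hypothetical analogue of nft_skb_copy_to_reg() that uses memcpy() in place of skb_copy_bits(), and the caller is assumed to supply a buffer that holds at least len bytes rounded up to a whole register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REG32_SIZE 4u	/* stand-in for NFT_REG32_SIZE (assumption) */

/*
 * Hypothetical analogue of nft_skb_copy_to_reg(): copy len bytes into an
 * array of 32-bit registers. The last, partially filled register is
 * cleared first so that its unused bytes read as zero. When len is a
 * multiple of the register size there is no partial register, and
 * dest[len / REG32_SIZE] would be one element past the data, so it must
 * not be touched.
 */
static void copy_to_reg(uint32_t *dest, const uint8_t *src, unsigned int len)
{
	if (len % REG32_SIZE)
		dest[len / REG32_SIZE] = 0;

	memcpy(dest, src, len);
}
```

With an 8-byte copy into a three-register array, the third register keeps its previous value; with a 5-byte copy, the second register holds the fifth byte followed by three zero bytes.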


[PATCH openEuler-22.03-LTS-SP1] io_uring: ensure IOPOLL locks around deferred work
by Zhihao Cheng 13 Sep '23


From: Jens Axboe <axboe(a)kernel.dk>

stable inclusion
from stable-v5.10.188
commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7KXLN
CVE: CVE-2023-21400

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

No direct upstream commit exists for this issue. It was fixed in
5.18 as part of a larger rework of the completion side.

io_commit_cqring() writes the CQ ring tail to make it visible, but it
also kicks off any deferred work we have. A ring setup with IOPOLL
does not need any locking around the CQ ring updates, as we're always
under the ctx uring_lock. But if we have deferred work that needs
processing, then io_queue_deferred() assumes that the completion_lock
is held, as it is for !IOPOLL.

Add a lockdep assertion to check and document this fact, and have
io_iopoll_complete() check if we have deferred work and run that
separately with the appropriate lock grabbed.

Cc: stable(a)vger.kernel.org # 5.10, 5.15
Reported-by: dghost david <daviduniverse18(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lin Yujun <linyujun809(a)huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
---
 io_uring/io_uring.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3d35f5d13666..781af0b05d8c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1521,6 +1521,8 @@ static void io_kill_timeout(struct io_kiocb *req, int status)
 
 static void io_queue_deferred(struct io_ring_ctx *ctx)
 {
+	lockdep_assert_held(&ctx->completion_lock);
+
 	while (!list_empty(&ctx->defer_list)) {
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
 						struct io_defer_entry, list);
@@ -1572,14 +1574,24 @@ static void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
 		io_queue_deferred(ctx);
 }
 
-static inline void io_commit_cqring(struct io_ring_ctx *ctx)
+static inline bool io_commit_needs_flush(struct io_ring_ctx *ctx)
+{
+	return ctx->off_timeout_used || ctx->drain_active;
+}
+
+static inline void __io_commit_cqring(struct io_ring_ctx *ctx)
 {
-	if (unlikely(ctx->off_timeout_used || ctx->drain_active))
-		__io_commit_cqring_flush(ctx);
 	/* order cqe stores with ring update */
 	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
 }
 
+static inline void io_commit_cqring(struct io_ring_ctx *ctx)
+{
+	if (unlikely(io_commit_needs_flush(ctx)))
+		__io_commit_cqring_flush(ctx);
+	__io_commit_cqring(ctx);
+}
+
 static inline bool io_sqring_full(struct io_ring_ctx *ctx)
 {
 	struct io_rings *r = ctx->rings;
@@ -2509,7 +2521,12 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 			io_req_free_batch(&rb, req, &ctx->submit_state);
 	}
 
-	io_commit_cqring(ctx);
+	if (io_commit_needs_flush(ctx)) {
+		spin_lock(&ctx->completion_lock);
+		__io_commit_cqring_flush(ctx);
+		spin_unlock(&ctx->completion_lock);
+	}
+	__io_commit_cqring(ctx);
 	io_cqring_ev_posted_iopoll(ctx);
 	io_req_free_batch_finish(ctx, &rb);
 }
-- 
2.31.1
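
The split that the patch makes (take the lock only when deferred work has to be flushed, then publish the tail with a release store) can be sketched in portable C11. This is only an illustration under stated assumptions, not the kernel code: struct ring_ctx and every function below are hypothetical, an atomic_flag spinlock stands in for completion_lock, and a lock_held flag with assert() stands in for lockdep_assert_held().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct io_ring_ctx. */
struct ring_ctx {
	atomic_flag lock;		/* plays the role of completion_lock */
	bool lock_held;			/* debug aid, mimics lockdep */
	bool off_timeout_used;
	bool drain_active;
	unsigned int deferred;		/* pending deferred items, lock protected */
	unsigned int cached_tail;
	_Atomic unsigned int tail;	/* value visible to the consumer */
};

static void ctx_lock(struct ring_ctx *ctx)
{
	while (atomic_flag_test_and_set_explicit(&ctx->lock, memory_order_acquire))
		;	/* spin until the flag was clear */
	ctx->lock_held = true;
}

static void ctx_unlock(struct ring_ctx *ctx)
{
	ctx->lock_held = false;
	atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
}

static bool commit_needs_flush(const struct ring_ctx *ctx)
{
	return ctx->off_timeout_used || ctx->drain_active;
}

/* Must be called with the lock held, as the assertion documents. */
static void commit_flush(struct ring_ctx *ctx)
{
	assert(ctx->lock_held);
	ctx->deferred = 0;	/* "run" all deferred work */
}

/* Order earlier entry stores before the tail update becomes visible. */
static void commit_tail(struct ring_ctx *ctx)
{
	atomic_store_explicit(&ctx->tail, ctx->cached_tail, memory_order_release);
}

/* Polled completion path: no lock is held on entry. */
static void iopoll_complete(struct ring_ctx *ctx)
{
	if (commit_needs_flush(ctx)) {
		ctx_lock(ctx);
		commit_flush(ctx);
		ctx_unlock(ctx);
	}
	commit_tail(ctx);
}
```

The point of the design is that the common polled path stays lock free, while the rare deferred-work path pays for the lock that the flush routine expects.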


[PATCH openEuler-1.0-LTS] ucc: add ucc support
by Jinjie Ruan 13 Sep '23


hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I80YXE CVE: NA ---------------------------------------- ucc support for XPU. Signed-off-by: Chen Hui <judy.chenhui(a)huawei.com> Signed-off-by: Yang Yanchao <yangyanchao6(a)huawei.com> Signed-off-by: Hui Tang <tanghui20(a)huawei.com> Signed-off-by: Guan Jing <guanjing6(a)huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com> --- Kconfig | 2 + drivers/Kconfig | 2 + drivers/Makefile | 1 + drivers/xpu/Kconfig | 9 + drivers/xpu/Makefile | 1 + drivers/xpu/xpu_group.c | 175 ++++++++ fs/proc/base.c | 102 ++++- include/linux/sched.h | 3 + include/linux/ucc_common.h | 21 + include/linux/ucc_kfd.h | 110 +++++ include/linux/ucc_sched.h | 36 ++ include/linux/ucc_sched/ucc_sched.h | 71 +++ include/linux/ucc_ts.h | 254 +++++++++++ include/linux/vstream.h | 123 ++++++ include/linux/xpu_group.h | 66 +++ include/trace/events/ucc_sched.h | 120 +++++ init/init_task.c | 4 + init/main.c | 9 + kernel/Makefile | 2 + kernel/sched/Makefile | 1 + kernel/sched/core.c | 5 + kernel/sched/ucc_sched.c | 148 +++++++ kernel/sysctl.c | 17 +- kernel/ucc/Kconfig | 21 + kernel/ucc/Makefile | 1 + kernel/ucc/ascend_vstream.c | 654 ++++++++++++++++++++++++++++ kernel/ucc/ascend_vstream.h | 13 + kernel/ucc/vstream.c | 62 +++ kernel/ucc_sched/Makefile | 1 + kernel/ucc_sched/core.c | 591 +++++++++++++++++++++++++ kernel/ucc_sched/ucc_sched.h | 43 ++ 31 files changed, 2666 insertions(+), 2 deletions(-) create mode 100644 drivers/xpu/Kconfig create mode 100644 drivers/xpu/Makefile create mode 100644 drivers/xpu/xpu_group.c create mode 100644 include/linux/ucc_common.h create mode 100644 include/linux/ucc_kfd.h create mode 100644 include/linux/ucc_sched.h create mode 100644 include/linux/ucc_sched/ucc_sched.h create mode 100644 include/linux/ucc_ts.h create mode 100644 include/linux/vstream.h create mode 100644 include/linux/xpu_group.h create mode 100644 include/trace/events/ucc_sched.h create mode 100644 
kernel/sched/ucc_sched.c create mode 100644 kernel/ucc/Kconfig create mode 100644 kernel/ucc/Makefile create mode 100644 kernel/ucc/ascend_vstream.c create mode 100644 kernel/ucc/ascend_vstream.h create mode 100644 kernel/ucc/vstream.c create mode 100644 kernel/ucc_sched/Makefile create mode 100644 kernel/ucc_sched/core.c create mode 100644 kernel/ucc_sched/ucc_sched.h diff --git a/Kconfig b/Kconfig index 48a80beab685..8e558777fb54 100644 --- a/Kconfig +++ b/Kconfig @@ -30,3 +30,5 @@ source "crypto/Kconfig" source "lib/Kconfig" source "lib/Kconfig.debug" + +source "kernel/ucc/Kconfig" diff --git a/drivers/Kconfig b/drivers/Kconfig index ab4d43923c4d..bd59e9e525ba 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -219,4 +219,6 @@ source "drivers/siox/Kconfig" source "drivers/slimbus/Kconfig" +source "drivers/xpu/Kconfig" + endmenu diff --git a/drivers/Makefile b/drivers/Makefile index 578f469f72fb..1130b2d92df1 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -186,3 +186,4 @@ obj-$(CONFIG_MULTIPLEXER) += mux/ obj-$(CONFIG_UNISYS_VISORBUS) += visorbus/ obj-$(CONFIG_SIOX) += siox/ obj-$(CONFIG_GNSS) += gnss/ +obj-$(CONFIG_XPU_SCHEDULE) += xpu/ diff --git a/drivers/xpu/Kconfig b/drivers/xpu/Kconfig new file mode 100644 index 000000000000..c4a391d0039d --- /dev/null +++ b/drivers/xpu/Kconfig @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: GPL-2.0 + +menuconfig XPU_SCHEDULE + bool "xpu schedule" + default n + help + Support xpu schedule, Say Y here if you want support for use + xpu schedule. 
+ diff --git a/drivers/xpu/Makefile b/drivers/xpu/Makefile new file mode 100644 index 000000000000..9edc6dcdd4d0 --- /dev/null +++ b/drivers/xpu/Makefile @@ -0,0 +1 @@ +obj-y += xpu_group.o diff --git a/drivers/xpu/xpu_group.c b/drivers/xpu/xpu_group.c new file mode 100644 index 000000000000..53a598db0615 --- /dev/null +++ b/drivers/xpu/xpu_group.c @@ -0,0 +1,175 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/xpu_group.h> +#include <linux/rwsem.h> +#include <linux/slab.h> + +extern int ucc_rt_nr_running(struct xcu *cu); +static DECLARE_RWSEM(xpu_group_rwsem); + +static struct xpu_capability xpu_capability_root; + +struct xpu_group __xpu_root = { + .type = XPU_TYPE_ROOT, + .capability = &xpu_capability_root, + + .next_layer = IDR_INIT(next_layer), +}; + +struct xpu_group *xpu_root = &__xpu_root; +EXPORT_SYMBOL(xpu_root); + +int __xpu_group_attach(struct xpu_group *new_group, + struct xpu_group *previous_group) +{ + int id = new_group->id; + + if (id == -1) + id = idr_alloc(&previous_group->next_layer, new_group, + 0, INT_MAX, GFP_KERNEL); + else + id = idr_alloc(&previous_group->next_layer, new_group, + id, id + 1, GFP_KERNEL); + if (id < 0) + return -EEXIST; + + new_group->id = id; + new_group->previous_layer = previous_group; + + return 0; +} + +int xpu_group_attach(struct xpu_group *new_group, + struct xpu_group *previous_group) +{ + int ret; + + down_write(&xpu_group_rwsem); + ret = __xpu_group_attach(new_group, previous_group); + up_write(&xpu_group_rwsem); + return ret; +} +EXPORT_SYMBOL(xpu_group_attach); + +struct xpu_group *xpu_group_alloc_and_attach(struct xpu_group *previous_group, + int id) +{ + struct xpu_group *new = xpu_group_alloc(); + + if (!new) { + pr_err("alloc xpu_group failed\n"); + return NULL; + } + + new->id = id; + + if (!xpu_group_attach(new, previous_group)) + return NULL; + + return new; +} +EXPORT_SYMBOL(xpu_group_alloc_and_attach); + +int __xpu_group_detach(struct xpu_group 
*group) +{ + idr_remove(&group->previous_layer->next_layer, group->id); + return 0; +} + +int xpu_group_detach(struct xpu_group *group) +{ + int ret; + + down_write(&xpu_group_rwsem); + ret = __xpu_group_detach(group); + up_write(&xpu_group_rwsem); + return ret; +} +EXPORT_SYMBOL(xpu_group_detach); + +struct xpu_group *__xpu_group_find(struct xpu_group *group, int id) +{ + return idr_find(&group->next_layer, id); +} + +struct xpu_group *xpu_group_find(struct xpu_group *group, int id) +{ + struct xpu_group *p; + + p = xpu_group_alloc(); + + down_read(&xpu_group_rwsem); + p = __xpu_group_find(group, id); + up_read(&xpu_group_rwsem); + + return p; +} +EXPORT_SYMBOL(xpu_group_find); + + +struct xpu_group *xpu_idle_group_find(struct xpu_group *group) +{ + struct xpu_group *entry_group; + int id; + + down_read(&xpu_group_rwsem); + idr_for_each_entry(&group->next_layer, entry_group, id) { + if (!entry_group->used) { + up_read(&xpu_group_rwsem); + return entry_group; + } + } + up_read(&xpu_group_rwsem); + + return NULL; +} + +int xpu_run(struct xpu_group *group, void *para1, void *para2) +{ + int ret = 0; + + if (group->opt && group->opt->run) + ret = group->opt->run(group, para1, para2); + + return ret; +} + +int xpu_finish(struct xpu_group *group, void *para1, void *para2) +{ + if (group->opt && group->opt->finish) + return group->opt->finish(group, para1, para2); + + return 0; +} + +int xpu_wait(struct xpu_group *group, void *para1, void *para2, void *para3) +{ + if (group->opt && group->opt->wait) + return group->opt->wait(group, para1, para2, para3); + + return 0; +} + +int xpu_complete(struct xpu_group *group, void *para1, void *para2, void *para3) +{ + if (group->opt && group->opt->complete) + return group->opt->complete(group, para1, para2, para3); + + return 0; +} + +struct xpu_group *xpu_group_alloc(void) +{ + struct xpu_group *node = kzalloc(sizeof(*node), GFP_KERNEL); + + if (!node) + return NULL; + + node->type = XPU_TYPE_CUSTOM; + idr_init(&node->next_layer); 
+ + return node; +} +EXPORT_SYMBOL(xpu_group_alloc); diff --git a/fs/proc/base.c b/fs/proc/base.c index dc9841826264..516eee1ae952 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -770,7 +770,6 @@ static const struct file_operations proc_single_file_operations = { .release = single_release, }; - struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode) { struct task_struct *task = get_proc_task(inode); @@ -1546,6 +1545,99 @@ static const struct file_operations proc_pid_sched_operations = { #endif +#ifdef CONFIG_XPU_SCHEDULE +static ssize_t ucc_step_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + char numbuf[PROC_NUMBUF]; + ssize_t len; + + task = get_proc_task(file_inode(file)); + if (!task) + return -ESRCH; + + len = snprintf(numbuf, sizeof(numbuf), "%u\n", task->ucc_step); + + put_task_struct(task); + + return simple_read_from_buffer(buf, count, ppos, numbuf, len); +} + +static ssize_t ucc_step_write(struct file *file, const char __user *buf, + size_t count, loff_t *offset) +{ + struct inode *inode = file_inode(file); + struct task_struct *p; + int err; + unsigned int ucc_step; + + p = get_proc_task(inode); + if (!p) + return -ESRCH; + + err = kstrtouint_from_user(buf, count, 0, &ucc_step); + if (err) + return err; + + p->ucc_step = ucc_step; + put_task_struct(p); + + return count; +} + +static const struct file_operations ucc_step_operations = { + .write = ucc_step_write, + .read = ucc_step_read, +}; + +static ssize_t ucc_priority_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + char numbuf[PROC_NUMBUF]; + ssize_t len; + + task = get_proc_task(file_inode(file)); + if (!task) + return -ESRCH; + + len = snprintf(numbuf, sizeof(numbuf), "%u\n", task->ucc_priority); + + put_task_struct(task); + + return simple_read_from_buffer(buf, count, ppos, numbuf, len); +} + +static ssize_t ucc_priority_write(struct file *file, const char __user 
*buf, + size_t count, loff_t *offset) +{ + struct inode *inode = file_inode(file); + struct task_struct *p; + int err; + unsigned int ucc_priority; + + p = get_proc_task(inode); + if (!p) + return -ESRCH; + + err = kstrtouint_from_user(buf, count, 0, &ucc_priority); + if (err) + return err; + + p->ucc_priority = ucc_priority; + put_task_struct(p); + + return count; +} + +static const struct file_operations ucc_priority_operations = { + .write = ucc_priority_write, + .read = ucc_priority_read, +}; + +#endif + #ifdef CONFIG_SCHED_AUTOGROUP /* * Print out autogroup related information: @@ -3151,6 +3243,10 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_ASCEND_SHARE_POOL ONE("sp_group", S_IRUGO, proc_sp_group_state), #endif +#ifdef CONFIG_XPU_SCHEDULE + REG("ucc_priority", 0644, ucc_priority_operations), + REG("ucc_step", 0644, ucc_step_operations), +#endif }; static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx) @@ -3537,6 +3633,10 @@ static const struct pid_entry tid_base_stuff[] = { #ifdef CONFIG_ASCEND_SHARE_POOL ONE("sp_group", S_IRUGO, proc_sp_group_state), #endif +#ifdef CONFIG_XPU_SCHEDULE + REG("ucc_priority", 0644, ucc_priority_operations), + REG("ucc_step", 0644, ucc_step_operations), +#endif }; static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx) diff --git a/include/linux/sched.h b/include/linux/sched.h index 8fd8c5b7cdc6..175659be95f3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1281,6 +1281,9 @@ struct task_struct { #if !defined(__GENKSYMS__) #if defined(CONFIG_QOS_SCHED_SMART_GRID) struct sched_grid_qos *grid_qos; +#elif defined(CONFIG_XPU_SCHEDULE) + u32 ucc_priority; + u32 ucc_step; #else KABI_RESERVE(8) #endif diff --git a/include/linux/ucc_common.h b/include/linux/ucc_common.h new file mode 100644 index 000000000000..3875c2226d24 --- /dev/null +++ b/include/linux/ucc_common.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _UCC_COMMON_H 
+#define _UCC_COMMON_H + +/* + * UCC Print Function + */ +#ifndef pr_fmt +#define pr_fmt(fmt) fmt +#endif + +#define ucc_err(fmt, ...) printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__) + +#define ucc_warn(fmt, ...) printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__) + +#define ucc_info(fmt, ...) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) + +#define ucc_dbg(fmt, ...) printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) + +#endif diff --git a/include/linux/ucc_kfd.h b/include/linux/ucc_kfd.h new file mode 100644 index 000000000000..07eedc2fd5f2 --- /dev/null +++ b/include/linux/ucc_kfd.h @@ -0,0 +1,110 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef KFD_PRIV_H_INCLUDED +#define KFD_PRIV_H_INCLUDED + +#include <linux/mmu_notifier.h> +#include <linux/types.h> +#include <linux/kref.h> +#include <linux/mutex.h> +#include <linux/sched.h> +#include <linux/mmu_notifier.h> +#include <linux/idr.h> +#include <linux/dma-fence.h> +#include <linux/workqueue.h> +#include <linux/fs.h> +#include <linux/kobject.h> +#include <linux/sysfs.h> + +struct process_queue_manager; +struct kfd_process; +struct kfd_signal_page; + +struct process_queue_manager { + struct kfd_process *process; + struct list_head queues; + unsigned long *queue_slot_bitmap; +}; + +struct kfd_signal_page { + uint64_t *kernel_address; + uint64_t __user *user_address; + bool need_to_free_pages; +}; + +/* Process data */ +struct kfd_process { + struct hlist_node kfd_processes; + void *mm; + struct kref ref; + struct work_struct release_work; + struct mutex mutex; + struct task_struct *lead_thread; + struct mmu_notifier mmu_notifier; +/* TODO: check if use right branch */ + struct rcu_head rcu; + uint16_t pasid; + struct list_head per_device_data; + struct process_queue_manager pqm; + bool is_32bit_user_mode; + struct mutex event_mutex; + struct idr event_idr; + struct kfd_signal_page *signal_page; + size_t signal_mapped_size; + size_t signal_event_count; + bool signal_event_limit_reached; +/* TODO: check if use right branch */ 
+ struct rb_root bo_interval_tree; + void *kgd_process_info; + struct dma_fence *ef; + struct delayed_work eviction_work; + struct delayed_work restore_work; + unsigned int last_eviction_seqno; + unsigned long last_restore_timestamp; + unsigned long last_evict_timestamp; + bool debug_trap_enabled; + uint32_t trap_debug_wave_launch_mode; + struct file *dbg_ev_file; + uint32_t allocated_debug_watch_point_bitmask; + struct kobject *kobj; + struct kobject *kobj_queues; + struct attribute attr_pasid; + bool has_cwsr; + uint64_t exception_enable_mask; + uint64_t exception_status; +}; + +struct kfd_ioctl_create_queue_args { + __u64 ring_base_address; /* to KFD */ + __u64 write_pointer_address; /* from KFD */ + __u64 read_pointer_address; /* from KFD */ + __u64 doorbell_offset; /* from KFD */ + + __u32 ring_size; /* to KFD */ + __u32 gpu_id; /* to KFD */ + __u32 queue_type; /* to KFD */ + __u32 queue_percentage; /* to KFD */ + __u32 queue_priority; /* to KFD */ + __u32 queue_id; /* from KFD */ + + __u64 eop_buffer_address; /* to KFD */ + __u64 eop_buffer_size; /* to KFD */ + __u64 ctx_save_restore_address; /* to KFD */ + __u32 ctx_save_restore_size; /* to KFD */ + __u32 ctl_stack_size; /* to KFD */ +}; + +struct kfd_ioctl_destroy_queue_args { + __u32 queue_id; /* to KFD */ + __u32 pad; +}; + +struct kfd_ioctl_update_queue_args { + __u64 ring_base_address; /* to KFD */ + + __u32 queue_id; /* to KFD */ + __u32 ring_size; /* to KFD */ + __u32 queue_percentage; /* to KFD */ + __u32 queue_priority; /* to KFD */ +}; +#endif diff --git a/include/linux/ucc_sched.h b/include/linux/ucc_sched.h new file mode 100644 index 000000000000..5b170545f7c2 --- /dev/null +++ b/include/linux/ucc_sched.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_UCC_SCHED_H__ +#define __LINUX_UCC_SCHED_H__ + +#include <linux/list.h> +#include <linux/types.h> +#include <linux/kernel.h> +#include <linux/hash.h> +#include <linux/rculist.h> +#include <linux/idr.h> +#include 
<linux/xpu_group.h> +#include <linux/hashtable.h> +#include <linux/vstream.h> +#include <linux/slab.h> +#include <linux/sched.h> + +#define VRTSQ_RTSQ_HASH_ORDER 6 + +#ifdef CONFIG_XPU_SCHEDULE +int ucc_process_task(struct vstream_info *vsqcq_info, struct tsdrv_ctx *ctx, + int *sqenum); +int ucc_free_task(struct vstream_info *vsqcq_info, struct tsdrv_ctx *ctx); +int ucc_wait_cq(struct vstream_info *vsqcq_info, struct tsdrv_ctx *ctx, + struct devdrv_report_para *arg, int *sqenum); +struct xpu_group *select_sq(struct vstream_info *vstream_info); +int ucc_sched_register_xcu(int dev_id, int ts_id, int cu_num); +void ucc_set_vstream_state(struct vstream_info *vinfo, int state); +void ucc_dequeue_task(struct vstream_info *vInfo); +int ucc_rt_nr_running(struct xcu *cu); +struct xcu *ucc_get_xcu_by_id(int cu_id); +int ucc_xcu_is_sched(int cu_id); +void ucc_dump_statistics_info(struct ucc_se *se); +#endif + +#endif diff --git a/include/linux/ucc_sched/ucc_sched.h b/include/linux/ucc_sched/ucc_sched.h new file mode 100644 index 000000000000..6edd8930e09e --- /dev/null +++ b/include/linux/ucc_sched/ucc_sched.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2019. All rights reserved. 
+ * Author: Huawei OS Kernel Lab + * Create: Mon Jan 30 14:29:19 2023 + */ + +#ifndef __LINUX_UCC_SCHED_USCHED_H__ +#define __LINUX_UCC_SCHED_USCHED_H__ + +enum ucc_se_state { + SE_PREPARE, + SE_READY, + SE_RUNNING, + SE_BLOCK, + SE_DEAD, +}; + +enum ucc_se_flag { + UCC_TIF_NONE, + UCC_TIF_PREEMPT, + UCC_TIF_BALANCE, +}; + +enum ucc_se_prio { + UCC_PRIO_HIGH, + UCC_PRIO_LOW, +}; + +enum ucc_se_step { + UCC_STEP_SLOW = 1, + UCC_STEP_FAST = 10, +}; + +struct ucc_statistics { + u64 wait_start; + u64 wait_max; + u64 wait_count; + u64 wait_sum; + + u64 preempt_start; + u64 preempt_max; + u64 preempt_count; + u64 preempt_sum; + + u64 kernel_sum; + u64 timeout_count; + + u64 run_start; + u64 run_max; + u64 run_count; + u64 run_sum; +}; + +struct ucc_se { + int on_cu; + struct list_head run_list; + enum ucc_se_state state; + enum ucc_se_flag flag; + enum ucc_se_prio prio; + enum ucc_se_step step; + raw_spinlock_t se_lock; + struct ucc_statistics statistics; + int is_timeout; +}; + +int ucc_sched_init(void); +int ucc_schedule(int cu_id); +int ucc_wake_up(struct ucc_se *se); + +#endif diff --git a/include/linux/ucc_ts.h b/include/linux/ucc_ts.h new file mode 100644 index 000000000000..7280ccca1059 --- /dev/null +++ b/include/linux/ucc_ts.h @@ -0,0 +1,254 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef TS_H +#define TS_H + +#include <linux/file.h> +#include <linux/device.h> +#include <linux/cdev.h> +#include <linux/fs.h> + +#define DEVDRV_MAX_SQ_DEPTH (1024) +#define DEVDRV_SQ_SLOT_SIZE (64) + +#define DEVDRV_MAX_SQ_NUM (512 - 1) +#define DEVDRV_MAX_CQ_NUM (352 - 1) + +#define DEVDRV_MAX_TS_NUM (1) + +#define REMAP_ALIGN_SIZE (64 * 1024) +#define REMAP_ALIGN_MASK (~(REMAP_ALIGN_SIZE - 1)) +#define REMAP_ALIGN(x) (((x) + REMAP_ALIGN_SIZE - 1) & \ + REMAP_ALIGN_MASK) + +#define DEVDRV_DB_SPACE_SIZE (1024 * 4096) + +#define SQCQ_RTS_INFO_LENGTH 5 +#define SQCQ_RESV_LENGTH 8 + +#define DEVDRV_CBCQ_MAX_GID 128 + +enum phy_sqcq_type { + NORMAL_SQCQ_TYPE = 0, + 
CALLBACK_SQCQ_TYPE, + LOGIC_SQCQ_TYPE, + SHM_SQCQ_TYPE, + DFX_SQCQ_TYPE, + TS_SQCQ_TYPE, + KERNEL_SQCQ_TYPE, +}; + +struct notifier_operations { + int (*notifier_call)(struct file *file_op, unsigned long mode); +}; + +#define MAX_DEVICE_COUNT 64 + +struct davinci_intf_stru { + atomic_t count; + struct mutex dmutex; + struct cdev cdev; + struct device *device; + struct list_head process_list; + struct list_head module_list; + unsigned int device_status[MAX_DEVICE_COUNT]; + cpumask_var_t cpumask; +}; + +#define DAVINIC_MODULE_NAME_MAX 256 +struct davinci_intf_private_stru { + char module_name[DAVINIC_MODULE_NAME_MAX]; + unsigned int device_id; + pid_t owner_pid; + int close_flag; + atomic_t work_count; + int release_status; + struct mutex fmutex; + const struct file_operations fops; + struct notifier_operations notifier; + struct davinci_intf_stru *device_cb; + struct file priv_filep; + unsigned int free_type; +}; + +enum sqcq_alloc_status { + SQCQ_INACTIVE = 0, + SQCQ_ACTIVE +}; + +struct devdrv_ts_sq_info { + enum phy_sqcq_type type; + pid_t tgid; + u32 head; + u32 tail; + u32 credit; + u32 index; + int uio_fd; + + u8 *uio_addr; + int uio_size; + + enum sqcq_alloc_status alloc_status; + u64 send_count; + + void *sq_sub; +}; + +struct devdrv_ts_cq_info { + enum phy_sqcq_type type; + pid_t tgid; + u32 vfid; + + u32 head; + u32 tail; + u32 release_head; /* runtime read cq head value */ + u32 index; + u32 phase; + u32 int_flag; + + int uio_fd; + + u8 *uio_addr; + int uio_size; + + enum sqcq_alloc_status alloc_status; + u64 receive_count; + + void *cq_sub; + + void (*complete_handle)(struct devdrv_ts_cq_info *cq_info); + + u8 slot_size; +}; + +#define DEVDRV_SQ_INFO_OCCUPY_SIZE \ + (sizeof(struct devdrv_ts_sq_info) * DEVDRV_MAX_SQ_NUM) +#define DEVDRV_CQ_INFO_OCCUPY_SIZE \ + (sizeof(struct devdrv_ts_cq_info) * DEVDRV_MAX_CQ_NUM) + +#define DEVDRV_MAX_INFO_SIZE \ + (DEVDRV_SQ_INFO_OCCUPY_SIZE + DEVDRV_CQ_INFO_OCCUPY_SIZE) +#define DEVDRV_VM_SQ_MEM_OFFSET 0 +#define 
DEVDRV_VM_SQ_SLOT_SIZE \ + REMAP_ALIGN(DEVDRV_MAX_SQ_DEPTH * DEVDRV_SQ_SLOT_SIZE) +#define DEVDRV_VM_SQ_MEM_SIZE \ + (DEVDRV_VM_SQ_SLOT_SIZE * DEVDRV_MAX_SQ_NUM) + +#define DEVDRV_VM_INFO_MEM_OFFSET \ + (DEVDRV_VM_SQ_MEM_OFFSET + DEVDRV_VM_SQ_MEM_SIZE) +#define DEVDRV_VM_INFO_MEM_SIZE REMAP_ALIGN(DEVDRV_MAX_INFO_SIZE) + +#define DEVDRV_VM_DB_MEM_OFFSET \ + (DEVDRV_VM_INFO_MEM_OFFSET + DEVDRV_VM_INFO_MEM_SIZE) +#define DEVDRV_VM_DB_MEM_SIZE REMAP_ALIGN(DEVDRV_DB_SPACE_SIZE) + +#define DEVDRV_VM_CQ_MEM_OFFSET \ + (DEVDRV_VM_DB_MEM_OFFSET + DEVDRV_VM_DB_MEM_SIZE) + +enum tsdrv_id_type { + TSDRV_STREAM_ID, + TSDRV_NOTIFY_ID, + TSDRV_MODEL_ID, + TSDRV_EVENT_SW_ID, /* should use for event alloc/free/inquiry res_num*/ + TSDRV_EVENT_HW_ID, + TSDRV_IPC_EVENT_ID, + TSDRV_SQ_ID, + TSDRV_CQ_ID, + TSDRV_PCQ_ID, + TSDRV_MAX_ID, +}; + +#define TSDRV_CQ_REUSE 0x00000001 +#define TSDRV_SQ_REUSE 0x00000002 + +struct normal_alloc_sqcq_para { + uint32_t fd; + uint32_t tsId; + uint32_t devId; + uint32_t sqeSize; + uint32_t cqeSize; + uint32_t sqeDepth; + uint32_t cqeDepth; + uint32_t grpId; + uint32_t flag; + uint32_t sqId; + uint32_t cqId; + uint32_t priority; + uint32_t info[SQCQ_RTS_INFO_LENGTH]; + uint32_t res[SQCQ_RESV_LENGTH]; +}; + +struct normal_free_sqcq_para { + uint32_t tsId; + uint32_t flag; + uint32_t sqId; + uint32_t cqId; + uint32_t res[SQCQ_RESV_LENGTH]; +}; + +struct tsdrv_sqcq_data_para { + uint32_t id; + uint32_t val; +}; + +struct devdrv_report_para { + int timeout; + u32 cq_tail; + u32 cq_id; +}; + +struct tsdrv_ts_id_ctx { + u32 id_num; + struct list_head id_list; + spinlock_t id_lock; +}; +struct tsdrv_ts_ctx { + u32 tsid; + atomic_t status; + u32 send_count; + u64 receive_count; + + int32_t cq_tail_updated; + wait_queue_head_t report_wait; + + struct work_struct recycle_work; + + wait_queue_head_t cbcq_wait[DEVDRV_CBCQ_MAX_GID]; + + void *shm_sqcq_ctx; + void *logic_sqcq_ctx; + void *sync_cb_sqcq_ctx; // mini callback + + struct tsdrv_ts_id_ctx 
id_ctx[TSDRV_MAX_ID]; + + /* only used by vm */ + u32 vcqid; + u32 wait_queue_inited; + u32 cq_report_status; + int32_t cq_tail; + spinlock_t ctx_lock; + + u32 recycle_cbsqcq_num; // min callback +}; + +//Context Delivers +struct tsdrv_ctx { + u32 ctx_index; + atomic_t status; + atomic_t type; + pid_t tgid; + pid_t pid; + int32_t ssid; + u32 thread_bind_irq_num; + u32 mirror_ctx_status; + struct rb_node node; + struct list_head list; + struct vm_area_struct *vma[DEVDRV_MAX_TS_NUM]; + spinlock_t ctx_lock; + struct mutex mutex_lock; + struct tsdrv_ts_ctx ts_ctx[DEVDRV_MAX_TS_NUM]; + + u64 unique_id; /* mark unique processes for vm */ +}; + +#endif diff --git a/include/linux/vstream.h b/include/linux/vstream.h new file mode 100644 index 000000000000..14d799296053 --- /dev/null +++ b/include/linux/vstream.h @@ -0,0 +1,123 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_VSTREAM_H +#define _LINUX_VSTREAM_H + +#include <linux/ucc_kfd.h> +#include <linux/ucc_sched/ucc_sched.h> +#include <linux/ucc_ts.h> + +#define MAX_VSTREAM_SIZE 1024 +#define MAX_VSTREAM_SLOT_SIZE 64 +#define MAX_CQ_SLOT_SIZE 12 + +/* + * XXX_VSTREAM_ALLOC: alloc a vstream, buffer for tasks + * XXX_VSTREAM_FREE: free a vstream + * XXX_VSTREAM_KICK: there are tasks to be executed in the vstream + * XXX_VSTREAM_UPDATE: update information for an existing vstream + * XXX_CALLBACK_VSTREAM_WAIT: waiting for callback tasks + * XXX_CALLBACK_VSTREAM_KICK: callback tasks have been executed + * + * NOTE: Callback vstream is only for Ascend now. We do not need + * CALLBACK_VSTREAM_ALLOC because the callback vstream will be + * alloced with vstream on Ascend. 
+ */ +enum VSTREAM_COMMAND { + /* vstream command for Ascend */ + ASCEND_VSTREAM_ALLOC = 0, + ASCEND_VSTREAM_FREE, + ASCEND_VSTREAM_KICK, + ASCEND_CALLBACK_VSTREAM_WAIT, + ASCEND_CALLBACK_VSTREAM_KICK, + ASCEND_VSTREAM_GET_HEAD, + ASCEND_MAX_COMMAND, + + /* vstream command for amdgpu */ + AMDGPU_VSTREAM_ALLOC = ASCEND_MAX_COMMAND + 1, + AMDGPU_VSTREAM_FREE, + AMDGPU_VSTREAM_KICK, + AMDGPU_VSTREAM_UPDATE, + AMDGPU_MAX_COMMAND, +}; + +struct vstream_alloc_args { + union { + /* For Ascend */ + struct normal_alloc_sqcq_para ascend; + /* For amdgpu */ + struct kfd_ioctl_create_queue_args amdgpu; + }; +}; + +struct vstream_free_args { + union { + /* For Ascend */ + struct normal_free_sqcq_para ascend; + /* For amdgpu */ + struct kfd_ioctl_destroy_queue_args amdgpu; + }; +}; + +struct vstream_kick_args { + union { + /* For Ascend */ + struct tsdrv_sqcq_data_para ascend; + /* For amdgpu */ + }; +}; + +struct vstream_args { + union { + struct vstream_alloc_args va_args; + struct vstream_free_args vf_args; + struct vstream_kick_args vk_args; + struct kfd_ioctl_update_queue_args vu_args; + struct tsdrv_sqcq_data_para vh_args; + struct devdrv_report_para cvw_args; + struct tsdrv_sqcq_data_para cvk_args; + }; +}; + +struct vstream_node { + uint32_t id; + uint32_t head; + uint32_t tail; + uint32_t credit; + void *vstreamData; + raw_spinlock_t spin_lock; +}; + +struct vstream_id { + uint32_t vstreamId; + struct list_head list; +}; + +struct vcq_map_table { + uint32_t vcqId; + struct vstream_node *vcqNode; + struct list_head vstreamId_list; +}; + +struct vstream_info { + uint32_t vstreamId; //key + uint32_t vcqId; + uint32_t devId; + uint32_t tsId; + struct ucc_se se; + //TODO::check name + struct vstream_node *vsqNode; + struct vstream_node *vcqNode; + void *privdata; + uint32_t info[SQCQ_RTS_INFO_LENGTH]; + int cu_id; + struct xpu_group *group; + int send_cnt; + struct task_struct *p; +}; + +typedef int vstream_manage_t(struct vstream_args *arg); +int update_vstream_head(struct 
vstream_info *vstream_info, int num); +struct vstream_info *vstream_get_info(uint32_t id); +bool vstream_have_kernel(struct ucc_se *se); + +#endif /* _LINUX_VSTREAM_H */ diff --git a/include/linux/xpu_group.h b/include/linux/xpu_group.h new file mode 100644 index 000000000000..5e3a96b15f9c --- /dev/null +++ b/include/linux/xpu_group.h @@ -0,0 +1,66 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __XPU_GROUP_H__ +#define __XPU_GROUP_H__ +#include <linux/idr.h> + +struct xpu_group; +struct xcu; + +enum xpu_type { + XPU_TYPE_ROOT, + XPU_TYPE_TASK_QUEUE, + XPU_TYPE_NPU_310, + XPU_TYPE_CUSTOM, +}; + +enum xpu_capability_type { + TYPE_1, + XPU_CAPABILITY_TYPE_NR, +}; + +struct xpu_capability { + unsigned long capacities[XPU_CAPABILITY_TYPE_NR]; +}; + +struct xpu_operation { + int (*run)(struct xpu_group *group, void *para1, void *para2); + int (*finish)(struct xpu_group *group, void *para1, void *para2); + int (*wait)(struct xpu_group *group, void *para1, void *para2, + void *para3); + int (*complete)(struct xpu_group *group, void *para1, void *para2, + void *para3); +}; + +struct xpu_group { + int id; + enum xpu_type type; + struct xpu_capability *capability; + + struct xpu_group *previous_layer; + struct idr next_layer; + + struct xpu_operation *opt; + + int used; + + void *data; +}; + +extern struct xpu_group *xpu_root; + +#ifdef CONFIG_XPU_SCHEDULE +int xpu_group_attach(struct xpu_group *new_group, + struct xpu_group *previous_group); +int xpu_group_detach(struct xpu_group *group); +struct xpu_group *xpu_group_find(struct xpu_group *group, int id); +struct xpu_group *xpu_idle_group_find(struct xpu_group *group); +struct xpu_group *xpu_group_alloc(void); +struct xpu_group *xpu_group_alloc_and_attach(struct xpu_group *previous_group, + int id); +int xpu_run(struct xpu_group *group, void *para1, void *para2); +int xpu_finish(struct xpu_group *group, void *para1, void *para2); +int xpu_wait(struct xpu_group *group, void *para1, void *para2, void *para3); +#endif + 
+#endif diff --git a/include/trace/events/ucc_sched.h b/include/trace/events/ucc_sched.h new file mode 100644 index 000000000000..104a39b2f41c --- /dev/null +++ b/include/trace/events/ucc_sched.h @@ -0,0 +1,120 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM ucc_sched + +#if !defined(_TRACE_UCC_SCHED_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_UCC_SCHED_H + +#include <linux/tracepoint.h> +#include <linux/binfmts.h> + +/* + * XXX the below ucc_sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE + * adding ucc_sched_stat support to SCHED_FIFO/RR would be welcome. + */ +DECLARE_EVENT_CLASS(ucc_sched_stat_template, + + TP_PROTO(struct vstream_info *vinfo, u64 delay), + + TP_ARGS(vinfo, delay), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(int, cu_id) + __field(u32, vstreamId) + __field(u32, prio) + __field(u64, delay) + ), + + TP_fast_assign( + memcpy(__entry->comm, vinfo->p->comm, TASK_COMM_LEN); + __entry->pid = vinfo->p->pid; + __entry->cu_id = vinfo->cu_id; + __entry->vstreamId = vinfo->vstreamId; + __entry->prio = vinfo->p->ucc_priority; + __entry->delay = delay; + ), + + TP_printk("comm=%s pid=%d cu_id=%d vstreamId %u prio %u, delay=%llu [ns]", + __entry->comm, __entry->pid, + __entry->cu_id, __entry->vstreamId, __entry->prio, + (unsigned long long)__entry->delay) +); + +DECLARE_EVENT_CLASS(ucc_sched_stat_template_1, + + TP_PROTO(struct vstream_info *vinfo, u64 delay, int is_timeout), + + TP_ARGS(vinfo, delay, is_timeout), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(int, cu_id) + __field(u32, vstreamId) + __field(u64, delay) + __field(int, is_timeout) + ), + + TP_fast_assign( + memcpy(__entry->comm, vinfo->p->comm, TASK_COMM_LEN); + __entry->pid = vinfo->p->pid; + __entry->cu_id = vinfo->cu_id; + __entry->vstreamId = vinfo->vstreamId; + __entry->delay = delay; + __entry->is_timeout = is_timeout; + ), + + 
TP_printk("comm=%s pid=%d cu_id=%d vstreamId %u, delay=%llu [ns]:%d", + __entry->comm, __entry->pid, + __entry->cu_id, __entry->vstreamId, + (unsigned long long)__entry->delay, + __entry->is_timeout) +); +/* + * Tracepoint for accounting wait time (time the task is runnable + * but not actually running due to scheduler contention). + */ +DEFINE_EVENT(ucc_sched_stat_template, ucc_sched_stat_wait, + TP_PROTO(struct vstream_info *vinfo, u64 delay), + TP_ARGS(vinfo, delay)); + +DEFINE_EVENT(ucc_sched_stat_template, ucc_sched_stat_preempt, + TP_PROTO(struct vstream_info *vinfo, u64 delay), + TP_ARGS(vinfo, delay)); + +DEFINE_EVENT(ucc_sched_stat_template_1, ucc_sched_stat_run, + TP_PROTO(struct vstream_info *vinfo, u64 delay, int is_timeout), + TP_ARGS(vinfo, delay, is_timeout)); + +TRACE_EVENT(ucc_sched_switch, + + TP_PROTO(int preempt, + struct vstream_info *next), + + TP_ARGS(preempt, next), + + TP_STRUCT__entry( + __field(int, cu_id) + __field(u32, next_vstreamId) + __field(u32, next_prio) + __field(int, preempt) + ), + + TP_fast_assign( + __entry->cu_id = next->cu_id; + __entry->next_vstreamId = next->vstreamId; + __entry->next_prio = next->p->ucc_priority; + __entry->preempt = preempt; + ), + + TP_printk("cu_id=%d next_vstreamId %u next_prio %u preempt[%d]", + __entry->cu_id, + __entry->next_vstreamId, __entry->next_prio, + __entry->preempt) +); +#endif /* _TRACE_UCC_SCHED_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/init/init_task.c b/init/init_task.c index b312a045f4b9..c1a78b4da368 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -188,6 +188,10 @@ struct task_struct init_task .fork_pid = 0, }, #endif +#ifdef CONFIG_XPU_SCHEDULE + .ucc_priority = 1, + .ucc_step = 1, +#endif }; EXPORT_SYMBOL(init_task); diff --git a/init/main.c b/init/main.c index 50af60ff0ef6..7ed2e67d7011 100644 --- a/init/main.c +++ b/init/main.c @@ -66,6 +66,7 @@ #include <linux/kthread.h> #include <linux/sched.h> #include 
<linux/sched/init.h> +#include <linux/ucc_sched/ucc_sched.h> #include <linux/signal.h> #include <linux/idr.h> #include <linux/kgdb.h> @@ -599,6 +600,14 @@ asmlinkage __visible void __init start_kernel(void) * time - but meanwhile we still have a functioning scheduler. */ sched_init(); + +#ifdef CONFIG_XPU_SCHEDULE + /* + * Set up the ucc scheduler, to enable heterogeneous scheduling. + */ + ucc_sched_init(); +#endif + /* * Disable preemption - early bootup scheduling is extremely * fragile until we cpu_idle() for the first time. diff --git a/kernel/Makefile b/kernel/Makefile index d0482bd27ba4..273fe481d303 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -43,6 +43,8 @@ obj-y += irq/ obj-y += rcu/ obj-y += livepatch/ obj-y += dma/ +obj-$(CONFIG_XPU_SCHEDULE) += ucc_sched/ +obj-$(CONFIG_XPU_UCC) += ucc/ obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile index 0612af002ae5..0f659b2ad251 100644 --- a/kernel/sched/Makefile +++ b/kernel/sched/Makefile @@ -19,6 +19,7 @@ endif obj-y += core.o loadavg.o clock.o cputime.o obj-y += idle.o fair.o rt.o deadline.o obj-y += wait.o wait_bit.o swait.o completion.o +obj-$(CONFIG_XPU_SCHEDULE) += ucc_sched.o obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 67bda877bfa8..89348097b29a 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2316,6 +2316,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) */ p->prio = current->normal_prio; +#ifdef CONFIG_XPU_SCHEDULE + p->ucc_priority = current->ucc_priority; + p->ucc_step = current->ucc_step; +#endif + /* * Revert to default priority/policy on fork if requested.
*/ diff --git a/kernel/sched/ucc_sched.c b/kernel/sched/ucc_sched.c new file mode 100644 index 000000000000..646f120c3c34 --- /dev/null +++ b/kernel/sched/ucc_sched.c @@ -0,0 +1,148 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/ucc_sched.h> +#include <linux/ucc_common.h> + +static DEFINE_MUTEX(revmap_mutex); + +static DEFINE_HASHTABLE(vrtsq_rtsq_revmap, VRTSQ_RTSQ_HASH_ORDER); + +/** + * @group: value for this entry. + * @hash_node : hash node list. + * @ + */ +struct vsqce_idx_revmap_data { + unsigned int vrtsdId; + struct xpu_group *group; + struct hlist_node hash_node; +}; + +struct xpu_group *select_sq(struct vstream_info *vstream_info) +{ + struct vsqce_idx_revmap_data *revmap_data; + + /* find history */ + mutex_lock(&revmap_mutex); + hash_for_each_possible(vrtsq_rtsq_revmap, revmap_data, hash_node, + (unsigned long)vstream_info->vstreamId) { + if (revmap_data && revmap_data->group) { + mutex_unlock(&revmap_mutex); + return revmap_data->group; + } + } + mutex_unlock(&revmap_mutex); + + revmap_data = kzalloc(sizeof(struct vsqce_idx_revmap_data), GFP_KERNEL); + if (revmap_data == NULL) + return NULL; + /* find XPU group */ + revmap_data->group = xpu_group_find(xpu_root, XPU_TYPE_NPU_310); + if (revmap_data->group == NULL) { + ucc_err("find XPU group is failed.\n"); + return NULL; + } + /* find device group */ + revmap_data->group = xpu_group_find(revmap_data->group, + vstream_info->devId); + if (revmap_data->group == NULL) { + ucc_err("find device group is failed.\n"); + return NULL; + } + /* find tsgroup */ + revmap_data->group = xpu_group_find(revmap_data->group, + vstream_info->tsId); + if (revmap_data->group == NULL) { + ucc_err("find ts group is failed.\n"); + return NULL; + } + + /* select idle xcu */ + revmap_data->group = xpu_idle_group_find(revmap_data->group); + if (revmap_data->group == NULL) { + ucc_err("find rtsq group is failed.\n"); + return NULL; + } + + revmap_data->vrtsdId = vstream_info->vstreamId; + /* set group used : 1 */ + 
revmap_data->group->used = 1; + + mutex_lock(&revmap_mutex); + hash_add(vrtsq_rtsq_revmap, &revmap_data->hash_node, + (unsigned long)vstream_info->vstreamId); + mutex_unlock(&revmap_mutex); + return revmap_data->group; +} + +int ucc_process_task(struct vstream_info *vstream_info, struct tsdrv_ctx *ctx, + int *sqenum) +{ + struct xpu_group *group = NULL; + + if (vstream_info == NULL) { + ucc_err("vsqcq_info is NULL\n"); + return -1; + } + + group = select_sq(vstream_info); + if (group == NULL) { + ucc_err("find group is failed.\n"); + return -1; + } + /* send sqe */ + *sqenum = xpu_run(group, vstream_info, ctx); + + return 0; +} +EXPORT_SYMBOL(ucc_process_task); + +int ucc_free_task(struct vstream_info *vstream_info, struct tsdrv_ctx *ctx) +{ + struct vsqce_idx_revmap_data *revmap_data; + + ucc_dequeue_task(vstream_info); + + while (!ucc_xcu_is_sched(vstream_info->cu_id)) + schedule_timeout_interruptible(10); + + ucc_dump_statistics_info(&vstream_info->se); + + mutex_lock(&revmap_mutex); + hash_for_each_possible(vrtsq_rtsq_revmap, revmap_data, hash_node, + (unsigned long)vstream_info->vstreamId) { + if (revmap_data && + revmap_data->vrtsdId == vstream_info->vstreamId && + revmap_data->group) { + xpu_finish(revmap_data->group, vstream_info, ctx); + /* set group unused : 0 */ + revmap_data->group->used = 0; + hash_del(&revmap_data->hash_node); + kfree(revmap_data); + revmap_data = NULL; + break; + } + } + mutex_unlock(&revmap_mutex); + + return 0; +} +EXPORT_SYMBOL(ucc_free_task); + +int ucc_wait_cq(struct vstream_info *vstream_info, struct tsdrv_ctx *ctx, + struct devdrv_report_para *arg, int *cqenum) +{ + struct vsqce_idx_revmap_data *revmap_data; + + hash_for_each_possible(vrtsq_rtsq_revmap, revmap_data, hash_node, + (unsigned long)vstream_info->vstreamId) { + if (revmap_data && + revmap_data->vrtsdId == vstream_info->vstreamId && + revmap_data->group) + *cqenum = xpu_wait(revmap_data->group, vstream_info, + ctx, arg); + } + + return 0; +} 
+EXPORT_SYMBOL(ucc_wait_cq); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index c7064f67f4a5..aeceb9e9c927 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -117,6 +117,10 @@ extern unsigned int sysctl_nr_open_min, sysctl_nr_open_max; extern int sysctl_nr_trim_pages; #endif +#ifdef CONFIG_XPU_SCHEDULE +extern int sysctl_ucc_sched_rcv_timeout_ms; +#endif + /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR static int sixty = 60; @@ -139,7 +143,7 @@ static int one_thousand = 1000; #ifdef CONFIG_PRINTK static int ten_thousand = 10000; #endif -#if defined(CONFIG_QOS_SCHED) || defined(CONFIG_QOS_SCHED_SMART_GRID) +#if defined(CONFIG_QOS_SCHED) || defined(CONFIG_QOS_SCHED_SMART_GRID) || defined(CONFIG_XPU_SCHEDULE) static int hundred_thousand = 100000; #endif #ifdef CONFIG_PERF_EVENTS @@ -352,6 +356,17 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, +#ifdef CONFIG_XPU_SCHEDULE + { + .procname = "ucc_sched_rcv_timeout", + .data = &sysctl_ucc_sched_rcv_timeout_ms, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &hundred_thousand, + }, +#endif #ifdef CONFIG_SCHED_DEBUG { .procname = "sched_min_granularity_ns", diff --git a/kernel/ucc/Kconfig b/kernel/ucc/Kconfig new file mode 100644 index 000000000000..279c11f702b1 --- /dev/null +++ b/kernel/ucc/Kconfig @@ -0,0 +1,21 @@ +# +# TODO: add description +# + +config XPU_UCC + bool "ucc" + default n + depends on ARM64 || X86 + help + Say Y here if you want support for using XPU UCC. XPU UCC + is a helper for XPU scheduling. The full name of UCC is + Universal Converged Computing. + + +config XPU_VSTREAM + bool "virtual submit queue and complete queue" + default n + depends on XPU_UCC + help + Virtual submit queue and complete queue support for XPU. + It is used to help XPU scheduling.
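For readers following the kernel/sysctl.c hunk above: the new ucc_sched_rcv_timeout entry uses proc_dointvec_minmax with extra1 = &zero and extra2 = &hundred_thousand, so a write outside that range is rejected with -EINVAL and the stored value is left unchanged. Below is a minimal userspace sketch of that behaviour only, not kernel code. The names rcv_timeout_ms and set_rcv_timeout_ms are hypothetical, and the lower bound assumes zero is 0; the upper bound of 100000 and the default of 10 come from the patch.

```c
#include <errno.h>

/* Bounds mirroring the table entry: extra1 = &zero (assumed 0), extra2 = &hundred_thousand. */
#define RCV_TIMEOUT_MIN_MS 0
#define RCV_TIMEOUT_MAX_MS 100000

/* Hypothetical stand-in for sysctl_ucc_sched_rcv_timeout_ms; the patch defaults it to 10. */
static int rcv_timeout_ms = 10;

/*
 * Hypothetical model of a bounded sysctl write: an out-of-range value
 * returns -EINVAL and the previously stored value is kept.
 */
static int set_rcv_timeout_ms(int val)
{
	if (val < RCV_TIMEOUT_MIN_MS || val > RCV_TIMEOUT_MAX_MS)
		return -EINVAL;

	rcv_timeout_ms = val;
	return 0;
}
```

In this model, set_rcv_timeout_ms(500) succeeds and stores 500, while set_rcv_timeout_ms(-1) and set_rcv_timeout_ms(100001) both fail and leave the stored value untouched.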
diff --git a/kernel/ucc/Makefile b/kernel/ucc/Makefile new file mode 100644 index 000000000000..0e2735d2aef4 --- /dev/null +++ b/kernel/ucc/Makefile @@ -0,0 +1 @@ +obj-y += ascend_vstream.o vstream.o diff --git a/kernel/ucc/ascend_vstream.c b/kernel/ucc/ascend_vstream.c new file mode 100644 index 000000000000..d248aaff7639 --- /dev/null +++ b/kernel/ucc/ascend_vstream.c @@ -0,0 +1,654 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/uaccess.h> +#include <linux/syscalls.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/vstream.h> +#include <linux/slab.h> +#include <linux/list.h> +#include <linux/ucc_common.h> +#include <linux/ucc_sched.h> + +DEFINE_MUTEX(vstreamId_Bitmap_mutex); +static DECLARE_BITMAP(vstreamIdBitmap, DEVDRV_MAX_SQ_NUM); + +static DEFINE_MUTEX(vcqId_Bitmap_mutex); +static DECLARE_BITMAP(vcqIdBitmap, DEVDRV_MAX_CQ_NUM); + +static DEFINE_MUTEX(revmap_mutex); + +static struct vstream_info *vstreamContainer[DEVDRV_MAX_SQ_NUM]; +static struct vcq_map_table *vsqcqMapTable[DEVDRV_MAX_CQ_NUM]; + +#define MAX_SQ_SIZE (MAX_VSTREAM_SIZE * MAX_VSTREAM_SLOT_SIZE) +#define MAX_CQ_SIZE (MAX_VSTREAM_SIZE * MAX_CQ_SLOT_SIZE) + +#define SQ_USER_ADDR_OFFSET(id) ((unsigned long)REMAP_ALIGN(MAX_SQ_SIZE) * id) +#define CQ_USER_ADDR_OFFSET(id) ((unsigned long)REMAP_ALIGN(MAX_CQ_SIZE) * id) + +#define SQ_VSTREAM_DATA(id) vstreamContainer[id]->vsqNode->vstreamData +#define CQ_VSTREAM_DATA(id) vstreamContainer[id]->vcqNode->vstreamData + +static struct tsdrv_ctx *get_ctx(int fd) +{ + struct fd f; + struct davinci_intf_private_stru *file_private_data; + struct tsdrv_ctx *ctx = NULL; + + f = fdget(fd); + if (!f.file) + goto out; + + file_private_data = f.file->private_data; + if (!file_private_data) + goto out; + + ctx = file_private_data->priv_filep.private_data; + +out: + fdput(f); + return ctx; +} + +static struct vcq_map_table *vstream_get_map_table(uint32_t id) +{ + return 
vsqcqMapTable[id]; +} + +static void free_vstreamId(uint32_t vstreamId) +{ + mutex_lock(&vstreamId_Bitmap_mutex); + clear_bit(vstreamId, vstreamIdBitmap); + mutex_unlock(&vstreamId_Bitmap_mutex); +} + +static void free_vcqId(uint32_t vcqId, uint32_t flag) +{ + mutex_lock(&vcqId_Bitmap_mutex); + if (!(flag & TSDRV_CQ_REUSE)) + clear_bit(vcqId, vcqIdBitmap); + mutex_unlock(&vcqId_Bitmap_mutex); +} + +static void vstream_free_map_table(uint32_t vcqId, uint32_t vstreamId, + uint32_t flag) +{ + struct vcq_map_table *freeTable = NULL; + struct vstream_id *vstreamIdNode = NULL; + + freeTable = vstream_get_map_table(vcqId); + if (!freeTable) { + ucc_err("No map found for vcq:%d.\n", vcqId); + return; + } + + list_for_each_entry(vstreamIdNode, &freeTable->vstreamId_list, list) { + if (vstreamIdNode->vstreamId == vstreamId) { + list_del(&vstreamIdNode->list); + kfree(vstreamIdNode); + break; + } + } + if (!(flag & TSDRV_CQ_REUSE)) { + kfree(freeTable->vcqNode->vstreamData); + kfree(freeTable->vcqNode); + kfree(freeTable); + } +} + +static void vstream_alloc_ucc_se(struct ucc_se *se) +{ + memset(&se->statistics, 0, sizeof(se->statistics)); + se->on_cu = 0; + se->state = SE_PREPARE; + se->flag = UCC_TIF_NONE; + se->prio = UCC_PRIO_HIGH; + se->step = UCC_STEP_SLOW; + raw_spin_lock_init(&se->se_lock); +} + +static struct vstream_info *vstream_create_info(struct tsdrv_ctx *ctx, + struct normal_alloc_sqcq_para *para) +{ + struct vcq_map_table *mapTable = NULL; + + struct vstream_info *vstream = kzalloc(sizeof(struct vstream_info), + GFP_KERNEL); + if (!vstream) + return NULL; + + (void)memcpy(vstream->info, para->info, + sizeof(uint32_t) * SQCQ_RTS_INFO_LENGTH); + + vstream->privdata = ctx; + vstream->tsId = para->tsId; + vstream->vstreamId = para->sqId; + vstream->vcqId = para->cqId; + + mapTable = vstream_get_map_table(vstream->vcqId); + if (!mapTable || !mapTable->vcqNode) { + ucc_err("No map found for vcqId:%d.\n", vstream->vcqId); + goto free_vstream; + } + vstream->vcqNode = 
mapTable->vcqNode; + vstream->vsqNode = kmalloc(sizeof(struct vstream_node), GFP_KERNEL); + if (!vstream->vsqNode) { + ucc_err("Failed to alloc memory for vsqNode:%d.\n", + vstream->vstreamId); + goto free_vstream; + } + vstream->vsqNode->vstreamData = kmalloc(MAX_SQ_SIZE, GFP_KERNEL); + if (!vstream->vsqNode->vstreamData) + goto free_vsqNode; + vstream->vsqNode->id = vstream->vstreamId; + vstream->vsqNode->head = 0; + vstream->vsqNode->tail = 0; + vstream->vsqNode->credit = MAX_VSTREAM_SIZE; + raw_spin_lock_init(&vstream->vsqNode->spin_lock); + vstream->send_cnt = 0; + vstream->p = current; + vstream_alloc_ucc_se(&vstream->se); + + return vstream; + +free_vsqNode: + kfree(vstream->vsqNode); + +free_vstream: + kfree(vstream); + return NULL; +} + +struct vstream_info *vstream_get_info(uint32_t id) +{ + return vstreamContainer[id]; +} + +static void vstream_free_info(uint32_t id) +{ + struct vstream_info *freeInfo = vstream_get_info(id); + + ucc_set_vstream_state(freeInfo, SE_DEAD); + + if (freeInfo) { + if (freeInfo->vsqNode) + kfree(freeInfo->vsqNode->vstreamData); + + kfree(freeInfo->vsqNode); + } + + kfree(freeInfo); +} + +static int queue_pop_by_num(struct vstream_node *node, uint32_t pop_num) +{ + if (node->credit + pop_num > MAX_VSTREAM_SIZE) { + ucc_err("Queue usage out-of-bounds"); + return -EACCES; + } + + node->credit += pop_num; + node->head = (node->head + pop_num) % MAX_VSTREAM_SIZE; + return 0; +} + +static int queue_pop_by_head(struct vstream_node *node, uint32_t head) +{ + int pop_num = (head - node->head + MAX_VSTREAM_SIZE) % + MAX_VSTREAM_SIZE; + return queue_pop_by_num(node, pop_num); +} + +int update_vstream_head(struct vstream_info *vstream_info, int num) +{ + struct vstream_node *node = vstream_info->vsqNode; + + raw_spin_lock(&node->spin_lock); + if (node->credit + num > MAX_VSTREAM_SIZE) { + raw_spin_unlock(&node->spin_lock); + return -1; + } + + node->credit += num; + node->head = (node->head + num) % MAX_VSTREAM_SIZE; + 
raw_spin_unlock(&node->spin_lock); + + return 0; +} + +bool vstream_have_kernel(struct ucc_se *se) +{ + struct vstream_info *vinfo; + + vinfo = container_of(se, struct vstream_info, se); + return vinfo->vsqNode->credit != MAX_VSTREAM_SIZE; +} + +static int queue_push_by_num(struct vstream_node *node, uint32_t push_num) +{ + if (node->credit - push_num < 0) + return -EACCES; + + node->credit -= push_num; + node->tail = (node->tail + push_num) % MAX_VSTREAM_SIZE; + return 0; +} + +static int queue_push_by_tail(struct vstream_node *node, uint32_t tail) +{ + int push_num = (tail - node->tail + MAX_VSTREAM_SIZE) % + MAX_VSTREAM_SIZE; + return queue_push_by_num(node, push_num); +} + +static uint32_t vstream_alloc_vstreamId(void) +{ + uint32_t vstreamId = DEVDRV_MAX_SQ_NUM; + + /* alloc vstreamId */ + mutex_lock(&vstreamId_Bitmap_mutex); + vstreamId = find_first_zero_bit(vstreamIdBitmap, DEVDRV_MAX_SQ_NUM); + if (vstreamId == DEVDRV_MAX_SQ_NUM) { + ucc_err("vstreamId exhausted.\n"); + mutex_unlock(&vstreamId_Bitmap_mutex); + return DEVDRV_MAX_SQ_NUM; + } + set_bit(vstreamId, vstreamIdBitmap); + mutex_unlock(&vstreamId_Bitmap_mutex); + + return vstreamId; +} + +static uint32_t vstream_alloc_vcqid(void) +{ + uint32_t vcqId = DEVDRV_MAX_CQ_NUM; + + /* alloc vcqid */ + mutex_lock(&vcqId_Bitmap_mutex); + vcqId = find_first_zero_bit(vcqIdBitmap, DEVDRV_MAX_CQ_NUM); + if (vcqId == DEVDRV_MAX_CQ_NUM) { + ucc_err("vcqId has been used up.\n"); + mutex_unlock(&vcqId_Bitmap_mutex); + return DEVDRV_MAX_CQ_NUM; + } + set_bit(vcqId, vcqIdBitmap); + mutex_unlock(&vcqId_Bitmap_mutex); + + ucc_info("vcqId = %d\n", vcqId); + return vcqId; +} + +int vstream_map_pfnaddr(struct tsdrv_ctx *ctx, + struct normal_alloc_sqcq_para *para) +{ + int err = 0; + unsigned long vsqAddr; + unsigned long vcqAddr; + pgprot_t vm_page_prot; + struct vm_area_struct *vma = ctx->vma[para->tsId]; + + vsqAddr = vma->vm_start + SQ_USER_ADDR_OFFSET(para->sqId); + vm_page_prot = pgprot_device(vma->vm_page_prot); + err 
= remap_pfn_range(vma, vsqAddr, + virt_to_pfn(SQ_VSTREAM_DATA(para->sqId)), + MAX_SQ_SIZE, vm_page_prot); + if (err) { + ucc_err("remap_pfn_range failed,ret=%d.\n", err); + return -EFAULT; + } + if (!(para->flag & TSDRV_CQ_REUSE)) { + vcqAddr = vma->vm_start + DEVDRV_VM_CQ_MEM_OFFSET + + CQ_USER_ADDR_OFFSET(para->cqId); + err = remap_pfn_range(vma, vcqAddr, + virt_to_pfn(CQ_VSTREAM_DATA(para->sqId)), + MAX_CQ_SIZE, vm_page_prot); + if (err) { + ucc_err("remap_pfn_range failed,ret=%d.\n", err); + return -EFAULT; + } + } + + return err; +} + +void vstream_unmap_pfnaddr(struct tsdrv_ctx *ctx, + struct normal_free_sqcq_para *para) +{ + unsigned long vsqAddr; + unsigned long vcqAddr; + size_t cqSize = PAGE_ALIGN(MAX_CQ_SIZE); + struct vm_area_struct *vma = ctx->vma[para->tsId]; + + vsqAddr = vma->vm_start + SQ_USER_ADDR_OFFSET(para->sqId); + zap_vma_ptes(vma, vsqAddr, MAX_SQ_SIZE); + + if (!(para->flag & TSDRV_CQ_REUSE)) { + vcqAddr = vma->vm_start + DEVDRV_VM_CQ_MEM_OFFSET + + CQ_USER_ADDR_OFFSET(para->cqId); + zap_vma_ptes(vma, vcqAddr, cqSize); + } +} + +static int vstream_update_vcqtable(uint32_t vcqId, uint32_t vstreamId, + uint32_t flag) +{ + int err = -ENOSPC; + struct vcq_map_table *vcqTable = NULL; + struct vstream_id *vstreamIdNode = NULL; + + if (!(flag & TSDRV_CQ_REUSE)) { + vcqTable = kmalloc(sizeof(struct vcq_map_table), GFP_KERNEL); + if (!vcqTable) + return -ENOMEM; + + vcqTable->vcqId = vcqId; + vcqTable->vcqNode = kmalloc(sizeof(struct vstream_node), + GFP_KERNEL); + if (!vcqTable->vcqNode) { + err = -ENOMEM; + goto free_vcqTable; + } + + vcqTable->vcqNode->vstreamData = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (!vcqTable->vcqNode->vstreamData) { + err = -ENOMEM; + goto free_vcqNode; + } + vcqTable->vcqNode->id = vcqId; + vcqTable->vcqNode->head = 0; + vcqTable->vcqNode->tail = 0; + vcqTable->vcqNode->credit = MAX_VSTREAM_SIZE; + INIT_LIST_HEAD(&vcqTable->vstreamId_list); + vsqcqMapTable[vcqId] = vcqTable; + } else { + vcqTable = vsqcqMapTable[vcqId]; + } 
+ vstreamIdNode = kmalloc(sizeof(struct vstream_id), GFP_KERNEL); + if (!vstreamIdNode) { + err = -ENOMEM; + + if (!(flag & TSDRV_CQ_REUSE)) + goto free_vstreamData; + return err; + } + vstreamIdNode->vstreamId = vstreamId; + list_add(&vstreamIdNode->list, &vcqTable->vstreamId_list); + + return 0; + +free_vstreamData: + kfree(vcqTable->vcqNode->vstreamData); + +free_vcqNode: + kfree(vcqTable->vcqNode); + +free_vcqTable: + kfree(vcqTable); + + return err; +} + +int ascend_vstream_alloc(struct vstream_args *arg) +{ + uint32_t vstreamId; + uint32_t vcqId = DEVDRV_MAX_CQ_NUM; + int err = -EINVAL; + struct vstream_info *vstream = NULL; + struct tsdrv_ctx *ctx = NULL; + struct normal_alloc_sqcq_para *sqcq_alloc_para = &arg->va_args.ascend; + + ctx = get_ctx(sqcq_alloc_para->fd); + if (!ctx) + return err; + + vstreamId = vstream_alloc_vstreamId(); + if (vstreamId == DEVDRV_MAX_SQ_NUM) { + ucc_err("vstreamId alloc failed.\n"); + return err; + } + if (!(sqcq_alloc_para->flag & TSDRV_CQ_REUSE)) + vcqId = vstream_alloc_vcqid(); + else + vcqId = sqcq_alloc_para->cqId; + + if (vcqId >= DEVDRV_MAX_CQ_NUM) { + ucc_err("vcqId alloc failed.\n"); + goto free_vstreamIds; + } + err = vstream_update_vcqtable(vcqId, vstreamId, sqcq_alloc_para->flag); + if (err) { + ucc_err("vcqtable update failed, vcqId:%d, vstreamId:%d, flag:%d.\n", + vcqId, vstreamId, sqcq_alloc_para->flag); + goto free_vcqid; + } + + sqcq_alloc_para->sqId = vstreamId; + sqcq_alloc_para->cqId = vcqId; + vstream = vstream_create_info(ctx, sqcq_alloc_para); + if (!vstream) { + ucc_err("vstream create failed: vcqId:%d, vstreamId:%d.\n", + vcqId, vstreamId); + err = -ENOSPC; + goto free_vcqtable; + } + + vstream->devId = sqcq_alloc_para->devId; + vstreamContainer[vstreamId] = vstream; + + vstream->group = select_sq(vstream); + if (!vstream->group) { + ucc_err("Failed to select sq\n"); + err = -EINVAL; + goto free_vstream_info; + } + + err = vstream_map_pfnaddr(ctx, sqcq_alloc_para); + if (err) { + ucc_err("vstream map 
failed, ret=%d.\n", err); + goto free_vstream_info; + } + return 0; + +free_vstream_info: + vstream_free_info(vstreamId); + +free_vcqtable: + vstream_free_map_table(vcqId, vstreamId, sqcq_alloc_para->flag); + +free_vcqid: + free_vcqId(vcqId, sqcq_alloc_para->flag); + +free_vstreamIds: + free_vstreamId(vstreamId); + + return err; +} + +int ascend_vstream_free(struct vstream_args *arg) +{ + int err = 0; + struct vstream_info *vstreamInfo = NULL; + struct normal_free_sqcq_para *sqcq_free_para = &arg->vf_args.ascend; + uint32_t vstreamId = sqcq_free_para->sqId; + uint32_t vcqId = sqcq_free_para->cqId; + + if (vstreamId >= DEVDRV_MAX_SQ_NUM || vcqId >= DEVDRV_MAX_CQ_NUM) { + ucc_err("vstream index out-of-range, vstreamId=%d, vcqId=%d.\n", + vstreamId, vcqId); + return -EPERM; + } + + vstreamInfo = vstream_get_info(vstreamId); + if (!vstreamInfo) { + ucc_err("vstreamInfo get failed, vstreamId=%d.\n", vstreamId); + return -EPERM; + } + err = ucc_free_task(vstreamInfo, vstreamInfo->privdata); + + free_vcqId(vcqId, sqcq_free_para->flag); + vstream_free_map_table(vcqId, vstreamId, sqcq_free_para->flag); + + vstream_unmap_pfnaddr(vstreamInfo->privdata, sqcq_free_para); + + vstream_free_info(vstreamId); + free_vstreamId(vstreamId); + return err; +} + +int ascend_vstream_kick(struct vstream_args *arg) +{ + int err = 0; + struct tsdrv_sqcq_data_para *sqcq_data_para = &arg->vk_args.ascend; + int vstreamId = sqcq_data_para->id; + int tail = sqcq_data_para->val; + struct vstream_info *vstreamInfo = NULL; + int push_num; + + vstreamInfo = vstream_get_info(vstreamId); + if (!vstreamInfo) { + ucc_err("vstreamInfo get failed, vstreamId=%d.\n", vstreamId); + return -ENOMEM; + } + + vstreamInfo->p = current; + + push_num = (tail - vstreamInfo->vsqNode->tail + MAX_VSTREAM_SIZE) % + MAX_VSTREAM_SIZE; + + raw_spin_lock(&vstreamInfo->vsqNode->spin_lock); + err = queue_push_by_tail(vstreamInfo->vsqNode, tail); + if (err) { + raw_spin_unlock(&vstreamInfo->vsqNode->spin_lock); +
ucc_err("queue_push_by_tail error, ret = %d\n", err); + return err; + } + raw_spin_unlock(&vstreamInfo->vsqNode->spin_lock); + + err = ucc_wake_up(&vstreamInfo->se); + return err; +} + +int ascend_callback_vstream_wait(struct vstream_args *arg) +{ + int err = 0; + int cqeNum = 0; + int cqeSum = 0; + struct vstream_info *vstreamInfo = NULL; + struct vcq_map_table *vcqTable = NULL; + struct vcq_map_table *waitTable = NULL; + struct vstream_id *vstreamIdNode = NULL; + struct devdrv_report_para *report_para = &arg->cvw_args; + uint32_t *sqlist; + uint32_t sqlist_num = 0; + uint32_t vstreamId, vcqId; + + sqlist = kmalloc_array(DEVDRV_MAX_SQ_NUM, sizeof(uint32_t), GFP_KERNEL); + if (!sqlist) + return -ENOMEM; + + vcqId = report_para->cq_id; + if (vcqId >= DEVDRV_MAX_CQ_NUM) { + ucc_err("vcqId out-of-range, vcqId=%d.\n", vcqId); + err = -EPERM; + goto out; + } + + mutex_lock(&vcqId_Bitmap_mutex); + waitTable = vstream_get_map_table(vcqId); + if (!waitTable) { + ucc_err("No map found for vcq:%d.\n", vcqId); + mutex_unlock(&vcqId_Bitmap_mutex); + err = -EPERM; + goto out; + } + + list_for_each_entry(vstreamIdNode, &waitTable->vstreamId_list, list) + sqlist[sqlist_num++] = vstreamIdNode->vstreamId; + mutex_unlock(&vcqId_Bitmap_mutex); + + //get sqInfo from hardware + for (vstreamId = 0; vstreamId < sqlist_num; vstreamId++) { + vstreamInfo = vstream_get_info(sqlist[vstreamId]); + if (!vstreamInfo) + continue; + err |= ucc_wait_cq(vstreamInfo, vstreamInfo->privdata, + report_para, &cqeNum); + cqeSum += cqeNum; + if (cqeNum) + break; + } + + //update cqInfo + mutex_lock(&vcqId_Bitmap_mutex); + vcqTable = vstream_get_map_table(vcqId); + if (!vcqTable) { + ucc_err("No map found for vcq:%d.\n", vcqId); + err = -EPERM; + goto out; + } + + err = queue_push_by_num(vcqTable->vcqNode, cqeSum); + if (err) { + mutex_unlock(&vcqId_Bitmap_mutex); + ucc_err("failed to queue_push_by_num, ret = %d.\n", err); + goto out; + } + report_para->cq_tail = vcqTable->vcqNode->tail; + 
mutex_unlock(&vcqId_Bitmap_mutex); + +out: + kfree(sqlist); + return err; +} + +int ascend_callback_vstream_kick(struct vstream_args *arg) +{ + u32 vcqId, release_head; + struct vstream_info *vstreamInfo = NULL; + int err = 0; + + vcqId = arg->cvk_args.id; + release_head = arg->cvk_args.val; + if (vcqId >= DEVDRV_MAX_CQ_NUM || release_head >= MAX_VSTREAM_SIZE) { + ucc_err("vstream index out-of-range, vcqId=%d, release_head=%d.\n", + vcqId, release_head); + return -EPERM; + } + + mutex_lock(&vcqId_Bitmap_mutex); + vstreamInfo = vstream_get_info(vcqId); + if (!vstreamInfo) { + err = -EPERM; + goto out; + } + + err = queue_pop_by_head(vstreamInfo->vcqNode, release_head); + +out: + mutex_unlock(&vcqId_Bitmap_mutex); + return err; +} + +int ascend_vstream_get_head(struct vstream_args *arg) +{ + u32 vstreamId = arg->vh_args.id; + struct vstream_info *vstreamInfo = NULL; + + if (vstreamId >= DEVDRV_MAX_SQ_NUM) { + ucc_err("vstreamId out-of-range, vstreamId=%d.\n", vstreamId); + return -EINVAL; + } + + vstreamInfo = vstream_get_info(vstreamId); + if (!vstreamInfo) { + ucc_err("vstreamInfo get failed, vstreamId=%d.\n", vstreamId); + return -EINVAL; + } + arg->vh_args.val = vstreamInfo->vsqNode->head; + + return 0; +} + diff --git a/kernel/ucc/ascend_vstream.h b/kernel/ucc/ascend_vstream.h new file mode 100644 index 000000000000..0cd200168495 --- /dev/null +++ b/kernel/ucc/ascend_vstream.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#ifndef _ASCEND_VSTREAM_H +#define _ASCEND_VSTREAM_H + +int ascend_vstream_alloc(struct vstream_args *arg); +int ascend_vstream_free(struct vstream_args *arg); +int ascend_vstream_kick(struct vstream_args *arg); +int ascend_callback_vstream_wait(struct vstream_args *arg); +int ascend_callback_vstream_kick(struct vstream_args *arg); +int ascend_vstream_get_head(struct vstream_args *arg); + +#endif /* _ASCEND_VSTREAM_H */ diff --git a/kernel/ucc/vstream.c b/kernel/ucc/vstream.c new file mode 100644 index 000000000000..d4705f285b89 
--- /dev/null +++ b/kernel/ucc/vstream.c @@ -0,0 +1,62 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/syscalls.h> +#include <linux/vstream.h> + +#include "ascend_vstream.h" + +static int amdgpu_vstream_alloc(struct vstream_args *arg) +{ + return 0; +} +static int amdgpu_vstream_free(struct vstream_args *arg) +{ + return 0; +} +static int amdgpu_vstream_kick(struct vstream_args *arg) +{ + return 0; +} +static int amdgpu_vstream_update(struct vstream_args *arg) +{ + return 0; +} + +/* + * vstream_manage_cmd table + */ +static vstream_manage_t (*vstream_command_table[AMDGPU_MAX_COMMAND + 1]) = { + ascend_vstream_alloc, // ASCEND_VSTREAM_ALLOC + ascend_vstream_free, // ASCEND_VSTREAM_FREE + ascend_vstream_kick, // ASCEND_VSTREAM_KICK + ascend_callback_vstream_wait, // ASCEND_CALLBACK_VSTREAM_WAIT + ascend_callback_vstream_kick, // ASCEND_CALLBACK_VSTREAM_KICK + ascend_vstream_get_head, // ASCEND_VSTREAM_GET_HEAD + NULL, // ASCEND_MAX_COMMAND + amdgpu_vstream_alloc, // AMDGPU_VSTREAM_ALLOC + amdgpu_vstream_free, // AMDGPU_VSTREAM_FREE + amdgpu_vstream_kick, // AMDGPU_VSTREAM_KICK + amdgpu_vstream_update, // AMDGPU_VSTREAM_UPDATE + NULL // AMDGPU_MAX_COMMAND +}; + +SYSCALL_DEFINE2(vstream_manage, struct vstream_args __user *, arg, int, cmd) +{ + int res = 0; + struct vstream_args vstream_arg; + + if (cmd < 0 || cmd > AMDGPU_MAX_COMMAND || !vstream_command_table[cmd]) + return -EINVAL; + + if (copy_from_user(&vstream_arg, arg, sizeof(struct vstream_args))) { + pr_err("copy_from_user failed\n"); + return -EFAULT; + } + res = vstream_command_table[cmd](&vstream_arg); + if (copy_to_user(arg, &vstream_arg, sizeof(struct vstream_args))) { + pr_err("copy_to_user failed\n"); + return -EFAULT; + } + + return res; +} diff --git a/kernel/ucc_sched/Makefile b/kernel/ucc_sched/Makefile new file mode 100644 index 000000000000..4a41f07d091c --- /dev/null +++ b/kernel/ucc_sched/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_XPU_SCHEDULE) += core.o diff --git a/kernel/ucc_sched/core.c b/kernel/ucc_sched/core.c new file mode
100644 index 000000000000..4c7f1f59aeb9 --- /dev/null +++ b/kernel/ucc_sched/core.c @@ -0,0 +1,591 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved. + * Author: Huawei OS Kernel Lab + * Create: Tue Jan 17 22:19:17 2023 + */ + +#include <uapi/linux/sched/types.h> +#include <linux/kthread.h> +#include <linux/slab.h> +#include <linux/ucc_sched.h> + +#include "ucc_sched.h" +#include "../sched/sched.h" +#define CREATE_TRACE_POINTS +#include <trace/events/ucc_sched.h> + +#define MAX_XCU_NUM (100) +#define TS_SQ_TRANS_TASK_THRESHOLD (20) + +static struct xcu xcu_manager[MAX_XCU_NUM]; +static int num_active_xcu; +raw_spinlock_t xcu_mgr_lock; +int sysctl_ucc_sched_rcv_timeout_ms = 10; + +static struct task_struct vstream_idle_task; +static struct vstream_info vstream_idle = { + .vstreamId = UINT_MAX, + .p = &vstream_idle_task, +}; + +struct sched_args { + int cu_id; +}; + +static inline int is_xcu_offline(struct xcu *cu) +{ + return cu->state == XCU_INACTIVE; +} + +void ucc_set_vstream_state(struct vstream_info *vinfo, int state) +{ + vinfo->se.state = state; +} + +static inline int should_se_run(struct ucc_se *se) +{ + return se->state != SE_BLOCK && se->state != SE_DEAD; +} + +static inline void update_stats_run_start(struct xcu *cu, + struct ucc_se *se) +{ + u64 start; + + if (!schedstat_enabled()) + return; + + start = ktime_get_boot_ns(); + __schedstat_set(se->statistics.run_start, start); +} + +static inline void update_stats_run_end(struct xcu *cu, + struct ucc_se *se) +{ + + struct vstream_info *vinfo; + u64 delta; + + if (!schedstat_enabled()) + return; + + delta = ktime_get_boot_ns() - schedstat_val(se->statistics.run_start); + vinfo = container_of(se, struct vstream_info, se); + trace_ucc_sched_stat_run(vinfo, delta, se->is_timeout); + + __schedstat_set(se->statistics.run_max, + max(schedstat_val(se->statistics.run_max), delta)); + __schedstat_inc(se->statistics.run_count); + 
__schedstat_add(se->statistics.run_sum, delta); + __schedstat_set(se->statistics.run_start, 0); +} + +static inline void update_stats_preempt_start(struct xcu *cu, + struct ucc_se *se) +{ + u64 wait_start; + + if (!schedstat_enabled()) + return; + + wait_start = ktime_get_boot_ns(); + __schedstat_set(se->statistics.preempt_start, wait_start); +} + +static inline void update_stats_wait_start(struct xcu *cu, struct ucc_se *se) +{ + u64 wait_start; + + if (!schedstat_enabled()) + return; + + wait_start = ktime_get_boot_ns(); + __schedstat_set(se->statistics.wait_start, wait_start); +} + + +static inline void update_stats_wait_end(struct xcu *cu, struct ucc_se *se) +{ + struct vstream_info *vinfo; + u64 delta, preempt_delta; + + if (!schedstat_enabled()) + return; + + delta = ktime_get_boot_ns() - schedstat_val(se->statistics.wait_start); + vinfo = container_of(se, struct vstream_info, se); + trace_ucc_sched_stat_wait(vinfo, delta); + + __schedstat_set(se->statistics.wait_max, + max(schedstat_val(se->statistics.wait_max), delta)); + __schedstat_inc(se->statistics.wait_count); + __schedstat_add(se->statistics.wait_sum, delta); + __schedstat_set(se->statistics.wait_start, 0); + + if (se->statistics.preempt_start) { + preempt_delta = ktime_get_boot_ns() - + schedstat_val(se->statistics.preempt_start); + trace_ucc_sched_stat_preempt(vinfo, preempt_delta); + + __schedstat_set(se->statistics.preempt_max, + max(schedstat_val(se->statistics.preempt_max), + preempt_delta)); + __schedstat_inc(se->statistics.preempt_count); + __schedstat_add(se->statistics.preempt_sum, preempt_delta); + __schedstat_set(se->statistics.preempt_start, 0); + } +} + +void ucc_dump_statistics_info(struct ucc_se *se) +{ + struct vstream_info *vinfo = container_of(se, struct vstream_info, se); + + pr_info("comm %s pid %d vstreamId %d kernel_sum %llu wait_count %llu wait_max %llu[ns] wait_sum %llu[ns] preempt_count %llu preempt_max %llu[ns] preempt_sum %llu[ns]\n", + vinfo->p->comm, + vinfo->p->pid, + 
vinfo->vstreamId, + vinfo->se.statistics.kernel_sum, + vinfo->se.statistics.wait_count, + vinfo->se.statistics.wait_max, + vinfo->se.statistics.wait_sum, + vinfo->se.statistics.preempt_count, + vinfo->se.statistics.preempt_max, + vinfo->se.statistics.preempt_sum); +} + +static void put_prev_entity(struct xcu *cu, struct ucc_se *prev) +{ + if (!prev) + return; + + if (prev->on_cu) + update_stats_wait_start(cu, prev); + + prev->state = SE_READY; + cu->curr_se->state = SE_RUNNING; +} + +static void set_next_entity(struct xcu *cu, struct ucc_se *se) +{ + if (se->on_cu && se != cu->curr_se) + update_stats_wait_end(cu, se); + + cu->curr_se = se; +} + +static void dequeue_ucc_se(struct ucc_se *se, struct xcu *cu) +{ + raw_spin_lock(&cu->xcu_lock); + if (!se->on_cu) { + raw_spin_unlock(&cu->xcu_lock); + return; + } + + se->on_cu = 0; + + list_del_init(&se->run_list); + + if (list_empty(cu->queue + se->prio)) + __clear_bit(se->prio, cu->bitmap); + cu->rt_nr_running--; + + if (se != cu->curr_se) + update_stats_wait_end(cu, se); + + if (cu->curr_se == se) + cu->curr_se = NULL; + + raw_spin_unlock(&cu->xcu_lock); +} + +static void enqueue_ucc_se(struct ucc_se *se, struct xcu *cu) +{ + struct list_head *queue = cu->queue + se->prio; + + raw_spin_lock(&cu->xcu_lock); + if (se->on_cu) { + raw_spin_unlock(&cu->xcu_lock); + return; + } + se->on_cu = 1; + se->is_timeout = 0; + list_add_tail(&se->run_list, queue); + __set_bit(se->prio, cu->bitmap); + cu->rt_nr_running++; + + update_stats_wait_start(cu, se); + + raw_spin_unlock(&cu->xcu_lock); +} + +static struct xcu *ucc_select_cu(struct ucc_se *se) +{ + struct vstream_info *vstream_info; + int min_nr_running = INT_MAX; + struct xcu *cu; + int select_cu = 0; + int cu_id; + + vstream_info = container_of(se, struct vstream_info, se); + for (cu_id = 0; cu_id < num_active_xcu; cu_id++) { + cu = &xcu_manager[cu_id]; + + if (vstream_info->devId != cu->dev_id || + vstream_info->tsId != cu->ts_id) + continue; + + if (cu->rt_nr_running < 
min_nr_running) { + min_nr_running = cu->rt_nr_running; + select_cu = cu_id; + } + } + + vstream_info->cu_id = select_cu; + return &xcu_manager[select_cu]; +} + +static int ucc_check_preempt(struct ucc_se *se, struct xcu *cu) +{ + struct vstream_info *vinfo_curr, *vinfo; + struct ucc_se *curr_se; + + curr_se = cu->curr_se; + if (!curr_se) + return 1; + + vinfo = container_of(se, struct vstream_info, se); + vinfo_curr = container_of(curr_se, struct vstream_info, se); + if (vinfo_curr->p->ucc_priority > vinfo->p->ucc_priority) { + update_stats_preempt_start(cu, se); + curr_se->flag = UCC_TIF_PREEMPT; + return 1; + } + + return 0; +} + +static inline void ucc_wakeup_idle_worker(struct xcu *cu) +{ + wake_up_state(cu->worker, TASK_INTERRUPTIBLE); +} + +static inline void ucc_wakeup_running_worker(struct xcu *cu) +{ + wake_up_state(cu->worker, TASK_UNINTERRUPTIBLE); +} + +int ucc_schedule(int cu_id) +{ + struct xcu *cu; + + cu = &xcu_manager[cu_id]; + cu->is_wake = 1; + ucc_wakeup_running_worker(cu); + + return 0; +} +EXPORT_SYMBOL(ucc_schedule); + +int ucc_wake_up(struct ucc_se *se) +{ + struct xcu *cu; + + raw_spin_lock(&se->se_lock); + if (se->on_cu) { + raw_spin_unlock(&se->se_lock); + return 0; + } + + if (se->state == SE_BLOCK) + se->state = SE_READY; + + cu = ucc_select_cu(se); + if (!cu) { + raw_spin_unlock(&se->se_lock); + return -1; + } + + enqueue_ucc_se(se, cu); + if (ucc_check_preempt(se, cu)) + ucc_wakeup_idle_worker(cu); + + raw_spin_unlock(&se->se_lock); + + return 0; +} + +static struct ucc_se *pick_next_ucc_se(struct xcu *cu) +{ + struct ucc_se *se; + struct list_head *queue; + int idx; + + if (!cu->rt_nr_running) + return NULL; + + idx = sched_find_first_bit(cu->bitmap); + BUG_ON(idx >= MAX_UCC_PRIO); + + queue = cu->queue + idx; + se = list_entry(queue->next, struct ucc_se, run_list); + + return se; +} + +static int ucc_submit_kernel(struct xcu *cu, struct ucc_se *se) +{ + struct vstream_info *vstream_info; + struct xpu_group *group; + struct 
tsdrv_ctx *ctx; + int kernel_num, left; + + vstream_info = container_of(se, struct vstream_info, se); + ctx = vstream_info->privdata; + left = (vstream_info->vsqNode->tail - vstream_info->vsqNode->head + + MAX_VSTREAM_SIZE) % MAX_VSTREAM_SIZE; + + group = vstream_info->group; + + kernel_num = xpu_run(group, vstream_info, ctx); + if (kernel_num <= 0) + return kernel_num; + + //update vstream info head and tail; + update_vstream_head(vstream_info, kernel_num); + + left -= kernel_num; + + return kernel_num; +} + +static inline void ucc_wait_idle(struct xcu *cu) +{ + cu->state = XCU_IDLE; + + do { + schedule_timeout_interruptible(1); + } while (cu->rt_nr_running == 0); + + cu->state = XCU_BUSY; +} + +static inline void ucc_wait_running(struct xcu *cu, struct ucc_se *se) +{ + int cnt = 1; + + do { + schedule_timeout_uninterruptible( + msecs_to_jiffies(sysctl_ucc_sched_rcv_timeout_ms)); + } while (cu->is_wake == 0 && --cnt > 0); + + if (cnt == 0) { + __schedstat_inc(se->statistics.timeout_count); + se->is_timeout = 1; + } +} + +static inline void clear_se_flag(struct ucc_se *se) +{ + if (se) + se->flag = UCC_TIF_NONE; +} + +void ucc_dequeue_task(struct vstream_info *vInfo) +{ + struct xcu *cu = &xcu_manager[vInfo->cu_id]; + struct ucc_se *se = &vInfo->se; + + raw_spin_lock(&se->se_lock); + dequeue_ucc_se(se, cu); + raw_spin_unlock(&se->se_lock); +} + +/* + * dynamic padding: select kernels with no QoS confilcts to current ucc_se + * to fill cu; + */ +static void dynamic_padding(struct xcu *cu, struct ucc_se *se) +{ +} + +static int __ucc_schedule(void *args) +{ + struct sched_args *sargs = (struct sched_args *)args; + int cu_id = sargs->cu_id; + struct xcu *cu = &xcu_manager[cu_id]; + struct ucc_se *se = NULL, *curr_se = NULL; + struct ucc_se *prev_se = NULL; + struct vstream_info *vinfo; + int send_cnt = 0; + int kernel_num, preempt; + + while (!is_xcu_offline(cu)) { + raw_spin_lock(&cu->xcu_lock); + cu->is_sched = 0; + prev_se = cu->curr_se; + + preempt = 0; + if 
(prev_se) { + if (prev_se->flag != UCC_TIF_PREEMPT) + goto submit_kernel; + + vinfo = container_of(prev_se, struct vstream_info, se); + if (send_cnt < vinfo->p->ucc_step) + goto submit_kernel; + + preempt = 1; + } + + clear_se_flag(prev_se); + se = pick_next_ucc_se(cu); + if (!se) { + cu->is_sched = 1; + raw_spin_unlock(&cu->xcu_lock); + trace_ucc_sched_switch(0, &vstream_idle); + ucc_wait_idle(cu); + continue; + } + + set_next_entity(cu, se); + if (se != prev_se) { + put_prev_entity(cu, prev_se); + vinfo = container_of(se, struct vstream_info, se); + trace_ucc_sched_switch(preempt, vinfo); + } + send_cnt = 0; +submit_kernel: + curr_se = cu->curr_se; + dynamic_padding(cu, curr_se); + raw_spin_unlock(&cu->xcu_lock); + + curr_se->is_timeout = 0; + kernel_num = ucc_submit_kernel(cu, curr_se); + //has no more kernels to submit. + if (kernel_num <= 0 && !vstream_have_kernel(curr_se)) { + raw_spin_lock(&curr_se->se_lock); + curr_se->state = SE_BLOCK; + dequeue_ucc_se(curr_se, cu); + raw_spin_unlock(&curr_se->se_lock); + cu->is_sched = 1; + continue; + } + cu->is_sched = 1; + + vinfo = container_of(curr_se, struct vstream_info, se); + if (vinfo->send_cnt > TS_SQ_TRANS_TASK_THRESHOLD) { + update_stats_run_start(cu, curr_se); + /* kernel has not finish */ + if (!cu->is_wake) + ucc_wait_running(cu, curr_se); + + update_stats_run_end(cu, curr_se); + cu->is_wake = 0; + vinfo->send_cnt = 0; + } + + send_cnt += kernel_num; + schedstat_add(se->statistics.kernel_sum, kernel_num); + } + + return 0; +} + +static void init_xcu_rq(struct xcu *cu) +{ + int i; + + for (i = 0; i < MAX_UCC_PRIO; i++) { + INIT_LIST_HEAD(cu->queue + i); + __clear_bit(i, cu->bitmap); + } + + /* delimiter for bitsearch: */ + __set_bit(MAX_UCC_PRIO, cu->bitmap); + cu->rt_nr_running = 0; + raw_spin_lock_init(&cu->xcu_lock); +} + +static int alloc_cu_id(void) +{ + int cu_id = -1; + + raw_spin_lock(&xcu_mgr_lock); + if (num_active_xcu >= MAX_XCU_NUM) { + raw_spin_unlock(&xcu_mgr_lock); + return cu_id; + } + + 
cu_id = num_active_xcu; + num_active_xcu++; + raw_spin_unlock(&xcu_mgr_lock); + + return cu_id; +} + +int ucc_sched_register_xcu(int dev_id, int ts_id, int cu_num) +{ + int cu_id; + struct xcu *cu; + struct sched_args *args; + struct sched_param param = { .sched_priority = 1 }; + char id_buf[16]; + int i; + + for (i = 0; i < cu_num; i++) { + cu_id = alloc_cu_id(); + if (cu_id < 0) { + pr_err("alloc cu id failed\n"); + return -1; + } + + cu = &xcu_manager[cu_id]; + cu->cu_id = cu_id; + cu->state = XCU_IDLE; + cu->curr_se = NULL; + cu->dev_id = dev_id; + cu->ts_id = ts_id; + cu->is_wake = 0; + init_xcu_rq(cu); + + args = kzalloc(sizeof(struct sched_args), GFP_KERNEL); + if (!args) + return -1; + + args->cu_id = cu->cu_id; + snprintf(id_buf, sizeof(id_buf), "%d:%d:%d", + cu->cu_id, cu->dev_id, cu->ts_id); + cu->worker = kthread_create_on_node(__ucc_schedule, + (void *)args, NUMA_NO_NODE, + "u_sched/%s", id_buf); + sched_setscheduler_nocheck(cu->worker, SCHED_FIFO, &param); + wake_up_process(cu->worker); + } + + return 0; +} +EXPORT_SYMBOL(ucc_sched_register_xcu); + +int ucc_sched_init(void) +{ + raw_spin_lock_init(&xcu_mgr_lock); + return 0; +} + +int ucc_rt_nr_running(struct xcu *cu) +{ + return cu->rt_nr_running; +} +EXPORT_SYMBOL(ucc_rt_nr_running); + +struct xcu *ucc_get_xcu_by_id(int cu_id) +{ + return &xcu_manager[cu_id]; +} +EXPORT_SYMBOL(ucc_get_xcu_by_id); + +int ucc_xcu_is_sched(int cu_id) +{ + return xcu_manager[cu_id].is_sched; +} +EXPORT_SYMBOL(ucc_xcu_is_sched); diff --git a/kernel/ucc_sched/ucc_sched.h b/kernel/ucc_sched/ucc_sched.h new file mode 100644 index 000000000000..30e2aa10cf2f --- /dev/null +++ b/kernel/ucc_sched/ucc_sched.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved. 
+ * Author: Huawei OS Kernel Lab + * Create: Tue Jan 17 22:27:22 2023 + */ +#ifndef __UCC_SCHED_USCHED_H__ +#define __UCC_SCHED_USCHED_H__ + +#include <linux/sched.h> +#include <linux/spinlock_types.h> +#include <linux/types.h> +#include <linux/vstream.h> + +//For simplicity, we set this parameter to 2. +#define MAX_UCC_PRIO (2) + +enum xcu_state { + XCU_INACTIVE, + XCU_IDLE, + XCU_BUSY, + XCU_SUBMIT, +}; + +/* + * This is the abstraction object of the xpu computing unit. + */ +struct xcu { + int is_sched; + int cu_id; + int dev_id; + int ts_id; + int rt_nr_running; + int is_wake; + struct task_struct *worker; + DECLARE_BITMAP(bitmap, MAX_UCC_PRIO); + struct list_head queue[MAX_UCC_PRIO]; + enum xcu_state state; + struct ucc_se *curr_se; + raw_spinlock_t xcu_lock; +}; + +#endif -- 2.34.1
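
A note on the run queue in kernel/ucc_sched/core.c above: init_xcu_rq() sets bit MAX_UCC_PRIO as a search delimiter, and pick_next_ucc_se() treats an index at or beyond MAX_UCC_PRIO as "nothing queued" (guarded there by the rt_nr_running check and a BUG_ON). The bitmap in struct xcu is declared with MAX_UCC_PRIO bits, whereas the rt scheduler's rt_prio_array declares MAX_RT_PRIO + 1 to leave room for the delimiter. The following is only a userspace sketch of the delimiter idea, not the kernel code: every demo_* name is hypothetical, a single unsigned long stands in for the kernel bitmap, and a counter stands in for each per-priority list.

```c
#define DEMO_MAX_PRIO 2	/* assumption: mirrors MAX_UCC_PRIO in the patch */

/* Hypothetical userspace stand-in for the per-xcu run queue. */
struct demo_rq {
	unsigned long bitmap;		/* bit i set: priority queue i is non-empty */
	int nr_queued[DEMO_MAX_PRIO];	/* stand-in for the list_head queues */
};

static void demo_rq_init(struct demo_rq *rq)
{
	int i;

	rq->bitmap = 0;
	for (i = 0; i < DEMO_MAX_PRIO; i++)
		rq->nr_queued[i] = 0;

	/* Delimiter bit: a first-set-bit search always finds something. */
	rq->bitmap |= 1UL << DEMO_MAX_PRIO;
}

static void demo_enqueue(struct demo_rq *rq, int prio)
{
	rq->nr_queued[prio]++;
	rq->bitmap |= 1UL << prio;
}

/* The caller must only dequeue a priority it has previously enqueued. */
static void demo_dequeue(struct demo_rq *rq, int prio)
{
	if (--rq->nr_queued[prio] == 0)
		rq->bitmap &= ~(1UL << prio);
}

/*
 * Return the lowest-numbered (highest-priority) non-empty queue,
 * or -1 when the search stops at the delimiter bit.
 */
static int demo_pick_prio(const struct demo_rq *rq)
{
	int idx = 0;

	while (!(rq->bitmap & (1UL << idx)))
		idx++;

	return idx < DEMO_MAX_PRIO ? idx : -1;
}
```

In this sketch the delimiter lives in the same word as the priority bits, which is also what happens in the patch while MAX_UCC_PRIO is 2; declaring the bitmap with one extra bit would keep that true if the priority count ever reached a word boundary.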

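A second note, on the vstream_manage syscall in kernel/ucc/vstream.c from the same patch: as posted it rejects only cmd > AMDGPU_MAX_COMMAND, and cmd is a signed int. Going by the table comments, the ASCEND_MAX_COMMAND and AMDGPU_MAX_COMMAND slots are NULL, so a negative cmd or either sentinel value would appear to reach the indirect call. Below is a minimal userspace sketch of a stricter lookup, under the assumption that unpopulated slots should simply return -EINVAL. The demo_* names and the enum layout are hypothetical and only imitate the shape of vstream_command_table.

```c
#include <errno.h>

/* Hypothetical stand-ins; not the kernel's vstream_args or vstream_manage_t. */
struct demo_args {
	int value;
};
typedef int demo_handler_t(struct demo_args *arg);

enum demo_cmd {
	DEMO_ALLOC,
	DEMO_FREE,
	DEMO_GROUP_END,		/* sentinel slot, left NULL like ASCEND_MAX_COMMAND */
	DEMO_KICK,
	DEMO_MAX_COMMAND,	/* sentinel slot, left NULL like AMDGPU_MAX_COMMAND */
};

static int demo_alloc(struct demo_args *arg) { arg->value = 1; return 0; }
static int demo_free(struct demo_args *arg)  { arg->value = 2; return 0; }
static int demo_kick(struct demo_args *arg)  { arg->value = 3; return 0; }

/* Designated initializers leave the two sentinel slots NULL. */
static demo_handler_t *demo_table[DEMO_MAX_COMMAND + 1] = {
	[DEMO_ALLOC] = demo_alloc,
	[DEMO_FREE]  = demo_free,
	[DEMO_KICK]  = demo_kick,
};

/* Reject negative, out-of-range and unpopulated commands before the call. */
static int demo_dispatch(int cmd, struct demo_args *arg)
{
	if (cmd < 0 || cmd > DEMO_MAX_COMMAND)
		return -EINVAL;
	if (!demo_table[cmd])
		return -EINVAL;

	return demo_table[cmd](arg);
}
```

The two checks are kept separate here only for readability; in the syscall they could be folded into the existing range test before copy_from_user().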

[PATCH openEuler-1.0-LTS] ucc: add ucc support
by Jinjie Ruan 13 Sep '23


From: Guan Jing <guanjing6(a)huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I80YXE
CVE: NA

----------------------------------------

ucc support for XPU.

Signed-off-by: Chen Hui <judy.chenhui(a)huawei.com>
Signed-off-by: Yang Yanchao <yangyanchao6(a)huawei.com>
Signed-off-by: Hui Tang <tanghui20(a)huawei.com>
Signed-off-by: Guan Jing <guanjing6(a)huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
---
 Kconfig                             |   2 +
 drivers/Kconfig                     |   2 +
 drivers/Makefile                    |   1 +
 drivers/xpu/Kconfig                 |   9 +
 drivers/xpu/Makefile                |   1 +
 drivers/xpu/xpu_group.c             | 175 ++++++++
 fs/proc/base.c                      | 102 ++++-
 include/linux/sched.h               |   3 +
 include/linux/ucc_common.h          |  21 +
 include/linux/ucc_kfd.h             | 110 +++++
 include/linux/ucc_sched.h           |  36 ++
 include/linux/ucc_sched/ucc_sched.h |  71 +++
 include/linux/ucc_ts.h              | 254 +++++++++++
 include/linux/vstream.h             | 123 ++++++
 include/linux/xpu_group.h           |  66 +++
 include/trace/events/ucc_sched.h    | 120 +++++
 init/init_task.c                    |   4 +
 init/main.c                         |   9 +
 kernel/Makefile                     |   2 +
 kernel/sched/Makefile               |   1 +
 kernel/sched/core.c                 |   5 +
 kernel/sched/ucc_sched.c            | 148 +++++++
 kernel/sysctl.c                     |  17 +-
 kernel/ucc/Kconfig                  |  21 +
 kernel/ucc/Makefile                 |   1 +
 kernel/ucc/ascend_vstream.c         | 654 ++++++++++++++++++++++++++++
 kernel/ucc/ascend_vstream.h         |  13 +
 kernel/ucc/vstream.c                |  62 +++
 kernel/ucc_sched/Makefile           |   1 +
 kernel/ucc_sched/core.c             | 591 +++++++++++++++++++++++++
 kernel/ucc_sched/ucc_sched.h        |  43 ++
 31 files changed, 2666 insertions(+), 2 deletions(-)
 create mode 100644 drivers/xpu/Kconfig
 create mode 100644 drivers/xpu/Makefile
 create mode 100644 drivers/xpu/xpu_group.c
 create mode 100644 include/linux/ucc_common.h
 create mode 100644 include/linux/ucc_kfd.h
 create mode 100644 include/linux/ucc_sched.h
 create mode 100644 include/linux/ucc_sched/ucc_sched.h
 create mode 100644 include/linux/ucc_ts.h
 create mode 100644 include/linux/vstream.h
 create mode 100644 include/linux/xpu_group.h
 create mode 100644
include/trace/events/ucc_sched.h create mode 100644 kernel/sched/ucc_sched.c create mode 100644 kernel/ucc/Kconfig create mode 100644 kernel/ucc/Makefile create mode 100644 kernel/ucc/ascend_vstream.c create mode 100644 kernel/ucc/ascend_vstream.h create mode 100644 kernel/ucc/vstream.c create mode 100644 kernel/ucc_sched/Makefile create mode 100644 kernel/ucc_sched/core.c create mode 100644 kernel/ucc_sched/ucc_sched.h diff --git a/Kconfig b/Kconfig index 48a80beab685..8e558777fb54 100644 --- a/Kconfig +++ b/Kconfig @@ -30,3 +30,5 @@ source "crypto/Kconfig" source "lib/Kconfig" source "lib/Kconfig.debug" + +source "kernel/ucc/Kconfig" diff --git a/drivers/Kconfig b/drivers/Kconfig index ab4d43923c4d..bd59e9e525ba 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -219,4 +219,6 @@ source "drivers/siox/Kconfig" source "drivers/slimbus/Kconfig" +source "drivers/xpu/Kconfig" + endmenu diff --git a/drivers/Makefile b/drivers/Makefile index 578f469f72fb..1130b2d92df1 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -186,3 +186,4 @@ obj-$(CONFIG_MULTIPLEXER) += mux/ obj-$(CONFIG_UNISYS_VISORBUS) += visorbus/ obj-$(CONFIG_SIOX) += siox/ obj-$(CONFIG_GNSS) += gnss/ +obj-$(CONFIG_XPU_SCHEDULE) += xpu/ diff --git a/drivers/xpu/Kconfig b/drivers/xpu/Kconfig new file mode 100644 index 000000000000..c4a391d0039d --- /dev/null +++ b/drivers/xpu/Kconfig @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: GPL-2.0 + +menuconfig XPU_SCHEDULE + bool "xpu schedule" + default n + help + Support xpu schedule, Say Y here if you want support for use + xpu schedule. 
+ diff --git a/drivers/xpu/Makefile b/drivers/xpu/Makefile new file mode 100644 index 000000000000..9edc6dcdd4d0 --- /dev/null +++ b/drivers/xpu/Makefile @@ -0,0 +1 @@ +obj-y += xpu_group.o diff --git a/drivers/xpu/xpu_group.c b/drivers/xpu/xpu_group.c new file mode 100644 index 000000000000..53a598db0615 --- /dev/null +++ b/drivers/xpu/xpu_group.c @@ -0,0 +1,175 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/xpu_group.h> +#include <linux/rwsem.h> +#include <linux/slab.h> + +extern int ucc_rt_nr_running(struct xcu *cu); +static DECLARE_RWSEM(xpu_group_rwsem); + +static struct xpu_capability xpu_capability_root; + +struct xpu_group __xpu_root = { + .type = XPU_TYPE_ROOT, + .capability = &xpu_capability_root, + + .next_layer = IDR_INIT(next_layer), +}; + +struct xpu_group *xpu_root = &__xpu_root; +EXPORT_SYMBOL(xpu_root); + +int __xpu_group_attach(struct xpu_group *new_group, + struct xpu_group *previous_group) +{ + int id = new_group->id; + + if (id == -1) + id = idr_alloc(&previous_group->next_layer, new_group, + 0, INT_MAX, GFP_KERNEL); + else + id = idr_alloc(&previous_group->next_layer, new_group, + id, id + 1, GFP_KERNEL); + if (id < 0) + return -EEXIST; + + new_group->id = id; + new_group->previous_layer = previous_group; + + return 0; +} + +int xpu_group_attach(struct xpu_group *new_group, + struct xpu_group *previous_group) +{ + int ret; + + down_write(&xpu_group_rwsem); + ret = __xpu_group_attach(new_group, previous_group); + up_write(&xpu_group_rwsem); + return ret; +} +EXPORT_SYMBOL(xpu_group_attach); + +struct xpu_group *xpu_group_alloc_and_attach(struct xpu_group *previous_group, + int id) +{ + struct xpu_group *new = xpu_group_alloc(); + + if (!new) { + pr_err("alloc xpu_group failed\n"); + return NULL; + } + + new->id = id; + + if (!xpu_group_attach(new, previous_group)) + return NULL; + + return new; +} +EXPORT_SYMBOL(xpu_group_alloc_and_attach); + +int __xpu_group_detach(struct xpu_group 
*group) +{ + idr_remove(&group->previous_layer->next_layer, group->id); + return 0; +} + +int xpu_group_detach(struct xpu_group *group) +{ + int ret; + + down_write(&xpu_group_rwsem); + ret = __xpu_group_detach(group); + up_write(&xpu_group_rwsem); + return ret; +} +EXPORT_SYMBOL(xpu_group_detach); + +struct xpu_group *__xpu_group_find(struct xpu_group *group, int id) +{ + return idr_find(&group->next_layer, id); +} + +struct xpu_group *xpu_group_find(struct xpu_group *group, int id) +{ + struct xpu_group *p; + + p = xpu_group_alloc(); + + down_read(&xpu_group_rwsem); + p = __xpu_group_find(group, id); + up_read(&xpu_group_rwsem); + + return p; +} +EXPORT_SYMBOL(xpu_group_find); + + +struct xpu_group *xpu_idle_group_find(struct xpu_group *group) +{ + struct xpu_group *entry_group; + int id; + + down_read(&xpu_group_rwsem); + idr_for_each_entry(&group->next_layer, entry_group, id) { + if (!entry_group->used) { + up_read(&xpu_group_rwsem); + return entry_group; + } + } + up_read(&xpu_group_rwsem); + + return NULL; +} + +int xpu_run(struct xpu_group *group, void *para1, void *para2) +{ + int ret = 0; + + if (group->opt && group->opt->run) + ret = group->opt->run(group, para1, para2); + + return ret; +} + +int xpu_finish(struct xpu_group *group, void *para1, void *para2) +{ + if (group->opt && group->opt->finish) + return group->opt->finish(group, para1, para2); + + return 0; +} + +int xpu_wait(struct xpu_group *group, void *para1, void *para2, void *para3) +{ + if (group->opt && group->opt->wait) + return group->opt->wait(group, para1, para2, para3); + + return 0; +} + +int xpu_complete(struct xpu_group *group, void *para1, void *para2, void *para3) +{ + if (group->opt && group->opt->complete) + return group->opt->complete(group, para1, para2, para3); + + return 0; +} + +struct xpu_group *xpu_group_alloc(void) +{ + struct xpu_group *node = kzalloc(sizeof(*node), GFP_KERNEL); + + if (!node) + return NULL; + + node->type = XPU_TYPE_CUSTOM; + idr_init(&node->next_layer); 
+ + return node; +} +EXPORT_SYMBOL(xpu_group_alloc); diff --git a/fs/proc/base.c b/fs/proc/base.c index dc9841826264..516eee1ae952 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -770,7 +770,6 @@ static const struct file_operations proc_single_file_operations = { .release = single_release, }; - struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode) { struct task_struct *task = get_proc_task(inode); @@ -1546,6 +1545,99 @@ static const struct file_operations proc_pid_sched_operations = { #endif +#ifdef CONFIG_XPU_SCHEDULE +static ssize_t ucc_step_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + char numbuf[PROC_NUMBUF]; + ssize_t len; + + task = get_proc_task(file_inode(file)); + if (!task) + return -ESRCH; + + len = snprintf(numbuf, sizeof(numbuf), "%u\n", task->ucc_step); + + put_task_struct(task); + + return simple_read_from_buffer(buf, count, ppos, numbuf, len); +} + +static ssize_t ucc_step_write(struct file *file, const char __user *buf, + size_t count, loff_t *offset) +{ + struct inode *inode = file_inode(file); + struct task_struct *p; + int err; + unsigned int ucc_step; + + p = get_proc_task(inode); + if (!p) + return -ESRCH; + + err = kstrtouint_from_user(buf, count, 0, &ucc_step); + if (err) + return err; + + p->ucc_step = ucc_step; + put_task_struct(p); + + return count; +} + +static const struct file_operations ucc_step_operations = { + .write = ucc_step_write, + .read = ucc_step_read, +}; + +static ssize_t ucc_priority_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + char numbuf[PROC_NUMBUF]; + ssize_t len; + + task = get_proc_task(file_inode(file)); + if (!task) + return -ESRCH; + + len = snprintf(numbuf, sizeof(numbuf), "%u\n", task->ucc_priority); + + put_task_struct(task); + + return simple_read_from_buffer(buf, count, ppos, numbuf, len); +} + +static ssize_t ucc_priority_write(struct file *file, const char __user 
*buf, + size_t count, loff_t *offset) +{ + struct inode *inode = file_inode(file); + struct task_struct *p; + int err; + unsigned int ucc_priority; + + p = get_proc_task(inode); + if (!p) + return -ESRCH; + + err = kstrtouint_from_user(buf, count, 0, &ucc_priority); + if (err) + return err; + + p->ucc_priority = ucc_priority; + put_task_struct(p); + + return count; +} + +static const struct file_operations ucc_priority_operations = { + .write = ucc_priority_write, + .read = ucc_priority_read, +}; + +#endif + #ifdef CONFIG_SCHED_AUTOGROUP /* * Print out autogroup related information: @@ -3151,6 +3243,10 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_ASCEND_SHARE_POOL ONE("sp_group", S_IRUGO, proc_sp_group_state), #endif +#ifdef CONFIG_XPU_SCHEDULE + REG("ucc_priority", 0644, ucc_priority_operations), + REG("ucc_step", 0644, ucc_step_operations), +#endif }; static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx) @@ -3537,6 +3633,10 @@ static const struct pid_entry tid_base_stuff[] = { #ifdef CONFIG_ASCEND_SHARE_POOL ONE("sp_group", S_IRUGO, proc_sp_group_state), #endif +#ifdef CONFIG_XPU_SCHEDULE + REG("ucc_priority", 0644, ucc_priority_operations), + REG("ucc_step", 0644, ucc_step_operations), +#endif }; static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx) diff --git a/include/linux/sched.h b/include/linux/sched.h index 8fd8c5b7cdc6..175659be95f3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1281,6 +1281,9 @@ struct task_struct { #if !defined(__GENKSYMS__) #if defined(CONFIG_QOS_SCHED_SMART_GRID) struct sched_grid_qos *grid_qos; +#elif defined(CONFIG_XPU_SCHEDULE) + u32 ucc_priority; + u32 ucc_step; #else KABI_RESERVE(8) #endif diff --git a/include/linux/ucc_common.h b/include/linux/ucc_common.h new file mode 100644 index 000000000000..3875c2226d24 --- /dev/null +++ b/include/linux/ucc_common.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _UCC_COMMON_H 
+#define _UCC_COMMON_H + +/* + * UCC Print Function + */ +#ifndef pr_fmt +#define pr_fmt(fmt) fmt +#endif + +#define ucc_err(fmt, ...) printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__) + +#define ucc_warn(fmt, ...) printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__) + +#define ucc_info(fmt, ...) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) + +#define ucc_dbg(fmt, ...) printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) + +#endif diff --git a/include/linux/ucc_kfd.h b/include/linux/ucc_kfd.h new file mode 100644 index 000000000000..07eedc2fd5f2 --- /dev/null +++ b/include/linux/ucc_kfd.h @@ -0,0 +1,110 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef KFD_PRIV_H_INCLUDED +#define KFD_PRIV_H_INCLUDED + +#include <linux/mmu_notifier.h> +#include <linux/types.h> +#include <linux/kref.h> +#include <linux/mutex.h> +#include <linux/sched.h> +#include <linux/mmu_notifier.h> +#include <linux/idr.h> +#include <linux/dma-fence.h> +#include <linux/workqueue.h> +#include <linux/fs.h> +#include <linux/kobject.h> +#include <linux/sysfs.h> + +struct process_queue_manager; +struct kfd_process; +struct kfd_signal_page; + +struct process_queue_manager { + struct kfd_process *process; + struct list_head queues; + unsigned long *queue_slot_bitmap; +}; + +struct kfd_signal_page { + uint64_t *kernel_address; + uint64_t __user *user_address; + bool need_to_free_pages; +}; + +/* Process data */ +struct kfd_process { + struct hlist_node kfd_processes; + void *mm; + struct kref ref; + struct work_struct release_work; + struct mutex mutex; + struct task_struct *lead_thread; + struct mmu_notifier mmu_notifier; +/* TODO: check if use right branch */ + struct rcu_head rcu; + uint16_t pasid; + struct list_head per_device_data; + struct process_queue_manager pqm; + bool is_32bit_user_mode; + struct mutex event_mutex; + struct idr event_idr; + struct kfd_signal_page *signal_page; + size_t signal_mapped_size; + size_t signal_event_count; + bool signal_event_limit_reached; +/* TODO: check if use right branch */ 
+ struct rb_root bo_interval_tree; + void *kgd_process_info; + struct dma_fence *ef; + struct delayed_work eviction_work; + struct delayed_work restore_work; + unsigned int last_eviction_seqno; + unsigned long last_restore_timestamp; + unsigned long last_evict_timestamp; + bool debug_trap_enabled; + uint32_t trap_debug_wave_launch_mode; + struct file *dbg_ev_file; + uint32_t allocated_debug_watch_point_bitmask; + struct kobject *kobj; + struct kobject *kobj_queues; + struct attribute attr_pasid; + bool has_cwsr; + uint64_t exception_enable_mask; + uint64_t exception_status; +}; + +struct kfd_ioctl_create_queue_args { + __u64 ring_base_address; /* to KFD */ + __u64 write_pointer_address; /* from KFD */ + __u64 read_pointer_address; /* from KFD */ + __u64 doorbell_offset; /* from KFD */ + + __u32 ring_size; /* to KFD */ + __u32 gpu_id; /* to KFD */ + __u32 queue_type; /* to KFD */ + __u32 queue_percentage; /* to KFD */ + __u32 queue_priority; /* to KFD */ + __u32 queue_id; /* from KFD */ + + __u64 eop_buffer_address; /* to KFD */ + __u64 eop_buffer_size; /* to KFD */ + __u64 ctx_save_restore_address; /* to KFD */ + __u32 ctx_save_restore_size; /* to KFD */ + __u32 ctl_stack_size; /* to KFD */ +}; + +struct kfd_ioctl_destroy_queue_args { + __u32 queue_id; /* to KFD */ + __u32 pad; +}; + +struct kfd_ioctl_update_queue_args { + __u64 ring_base_address; /* to KFD */ + + __u32 queue_id; /* to KFD */ + __u32 ring_size; /* to KFD */ + __u32 queue_percentage; /* to KFD */ + __u32 queue_priority; /* to KFD */ +}; +#endif diff --git a/include/linux/ucc_sched.h b/include/linux/ucc_sched.h new file mode 100644 index 000000000000..5b170545f7c2 --- /dev/null +++ b/include/linux/ucc_sched.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_UCC_SCHED_H__ +#define __LINUX_UCC_SCHED_H__ + +#include <linux/list.h> +#include <linux/types.h> +#include <linux/kernel.h> +#include <linux/hash.h> +#include <linux/rculist.h> +#include <linux/idr.h> +#include 
<linux/xpu_group.h> +#include <linux/hashtable.h> +#include <linux/vstream.h> +#include <linux/slab.h> +#include <linux/sched.h> + +#define VRTSQ_RTSQ_HASH_ORDER 6 + +#ifdef CONFIG_XPU_SCHEDULE +int ucc_process_task(struct vstream_info *vsqcq_info, struct tsdrv_ctx *ctx, + int *sqenum); +int ucc_free_task(struct vstream_info *vsqcq_info, struct tsdrv_ctx *ctx); +int ucc_wait_cq(struct vstream_info *vsqcq_info, struct tsdrv_ctx *ctx, + struct devdrv_report_para *arg, int *sqenum); +struct xpu_group *select_sq(struct vstream_info *vstream_info); +int ucc_sched_register_xcu(int dev_id, int ts_id, int cu_num); +void ucc_set_vstream_state(struct vstream_info *vinfo, int state); +void ucc_dequeue_task(struct vstream_info *vInfo); +int ucc_rt_nr_running(struct xcu *cu); +struct xcu *ucc_get_xcu_by_id(int cu_id); +int ucc_xcu_is_sched(int cu_id); +void ucc_dump_statistics_info(struct ucc_se *se); +#endif + +#endif diff --git a/include/linux/ucc_sched/ucc_sched.h b/include/linux/ucc_sched/ucc_sched.h new file mode 100644 index 000000000000..6edd8930e09e --- /dev/null +++ b/include/linux/ucc_sched/ucc_sched.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2019. All rights reserved. 
+ * Author: Huawei OS Kernel Lab + * Create: Mon Jan 30 14:29:19 2023 + */ + +#ifndef __LINUX_UCC_SCHED_USCHED_H__ +#define __LINUX_UCC_SCHED_USCHED_H__ + +enum ucc_se_state { + SE_PREPARE, + SE_READY, + SE_RUNNING, + SE_BLOCK, + SE_DEAD, +}; + +enum ucc_se_flag { + UCC_TIF_NONE, + UCC_TIF_PREEMPT, + UCC_TIF_BALANCE, +}; + +enum ucc_se_prio { + UCC_PRIO_HIGH, + UCC_PRIO_LOW, +}; + +enum ucc_se_step { + UCC_STEP_SLOW = 1, + UCC_STEP_FAST = 10, +}; + +struct ucc_statistics { + u64 wait_start; + u64 wait_max; + u64 wait_count; + u64 wait_sum; + + u64 preempt_start; + u64 preempt_max; + u64 preempt_count; + u64 preempt_sum; + + u64 kernel_sum; + u64 timeout_count; + + u64 run_start; + u64 run_max; + u64 run_count; + u64 run_sum; +}; + +struct ucc_se { + int on_cu; + struct list_head run_list; + enum ucc_se_state state; + enum ucc_se_flag flag; + enum ucc_se_prio prio; + enum ucc_se_step step; + raw_spinlock_t se_lock; + struct ucc_statistics statistics; + int is_timeout; +}; + +int ucc_sched_init(void); +int ucc_schedule(int cu_id); +int ucc_wake_up(struct ucc_se *se); + +#endif diff --git a/include/linux/ucc_ts.h b/include/linux/ucc_ts.h new file mode 100644 index 000000000000..7280ccca1059 --- /dev/null +++ b/include/linux/ucc_ts.h @@ -0,0 +1,254 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef TS_H +#define TS_H + +#include <linux/file.h> +#include <linux/device.h> +#include <linux/cdev.h> +#include <linux/fs.h> + +#define DEVDRV_MAX_SQ_DEPTH (1024) +#define DEVDRV_SQ_SLOT_SIZE (64) + +#define DEVDRV_MAX_SQ_NUM (512 - 1) +#define DEVDRV_MAX_CQ_NUM (352 - 1) + +#define DEVDRV_MAX_TS_NUM (1) + +#define REMAP_ALIGN_SIZE (64 * 1024) +#define REMAP_ALIGN_MASK (~(REMAP_ALIGN_SIZE - 1)) +#define REMAP_ALIGN(x) (((x) + REMAP_ALIGN_SIZE - 1) & \ + REMAP_ALIGN_MASK) + +#define DEVDRV_DB_SPACE_SIZE (1024 * 4096) + +#define SQCQ_RTS_INFO_LENGTH 5 +#define SQCQ_RESV_LENGTH 8 + +#define DEVDRV_CBCQ_MAX_GID 128 + +enum phy_sqcq_type { + NORMAL_SQCQ_TYPE = 0, + 
CALLBACK_SQCQ_TYPE, + LOGIC_SQCQ_TYPE, + SHM_SQCQ_TYPE, + DFX_SQCQ_TYPE, + TS_SQCQ_TYPE, + KERNEL_SQCQ_TYPE, +}; + +struct notifier_operations { + int (*notifier_call)(struct file *file_op, unsigned long mode); +}; + +#define MAX_DEVICE_COUNT 64 + +struct davinci_intf_stru { + atomic_t count; + struct mutex dmutex; + struct cdev cdev; + struct device *device; + struct list_head process_list; + struct list_head module_list; + unsigned int device_status[MAX_DEVICE_COUNT]; + cpumask_var_t cpumask; +}; + +#define DAVINIC_MODULE_NAME_MAX 256 +struct davinci_intf_private_stru { + char module_name[DAVINIC_MODULE_NAME_MAX]; + unsigned int device_id; + pid_t owner_pid; + int close_flag; + atomic_t work_count; + int release_status; + struct mutex fmutex; + const struct file_operations fops; + struct notifier_operations notifier; + struct davinci_intf_stru *device_cb; + struct file priv_filep; + unsigned int free_type; +}; + +enum sqcq_alloc_status { + SQCQ_INACTIVE = 0, + SQCQ_ACTIVE +}; + +struct devdrv_ts_sq_info { + enum phy_sqcq_type type; + pid_t tgid; + u32 head; + u32 tail; + u32 credit; + u32 index; + int uio_fd; + + u8 *uio_addr; + int uio_size; + + enum sqcq_alloc_status alloc_status; + u64 send_count; + + void *sq_sub; +}; + +struct devdrv_ts_cq_info { + enum phy_sqcq_type type; + pid_t tgid; + u32 vfid; + + u32 head; + u32 tail; + u32 release_head; /* runtime read cq head value */ + u32 index; + u32 phase; + u32 int_flag; + + int uio_fd; + + u8 *uio_addr; + int uio_size; + + enum sqcq_alloc_status alloc_status; + u64 receive_count; + + void *cq_sub; + + void (*complete_handle)(struct devdrv_ts_cq_info *cq_info); + + u8 slot_size; +}; + +#define DEVDRV_SQ_INFO_OCCUPY_SIZE \ + (sizeof(struct devdrv_ts_sq_info) * DEVDRV_MAX_SQ_NUM) +#define DEVDRV_CQ_INFO_OCCUPY_SIZE \ + (sizeof(struct devdrv_ts_cq_info) * DEVDRV_MAX_CQ_NUM) + +#define DEVDRV_MAX_INFO_SIZE \ + (DEVDRV_SQ_INFO_OCCUPY_SIZE + DEVDRV_CQ_INFO_OCCUPY_SIZE) +#define DEVDRV_VM_SQ_MEM_OFFSET 0 +#define 
DEVDRV_VM_SQ_SLOT_SIZE \ + REMAP_ALIGN(DEVDRV_MAX_SQ_DEPTH * DEVDRV_SQ_SLOT_SIZE) +#define DEVDRV_VM_SQ_MEM_SIZE \ + (DEVDRV_VM_SQ_SLOT_SIZE * DEVDRV_MAX_SQ_NUM) + +#define DEVDRV_VM_INFO_MEM_OFFSET \ + (DEVDRV_VM_SQ_MEM_OFFSET + DEVDRV_VM_SQ_MEM_SIZE) +#define DEVDRV_VM_INFO_MEM_SIZE REMAP_ALIGN(DEVDRV_MAX_INFO_SIZE) + +#define DEVDRV_VM_DB_MEM_OFFSET \ + (DEVDRV_VM_INFO_MEM_OFFSET + DEVDRV_VM_INFO_MEM_SIZE) +#define DEVDRV_VM_DB_MEM_SIZE REMAP_ALIGN(DEVDRV_DB_SPACE_SIZE) + +#define DEVDRV_VM_CQ_MEM_OFFSET \ + (DEVDRV_VM_DB_MEM_OFFSET + DEVDRV_VM_DB_MEM_SIZE) + +enum tsdrv_id_type { + TSDRV_STREAM_ID, + TSDRV_NOTIFY_ID, + TSDRV_MODEL_ID, + TSDRV_EVENT_SW_ID, /* should use for event alloc/free/inquiry res_num*/ + TSDRV_EVENT_HW_ID, + TSDRV_IPC_EVENT_ID, + TSDRV_SQ_ID, + TSDRV_CQ_ID, + TSDRV_PCQ_ID, + TSDRV_MAX_ID, +}; + +#define TSDRV_CQ_REUSE 0x00000001 +#define TSDRV_SQ_REUSE 0x00000002 + +struct normal_alloc_sqcq_para { + uint32_t fd; + uint32_t tsId; + uint32_t devId; + uint32_t sqeSize; + uint32_t cqeSize; + uint32_t sqeDepth; + uint32_t cqeDepth; + uint32_t grpId; + uint32_t flag; + uint32_t sqId; + uint32_t cqId; + uint32_t priority; + uint32_t info[SQCQ_RTS_INFO_LENGTH]; + uint32_t res[SQCQ_RESV_LENGTH]; +}; + +struct normal_free_sqcq_para { + uint32_t tsId; + uint32_t flag; + uint32_t sqId; + uint32_t cqId; + uint32_t res[SQCQ_RESV_LENGTH]; +}; + +struct tsdrv_sqcq_data_para { + uint32_t id; + uint32_t val; +}; + +struct devdrv_report_para { + int timeout; + u32 cq_tail; + u32 cq_id; +}; + +struct tsdrv_ts_id_ctx { + u32 id_num; + struct list_head id_list; + spinlock_t id_lock; +}; +struct tsdrv_ts_ctx { + u32 tsid; + atomic_t status; + u32 send_count; + u64 receive_count; + + int32_t cq_tail_updated; + wait_queue_head_t report_wait; + + struct work_struct recycle_work; + + wait_queue_head_t cbcq_wait[DEVDRV_CBCQ_MAX_GID]; + + void *shm_sqcq_ctx; + void *logic_sqcq_ctx; + void *sync_cb_sqcq_ctx; // mini callback + + struct tsdrv_ts_id_ctx 
id_ctx[TSDRV_MAX_ID]; + + /* only used by vm */ + u32 vcqid; + u32 wait_queue_inited; + u32 cq_report_status; + int32_t cq_tail; + spinlock_t ctx_lock; + + u32 recycle_cbsqcq_num; // min callback +}; + +//Context Delivers +struct tsdrv_ctx { + u32 ctx_index; + atomic_t status; + atomic_t type; + pid_t tgid; + pid_t pid; + int32_t ssid; + u32 thread_bind_irq_num; + u32 mirror_ctx_status; + struct rb_node node; + struct list_head list; + struct vm_area_struct *vma[DEVDRV_MAX_TS_NUM]; + spinlock_t ctx_lock; + struct mutex mutex_lock; + struct tsdrv_ts_ctx ts_ctx[DEVDRV_MAX_TS_NUM]; + + u64 unique_id; /* mark unique processes for vm */ +}; + +#endif diff --git a/include/linux/vstream.h b/include/linux/vstream.h new file mode 100644 index 000000000000..14d799296053 --- /dev/null +++ b/include/linux/vstream.h @@ -0,0 +1,123 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_VSTREAM_H +#define _LINUX_VSTREAM_H + +#include <linux/ucc_kfd.h> +#include <linux/ucc_sched/ucc_sched.h> +#include <linux/ucc_ts.h> + +#define MAX_VSTREAM_SIZE 1024 +#define MAX_VSTREAM_SLOT_SIZE 64 +#define MAX_CQ_SLOT_SIZE 12 + +/* + * XXX_VSTREAM_ALLOC: alloc a vstream, buffer for tasks + * XXX_VSTREAM_FREE: free a vstream + * XXX_VSTREAM_KICK: there are tasks to be executed in the vstream + * XXX_VSTREAM_UPDATE: update information for an existing vstream + * XXX_CALLBACK_VSTREAM_WAIT: waiting for callback tasks + * XXX_CALLBACK_VSTREAM_KICK: callback tasks have been executed + * + * NOTE: Callback vstream is only for Ascend now. We do not need + * CALLBACK_VSTREAM_ALLOC because the callback vstream will be + * alloced with vstream on Ascend. 
+ */ +enum VSTREAM_COMMAND { + /* vstream command for Ascend */ + ASCEND_VSTREAM_ALLOC = 0, + ASCEND_VSTREAM_FREE, + ASCEND_VSTREAM_KICK, + ASCEND_CALLBACK_VSTREAM_WAIT, + ASCEND_CALLBACK_VSTREAM_KICK, + ASCEND_VSTREAM_GET_HEAD, + ASCEND_MAX_COMMAND, + + /* vstream command for amdgpu */ + AMDGPU_VSTREAM_ALLOC = ASCEND_MAX_COMMAND + 1, + AMDGPU_VSTREAM_FREE, + AMDGPU_VSTREAM_KICK, + AMDGPU_VSTREAM_UPDATE, + AMDGPU_MAX_COMMAND, +}; + +struct vstream_alloc_args { + union { + /* For Ascend */ + struct normal_alloc_sqcq_para ascend; + /* For amdgpu */ + struct kfd_ioctl_create_queue_args amdgpu; + }; +}; + +struct vstream_free_args { + union { + /* For Ascend */ + struct normal_free_sqcq_para ascend; + /* For amdgpu */ + struct kfd_ioctl_destroy_queue_args amdgpu; + }; +}; + +struct vstream_kick_args { + union { + /* For Ascend */ + struct tsdrv_sqcq_data_para ascend; + /* For amdgpu */ + }; +}; + +struct vstream_args { + union { + struct vstream_alloc_args va_args; + struct vstream_free_args vf_args; + struct vstream_kick_args vk_args; + struct kfd_ioctl_update_queue_args vu_args; + struct tsdrv_sqcq_data_para vh_args; + struct devdrv_report_para cvw_args; + struct tsdrv_sqcq_data_para cvk_args; + }; +}; + +struct vstream_node { + uint32_t id; + uint32_t head; + uint32_t tail; + uint32_t credit; + void *vstreamData; + raw_spinlock_t spin_lock; +}; + +struct vstream_id { + uint32_t vstreamId; + struct list_head list; +}; + +struct vcq_map_table { + uint32_t vcqId; + struct vstream_node *vcqNode; + struct list_head vstreamId_list; +}; + +struct vstream_info { + uint32_t vstreamId; //key + uint32_t vcqId; + uint32_t devId; + uint32_t tsId; + struct ucc_se se; + //TODO::check name + struct vstream_node *vsqNode; + struct vstream_node *vcqNode; + void *privdata; + uint32_t info[SQCQ_RTS_INFO_LENGTH]; + int cu_id; + struct xpu_group *group; + int send_cnt; + struct task_struct *p; +}; + +typedef int vstream_manage_t(struct vstream_args *arg); +int update_vstream_head(struct 
vstream_info *vstream_info, int num); +struct vstream_info *vstream_get_info(uint32_t id); +bool vstream_have_kernel(struct ucc_se *se); + +#endif /* _LINUX_VSTREAM_H */ diff --git a/include/linux/xpu_group.h b/include/linux/xpu_group.h new file mode 100644 index 000000000000..5e3a96b15f9c --- /dev/null +++ b/include/linux/xpu_group.h @@ -0,0 +1,66 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __XPU_GROUP_H__ +#define __XPU_GROUP_H__ +#include <linux/idr.h> + +struct xpu_group; +struct xcu; + +enum xpu_type { + XPU_TYPE_ROOT, + XPU_TYPE_TASK_QUEUE, + XPU_TYPE_NPU_310, + XPU_TYPE_CUSTOM, +}; + +enum xpu_capability_type { + TYPE_1, + XPU_CAPABILITY_TYPE_NR, +}; + +struct xpu_capability { + unsigned long capacities[XPU_CAPABILITY_TYPE_NR]; +}; + +struct xpu_operation { + int (*run)(struct xpu_group *group, void *para1, void *para2); + int (*finish)(struct xpu_group *group, void *para1, void *para2); + int (*wait)(struct xpu_group *group, void *para1, void *para2, + void *para3); + int (*complete)(struct xpu_group *group, void *para1, void *para2, + void *para3); +}; + +struct xpu_group { + int id; + enum xpu_type type; + struct xpu_capability *capability; + + struct xpu_group *previous_layer; + struct idr next_layer; + + struct xpu_operation *opt; + + int used; + + void *data; +}; + +extern struct xpu_group *xpu_root; + +#ifdef CONFIG_XPU_SCHEDULE +int xpu_group_attach(struct xpu_group *new_group, + struct xpu_group *previous_group); +int xpu_group_detach(struct xpu_group *group); +struct xpu_group *xpu_group_find(struct xpu_group *group, int id); +struct xpu_group *xpu_idle_group_find(struct xpu_group *group); +struct xpu_group *xpu_group_alloc(void); +struct xpu_group *xpu_group_alloc_and_attach(struct xpu_group *previous_group, + int id); +int xpu_run(struct xpu_group *group, void *para1, void *para2); +int xpu_finish(struct xpu_group *group, void *para1, void *para2); +int xpu_wait(struct xpu_group *group, void *para1, void *para2, void *para3); +#endif + 
+#endif diff --git a/include/trace/events/ucc_sched.h b/include/trace/events/ucc_sched.h new file mode 100644 index 000000000000..104a39b2f41c --- /dev/null +++ b/include/trace/events/ucc_sched.h @@ -0,0 +1,120 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM ucc_sched + +#if !defined(_TRACE_UCC_SCHED_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_UCC_SCHED_H + +#include <linux/tracepoint.h> +#include <linux/binfmts.h> + +/* + * XXX the below ucc_sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE + * adding ucc_sched_stat support to SCHED_FIFO/RR would be welcome. + */ +DECLARE_EVENT_CLASS(ucc_sched_stat_template, + + TP_PROTO(struct vstream_info *vinfo, u64 delay), + + TP_ARGS(vinfo, delay), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(int, cu_id) + __field(u32, vstreamId) + __field(u32, prio) + __field(u64, delay) + ), + + TP_fast_assign( + memcpy(__entry->comm, vinfo->p->comm, TASK_COMM_LEN); + __entry->pid = vinfo->p->pid; + __entry->cu_id = vinfo->cu_id; + __entry->vstreamId = vinfo->vstreamId; + __entry->prio = vinfo->p->ucc_priority; + __entry->delay = delay; + ), + + TP_printk("comm=%s pid=%d cu_id=%d vstreamId %u prio %u, delay=%llu [ns]", + __entry->comm, __entry->pid, + __entry->cu_id, __entry->vstreamId, __entry->prio, + (unsigned long long)__entry->delay) +); + +DECLARE_EVENT_CLASS(ucc_sched_stat_template_1, + + TP_PROTO(struct vstream_info *vinfo, u64 delay, int is_timeout), + + TP_ARGS(vinfo, delay, is_timeout), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(int, cu_id) + __field(u32, vstreamId) + __field(u64, delay) + __field(int, is_timeout) + ), + + TP_fast_assign( + memcpy(__entry->comm, vinfo->p->comm, TASK_COMM_LEN); + __entry->pid = vinfo->p->pid; + __entry->cu_id = vinfo->cu_id; + __entry->vstreamId = vinfo->vstreamId; + __entry->delay = delay; + __entry->is_timeout = is_timeout; + ), + + 
TP_printk("comm=%s pid=%d cu_id=%d vstreamId %u, delay=%llu [ns]:%d", + __entry->comm, __entry->pid, + __entry->cu_id, __entry->vstreamId, + (unsigned long long)__entry->delay, + __entry->is_timeout) +); +/* + * Tracepoint for accounting wait time (time the task is runnable + * but not actually running due to scheduler contention). + */ +DEFINE_EVENT(ucc_sched_stat_template, ucc_sched_stat_wait, + TP_PROTO(struct vstream_info *vinfo, u64 delay), + TP_ARGS(vinfo, delay)); + +DEFINE_EVENT(ucc_sched_stat_template, ucc_sched_stat_preempt, + TP_PROTO(struct vstream_info *vinfo, u64 delay), + TP_ARGS(vinfo, delay)); + +DEFINE_EVENT(ucc_sched_stat_template_1, ucc_sched_stat_run, + TP_PROTO(struct vstream_info *vinfo, u64 delay, int is_timeout), + TP_ARGS(vinfo, delay, is_timeout)); + +TRACE_EVENT(ucc_sched_switch, + + TP_PROTO(int preempt, + struct vstream_info *next), + + TP_ARGS(preempt, next), + + TP_STRUCT__entry( + __field(int, cu_id) + __field(u32, next_vstreamId) + __field(u32, next_prio) + __field(int, preempt) + ), + + TP_fast_assign( + __entry->cu_id = next->cu_id; + __entry->next_vstreamId = next->vstreamId; + __entry->next_prio = next->p->ucc_priority; + __entry->preempt = preempt; + ), + + TP_printk("cu_id=%d next_vstreamId %u next_prio %u preempt[%d]", + __entry->cu_id, + __entry->next_vstreamId, __entry->next_prio, + __entry->preempt) +); +#endif /* _TRACE_UCC_SCHED_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/init/init_task.c b/init/init_task.c index b312a045f4b9..c1a78b4da368 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -188,6 +188,10 @@ struct task_struct init_task .fork_pid = 0, }, #endif +#ifdef CONFIG_XPU_SCHEDULE + .ucc_priority = 1, + .ucc_step = 1, +#endif }; EXPORT_SYMBOL(init_task); diff --git a/init/main.c b/init/main.c index 50af60ff0ef6..7ed2e67d7011 100644 --- a/init/main.c +++ b/init/main.c @@ -66,6 +66,7 @@ #include <linux/kthread.h> #include <linux/sched.h> #include 
<linux/sched/init.h> +#include <linux/ucc_sched/ucc_sched.h> #include <linux/signal.h> #include <linux/idr.h> #include <linux/kgdb.h> @@ -599,6 +600,14 @@ asmlinkage __visible void __init start_kernel(void) * time - but meanwhile we still have a functioning scheduler. */ sched_init(); + +#ifdef CONFIG_XPU_SCHEDULE + /* + * Set up the ucc scheduler, to enable heterogeneous scheduling. + */ + ucc_sched_init(); +#endif + /* * Disable preemption - early bootup scheduling is extremely * fragile until we cpu_idle() for the first time. diff --git a/kernel/Makefile b/kernel/Makefile index d0482bd27ba4..273fe481d303 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -43,6 +43,8 @@ obj-y += irq/ obj-y += rcu/ obj-y += livepatch/ obj-y += dma/ +obj-$(CONFIG_XPU_SCHEDULE) += ucc_sched/ +obj-$(CONFIG_XPU_UCC) += ucc/ obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile index 0612af002ae5..0f659b2ad251 100644 --- a/kernel/sched/Makefile +++ b/kernel/sched/Makefile @@ -19,6 +19,7 @@ endif obj-y += core.o loadavg.o clock.o cputime.o obj-y += idle.o fair.o rt.o deadline.o obj-y += wait.o wait_bit.o swait.o completion.o +obj-$(CONFIG_XPU_SCHEDULE) += ucc_sched.o obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 67bda877bfa8..89348097b29a 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2316,6 +2316,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) */ p->prio = current->normal_prio; +#ifdef CONFIG_XPU_SCHEDULE + p->ucc_priority = current->ucc_priority; + p->ucc_step = current->ucc_step; +#endif + /* * Revert to default priority/policy on fork if requested. 
*/ diff --git a/kernel/sched/ucc_sched.c b/kernel/sched/ucc_sched.c new file mode 100644 index 000000000000..646f120c3c34 --- /dev/null +++ b/kernel/sched/ucc_sched.c @@ -0,0 +1,148 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/ucc_sched.h> +#include <linux/ucc_common.h> + +static DEFINE_MUTEX(revmap_mutex); + +static DEFINE_HASHTABLE(vrtsq_rtsq_revmap, VRTSQ_RTSQ_HASH_ORDER); + +/** + * @group: value for this entry. + * @hash_node : hash node list. + * @ + */ +struct vsqce_idx_revmap_data { + unsigned int vrtsdId; + struct xpu_group *group; + struct hlist_node hash_node; +}; + +struct xpu_group *select_sq(struct vstream_info *vstream_info) +{ + struct vsqce_idx_revmap_data *revmap_data; + + /* find history */ + mutex_lock(&revmap_mutex); + hash_for_each_possible(vrtsq_rtsq_revmap, revmap_data, hash_node, + (unsigned long)vstream_info->vstreamId) { + if (revmap_data && revmap_data->group) { + mutex_unlock(&revmap_mutex); + return revmap_data->group; + } + } + mutex_unlock(&revmap_mutex); + + revmap_data = kzalloc(sizeof(struct vsqce_idx_revmap_data), GFP_KERNEL); + if (revmap_data == NULL) + return NULL; + /* find XPU group */ + revmap_data->group = xpu_group_find(xpu_root, XPU_TYPE_NPU_310); + if (revmap_data->group == NULL) { + ucc_err("find XPU group is failed.\n"); + return NULL; + } + /* find device group */ + revmap_data->group = xpu_group_find(revmap_data->group, + vstream_info->devId); + if (revmap_data->group == NULL) { + ucc_err("find device group is failed.\n"); + return NULL; + } + /* find tsgroup */ + revmap_data->group = xpu_group_find(revmap_data->group, + vstream_info->tsId); + if (revmap_data->group == NULL) { + ucc_err("find ts group is failed.\n"); + return NULL; + } + + /* select idle xcu */ + revmap_data->group = xpu_idle_group_find(revmap_data->group); + if (revmap_data->group == NULL) { + ucc_err("find rtsq group is failed.\n"); + return NULL; + } + + revmap_data->vrtsdId = vstream_info->vstreamId; + /* set group used : 1 */ + 
revmap_data->group->used = 1; + + mutex_lock(&revmap_mutex); + hash_add(vrtsq_rtsq_revmap, &revmap_data->hash_node, + (unsigned long)vstream_info->vstreamId); + mutex_unlock(&revmap_mutex); + return revmap_data->group; +} + +int ucc_process_task(struct vstream_info *vstream_info, struct tsdrv_ctx *ctx, + int *sqenum) +{ + struct xpu_group *group = NULL; + + if (vstream_info == NULL) { + ucc_err("vsqcq_info is NULL\n"); + return -1; + } + + group = select_sq(vstream_info); + if (group == NULL) { + ucc_err("find group is failed.\n"); + return -1; + } + /* send sqe */ + *sqenum = xpu_run(group, vstream_info, ctx); + + return 0; +} +EXPORT_SYMBOL(ucc_process_task); + +int ucc_free_task(struct vstream_info *vstream_info, struct tsdrv_ctx *ctx) +{ + struct vsqce_idx_revmap_data *revmap_data; + + ucc_dequeue_task(vstream_info); + + while (!ucc_xcu_is_sched(vstream_info->cu_id)) + schedule_timeout_interruptible(10); + + ucc_dump_statistics_info(&vstream_info->se); + + mutex_lock(&revmap_mutex); + hash_for_each_possible(vrtsq_rtsq_revmap, revmap_data, hash_node, + (unsigned long)vstream_info->vstreamId) { + if (revmap_data && + revmap_data->vrtsdId == vstream_info->vstreamId && + revmap_data->group) { + xpu_finish(revmap_data->group, vstream_info, ctx); + /* set group unused : 0 */ + revmap_data->group->used = 0; + hash_del(&revmap_data->hash_node); + kfree(revmap_data); + revmap_data = NULL; + break; + } + } + mutex_unlock(&revmap_mutex); + + return 0; +} +EXPORT_SYMBOL(ucc_free_task); + +int ucc_wait_cq(struct vstream_info *vstream_info, struct tsdrv_ctx *ctx, + struct devdrv_report_para *arg, int *cqenum) +{ + struct vsqce_idx_revmap_data *revmap_data; + + hash_for_each_possible(vrtsq_rtsq_revmap, revmap_data, hash_node, + (unsigned long)vstream_info->vstreamId) { + if (revmap_data && + revmap_data->vrtsdId == vstream_info->vstreamId && + revmap_data->group) + *cqenum = xpu_wait(revmap_data->group, vstream_info, + ctx, arg); + } + + return 0; +} 
+EXPORT_SYMBOL(ucc_wait_cq); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index c7064f67f4a5..aeceb9e9c927 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -117,6 +117,10 @@ extern unsigned int sysctl_nr_open_min, sysctl_nr_open_max; extern int sysctl_nr_trim_pages; #endif +#ifdef CONFIG_XPU_SCHEDULE +extern int sysctl_ucc_sched_rcv_timeout_ms; +#endif + /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR static int sixty = 60; @@ -139,7 +143,7 @@ static int one_thousand = 1000; #ifdef CONFIG_PRINTK static int ten_thousand = 10000; #endif -#if defined(CONFIG_QOS_SCHED) || defined(CONFIG_QOS_SCHED_SMART_GRID) +#if defined(CONFIG_QOS_SCHED) || defined(CONFIG_QOS_SCHED_SMART_GRID) || defined(CONFIG_XPU_SCHEDULE) static int hundred_thousand = 100000; #endif #ifdef CONFIG_PERF_EVENTS @@ -352,6 +356,17 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, +#ifdef CONFIG_XPU_SCHEDULE + { + .procname = "ucc_sched_rcv_timeout", + .data = &sysctl_ucc_sched_rcv_timeout_ms, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &hundred_thousand, + }, +#endif #ifdef CONFIG_SCHED_DEBUG { .procname = "sched_min_granularity_ns", diff --git a/kernel/ucc/Kconfig b/kernel/ucc/Kconfig new file mode 100644 index 000000000000..279c11f702b1 --- /dev/null +++ b/kernel/ucc/Kconfig @@ -0,0 +1,21 @@ +# +# TODO: add description +# + +config XPU_UCC + bool "ucc" + default n + depends on ARM64 || X86 + help + Say Y here if you want support for using XPU UCC. XPU UCC + is a helper for XPU scheduling. The full name of UCC is + Universal Converged Computing. + + +config XPU_VSTREAM + bool "virtual submit queue and complete queue" + default n + depends on XPU_UCC + help + Virtual submit queue and complete queue support for XPU. + It is used to help XPU scheduling. 
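A note on the sysctl hunk above: the new ucc_sched_rcv_timeout entry sits in kern_table and uses proc_dointvec_minmax with extra1 = &zero and extra2 = &hundred_thousand, so a write outside 0..100000 should fail with -EINVAL rather than be clamped. The snippet below is only a userspace sketch of that range check, not code from this patch; the helper name ucc_parse_rcv_timeout_ms and the two UCC_RCV_TIMEOUT_* macros are hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Bounds mirrored from the ctl_table entry in the patch
 * (extra1 = &zero, extra2 = &hundred_thousand).
 * The macro names are assumptions, they do not exist in the patch.
 */
#define UCC_RCV_TIMEOUT_MIN_MS 0
#define UCC_RCV_TIMEOUT_MAX_MS 100000

/*
 * Hypothetical helper: parse a decimal string and store it in *out
 * only if it lies inside the sysctl range.
 * Returns 0 on success, -EINVAL on malformed or out-of-range input.
 * On failure *out is left untouched.
 */
static int ucc_parse_rcv_timeout_ms(const char *text, int *out)
{
	char *end = NULL;
	long val;

	if (!text || !out || *text == '\0')
		return -EINVAL;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -EINVAL;

	/* Accept one trailing newline, as produced by `echo`. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val < UCC_RCV_TIMEOUT_MIN_MS || val > UCC_RCV_TIMEOUT_MAX_MS)
		return -EINVAL;

	*out = (int)val;
	return 0;
}
```

A tool that writes /proc/sys/kernel/ucc_sched_rcv_timeout could run a check like this first, so that an out-of-range value is reported before the write is attempted.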
diff --git a/kernel/ucc/Makefile b/kernel/ucc/Makefile new file mode 100644 index 000000000000..0e2735d2aef4 --- /dev/null +++ b/kernel/ucc/Makefile @@ -0,0 +1 @@ +obj-y += ascend_vstream.o vstream.o diff --git a/kernel/ucc/ascend_vstream.c b/kernel/ucc/ascend_vstream.c new file mode 100644 index 000000000000..d248aaff7639 --- /dev/null +++ b/kernel/ucc/ascend_vstream.c @@ -0,0 +1,654 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/uaccess.h> +#include <linux/syscalls.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/vstream.h> +#include <linux/slab.h> +#include <linux/list.h> +#include <linux/ucc_common.h> +#include <linux/ucc_sched.h> + +DEFINE_MUTEX(vstreamId_Bitmap_mutex); +static DECLARE_BITMAP(vstreamIdBitmap, DEVDRV_MAX_SQ_NUM); + +static DEFINE_MUTEX(vcqId_Bitmap_mutex); +static DECLARE_BITMAP(vcqIdBitmap, DEVDRV_MAX_CQ_NUM); + +static DEFINE_MUTEX(revmap_mutex); + +static struct vstream_info *vstreamContainer[DEVDRV_MAX_SQ_NUM]; +static struct vcq_map_table *vsqcqMapTable[DEVDRV_MAX_CQ_NUM]; + +#define MAX_SQ_SIZE (MAX_VSTREAM_SIZE * MAX_VSTREAM_SLOT_SIZE) +#define MAX_CQ_SIZE (MAX_VSTREAM_SIZE * MAX_CQ_SLOT_SIZE) + +#define SQ_USER_ADDR_OFFSET(id) ((unsigned long)REMAP_ALIGN(MAX_SQ_SIZE) * id) +#define CQ_USER_ADDR_OFFSET(id) ((unsigned long)REMAP_ALIGN(MAX_CQ_SIZE) * id) + +#define SQ_VSTREAM_DATA(id) vstreamContainer[id]->vsqNode->vstreamData +#define CQ_VSTREAM_DATA(id) vstreamContainer[id]->vcqNode->vstreamData + +static struct tsdrv_ctx *get_ctx(int fd) +{ + struct fd f; + struct davinci_intf_private_stru *file_private_data; + struct tsdrv_ctx *ctx = NULL; + + f = fdget(fd); + if (!f.file) + goto out; + + file_private_data = f.file->private_data; + if (!file_private_data) + goto out; + + ctx = file_private_data->priv_filep.private_data; + +out: + fdput(f); + return ctx; +} + +static struct vcq_map_table *vstream_get_map_table(uint32_t id) +{ + return 
vsqcqMapTable[id]; +} + +static void free_vstreamId(uint32_t vstreamId) +{ + mutex_lock(&vstreamId_Bitmap_mutex); + clear_bit(vstreamId, vstreamIdBitmap); + mutex_unlock(&vstreamId_Bitmap_mutex); +} + +static void free_vcqId(uint32_t vcqId, uint32_t flag) +{ + mutex_lock(&vcqId_Bitmap_mutex); + if (!(flag & TSDRV_CQ_REUSE)) + clear_bit(vcqId, vcqIdBitmap); + mutex_unlock(&vcqId_Bitmap_mutex); +} + +static void vstream_free_map_table(uint32_t vcqId, uint32_t vstreamId, + uint32_t flag) +{ + struct vcq_map_table *freeTable = NULL; + struct vstream_id *vstreamIdNode = NULL; + + freeTable = vstream_get_map_table(vcqId); + if (!freeTable) { + ucc_err("No map found for vcq:%d.\n", vcqId); + return; + } + + list_for_each_entry(vstreamIdNode, &freeTable->vstreamId_list, list) { + if (vstreamIdNode->vstreamId == vstreamId) { + list_del(&vstreamIdNode->list); + kfree(vstreamIdNode); + break; + } + } + if (!(flag & TSDRV_CQ_REUSE)) { + kfree(freeTable->vcqNode->vstreamData); + kfree(freeTable->vcqNode); + kfree(freeTable); + } +} + +static void vstream_alloc_ucc_se(struct ucc_se *se) +{ + memset(&se->statistics, 0, sizeof(se->statistics)); + se->on_cu = 0; + se->state = SE_PREPARE; + se->flag = UCC_TIF_NONE; + se->prio = UCC_PRIO_HIGH; + se->step = UCC_STEP_SLOW; + raw_spin_lock_init(&se->se_lock); +} + +static struct vstream_info *vstream_create_info(struct tsdrv_ctx *ctx, + struct normal_alloc_sqcq_para *para) +{ + struct vcq_map_table *mapTable = NULL; + + struct vstream_info *vstream = kzalloc(sizeof(struct vstream_info), + GFP_KERNEL); + if (!vstream) + return NULL; + + (void)memcpy(vstream->info, para->info, + sizeof(uint32_t) * SQCQ_RTS_INFO_LENGTH); + + vstream->privdata = ctx; + vstream->tsId = para->tsId; + vstream->vstreamId = para->sqId; + vstream->vcqId = para->cqId; + + mapTable = vstream_get_map_table(vstream->vcqId); + if (!mapTable || !mapTable->vcqNode) { + ucc_err("No map found for vcqId:%d.\n", vstream->vcqId); + goto free_vstream; + } + vstream->vcqNode = 
mapTable->vcqNode; + vstream->vsqNode = kmalloc(sizeof(struct vstream_node), GFP_KERNEL); + if (!vstream->vsqNode) { + ucc_err("Failed to alloc memory for vsqNode:%d.\n", + vstream->vstreamId); + goto free_vstream; + } + vstream->vsqNode->vstreamData = kmalloc(MAX_SQ_SIZE, GFP_KERNEL); + if (!vstream->vsqNode->vstreamData) + goto free_vsqNode; + vstream->vsqNode->id = vstream->vstreamId; + vstream->vsqNode->head = 0; + vstream->vsqNode->tail = 0; + vstream->vsqNode->credit = MAX_VSTREAM_SIZE; + raw_spin_lock_init(&vstream->vsqNode->spin_lock); + vstream->send_cnt = 0; + vstream->p = current; + vstream_alloc_ucc_se(&vstream->se); + + return vstream; + +free_vsqNode: + kfree(vstream->vsqNode); + +free_vstream: + kfree(vstream); + return NULL; +} + +struct vstream_info *vstream_get_info(uint32_t id) +{ + return vstreamContainer[id]; +} + +static void vstream_free_info(uint32_t id) +{ + struct vstream_info *freeInfo = vstream_get_info(id); + + ucc_set_vstream_state(freeInfo, SE_DEAD); + + if (freeInfo) { + if (freeInfo->vsqNode) + kfree(freeInfo->vsqNode->vstreamData); + + kfree(freeInfo->vsqNode); + } + + kfree(freeInfo); +} + +static int queue_pop_by_num(struct vstream_node *node, uint32_t pop_num) +{ + if (node->credit + pop_num > MAX_VSTREAM_SIZE) { + ucc_err("Queue usage out-of-bounds"); + return -EACCES; + } + + node->credit += pop_num; + node->head = (node->head + pop_num) % MAX_VSTREAM_SIZE; + return 0; +} + +static int queue_pop_by_head(struct vstream_node *node, uint32_t head) +{ + int pop_num = (head - node->head + MAX_VSTREAM_SIZE) % + MAX_VSTREAM_SIZE; + return queue_pop_by_num(node, pop_num); +} + +int update_vstream_head(struct vstream_info *vstream_info, int num) +{ + struct vstream_node *node = vstream_info->vsqNode; + + raw_spin_lock(&node->spin_lock); + if (node->credit + num > MAX_VSTREAM_SIZE) { + raw_spin_unlock(&node->spin_lock); + return -1; + } + + node->credit += num; + node->head = (node->head + num) % MAX_VSTREAM_SIZE; + 
raw_spin_unlock(&node->spin_lock); + + return 0; +} + +bool vstream_have_kernel(struct ucc_se *se) +{ + struct vstream_info *vinfo; + + vinfo = container_of(se, struct vstream_info, se); + return vinfo->vsqNode->credit != MAX_VSTREAM_SIZE; +} + +static int queue_push_by_num(struct vstream_node *node, uint32_t push_num) +{ + if (push_num > node->credit) + return -EACCES; + + node->credit -= push_num; + node->tail = (node->tail + push_num) % MAX_VSTREAM_SIZE; + return 0; +} + +static int queue_push_by_tail(struct vstream_node *node, uint32_t tail) +{ + int push_num = (tail - node->tail + MAX_VSTREAM_SIZE) % + MAX_VSTREAM_SIZE; + return queue_push_by_num(node, push_num); +} + +static uint32_t vstream_alloc_vstreamId(void) +{ + uint32_t vstreamId = DEVDRV_MAX_SQ_NUM; + + /* alloc vstreamId */ + mutex_lock(&vstreamId_Bitmap_mutex); + vstreamId = find_first_zero_bit(vstreamIdBitmap, DEVDRV_MAX_SQ_NUM); + if (vstreamId == DEVDRV_MAX_SQ_NUM) { + ucc_err("vstreamId exhausted.\n"); + mutex_unlock(&vstreamId_Bitmap_mutex); + return DEVDRV_MAX_SQ_NUM; + } + set_bit(vstreamId, vstreamIdBitmap); + mutex_unlock(&vstreamId_Bitmap_mutex); + + return vstreamId; +} + +static uint32_t vstream_alloc_vcqid(void) +{ + uint32_t vcqId = DEVDRV_MAX_CQ_NUM; + + /* alloc vcqid */ + mutex_lock(&vcqId_Bitmap_mutex); + vcqId = find_first_zero_bit(vcqIdBitmap, DEVDRV_MAX_CQ_NUM); + if (vcqId == DEVDRV_MAX_CQ_NUM) { + ucc_err("vcqId has been used up.\n"); + mutex_unlock(&vcqId_Bitmap_mutex); + return DEVDRV_MAX_CQ_NUM; + } + set_bit(vcqId, vcqIdBitmap); + mutex_unlock(&vcqId_Bitmap_mutex); + + ucc_info("vcqId = %d\n", vcqId); + return vcqId; +} + +int vstream_map_pfnaddr(struct tsdrv_ctx *ctx, + struct normal_alloc_sqcq_para *para) +{ + int err = 0; + unsigned long vsqAddr; + unsigned long vcqAddr; + pgprot_t vm_page_prot; + struct vm_area_struct *vma = ctx->vma[para->tsId]; + + vsqAddr = vma->vm_start + SQ_USER_ADDR_OFFSET(para->sqId); + vm_page_prot = pgprot_device(vma->vm_page_prot); + err 
= remap_pfn_range(vma, vsqAddr, + virt_to_pfn(SQ_VSTREAM_DATA(para->sqId)), + MAX_SQ_SIZE, vm_page_prot); + if (err) { + ucc_err("remap_pfn_range failed,ret=%d.\n", err); + return -EFAULT; + } + if (!(para->flag & TSDRV_CQ_REUSE)) { + vcqAddr = vma->vm_start + DEVDRV_VM_CQ_MEM_OFFSET + + CQ_USER_ADDR_OFFSET(para->cqId); + err = remap_pfn_range(vma, vcqAddr, + virt_to_pfn(CQ_VSTREAM_DATA(para->sqId)), + MAX_CQ_SIZE, vm_page_prot); + if (err) { + ucc_err("remap_pfn_range failed,ret=%d.\n", err); + return -EFAULT; + } + } + + return err; +} + +void vstream_unmap_pfnaddr(struct tsdrv_ctx *ctx, + struct normal_free_sqcq_para *para) +{ + unsigned long vsqAddr; + unsigned long vcqAddr; + size_t cqSize = PAGE_ALIGN(MAX_CQ_SIZE); + struct vm_area_struct *vma = ctx->vma[para->tsId]; + + vsqAddr = vma->vm_start + SQ_USER_ADDR_OFFSET(para->sqId); + zap_vma_ptes(vma, vsqAddr, MAX_SQ_SIZE); + + if (!(para->flag & TSDRV_CQ_REUSE)) { + vcqAddr = vma->vm_start + DEVDRV_VM_CQ_MEM_OFFSET + + CQ_USER_ADDR_OFFSET(para->cqId); + zap_vma_ptes(vma, vcqAddr, cqSize); + } +} + +static int vstream_update_vcqtable(uint32_t vcqId, uint32_t vstreamId, + uint32_t flag) +{ + int err = -ENOSPC; + struct vcq_map_table *vcqTable = NULL; + struct vstream_id *vstreamIdNode = NULL; + + if (!(flag & TSDRV_CQ_REUSE)) { + vcqTable = kmalloc(sizeof(struct vcq_map_table), GFP_KERNEL); + if (!vcqTable) + return -ENOMEM; + + vcqTable->vcqId = vcqId; + vcqTable->vcqNode = kmalloc(sizeof(struct vstream_node), + GFP_KERNEL); + if (!vcqTable->vcqNode) { + err = -ENOMEM; + goto free_vcqTable; + } + + vcqTable->vcqNode->vstreamData = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (!vcqTable->vcqNode->vstreamData) { + err = -ENOMEM; + goto free_vcqNode; + } + vcqTable->vcqNode->id = vcqId; + vcqTable->vcqNode->head = 0; + vcqTable->vcqNode->tail = 0; + vcqTable->vcqNode->credit = MAX_VSTREAM_SIZE; + INIT_LIST_HEAD(&vcqTable->vstreamId_list); + vsqcqMapTable[vcqId] = vcqTable; + } else { + vcqTable = vsqcqMapTable[vcqId]; + } 
+ vstreamIdNode = kmalloc(sizeof(struct vstream_id), GFP_KERNEL); + if (!vstreamIdNode) { + err = -ENOMEM; + + if (!(flag & TSDRV_CQ_REUSE)) + goto free_vstreamData; + return err; + } + vstreamIdNode->vstreamId = vstreamId; + list_add(&vstreamIdNode->list, &vcqTable->vstreamId_list); + + return 0; + +free_vstreamData: + kfree(vcqTable->vcqNode->vstreamData); + +free_vcqNode: + kfree(vcqTable->vcqNode); + +free_vcqTable: + kfree(vcqTable); + + return err; +} + +int ascend_vstream_alloc(struct vstream_args *arg) +{ + uint32_t vstreamId; + uint32_t vcqId = DEVDRV_MAX_CQ_NUM; + int err = -EINVAL; + struct vstream_info *vstream = NULL; + struct tsdrv_ctx *ctx = NULL; + struct normal_alloc_sqcq_para *sqcq_alloc_para = &arg->va_args.ascend; + + ctx = get_ctx(sqcq_alloc_para->fd); + if (!ctx) + return err; + + vstreamId = vstream_alloc_vstreamId(); + if (vstreamId == DEVDRV_MAX_SQ_NUM) { + ucc_err("vstreamId alloc failed.\n"); + return err; + } + if (!(sqcq_alloc_para->flag & TSDRV_CQ_REUSE)) + vcqId = vstream_alloc_vcqid(); + else + vcqId = sqcq_alloc_para->cqId; + + if (vcqId >= DEVDRV_MAX_CQ_NUM) { + ucc_err("vcqId alloc failed.\n"); + goto free_vstreamIds; + } + err = vstream_update_vcqtable(vcqId, vstreamId, sqcq_alloc_para->flag); + if (err) { + ucc_err("vcqtable update failed, vcqId:%d, vstreamId:%d, flag:%d.\n", + vcqId, vstreamId, sqcq_alloc_para->flag); + goto free_vcqid; + } + + sqcq_alloc_para->sqId = vstreamId; + sqcq_alloc_para->cqId = vcqId; + vstream = vstream_create_info(ctx, sqcq_alloc_para); + if (!vstream) { + ucc_err("vstream create failed: vcqId:%d, vstreamId:%d.\n", + vcqId, vstreamId); + err = -ENOSPC; + goto free_vcqtable; + } + + vstream->devId = sqcq_alloc_para->devId; + vstreamContainer[vstreamId] = vstream; + + vstream->group = select_sq(vstream); + if (!vstream->group) { + ucc_err("Failed to select sq\n"); + err = -EINVAL; + goto free_vstream_info; + } + + err = vstream_map_pfnaddr(ctx, sqcq_alloc_para); + if (err) { + ucc_err("vstream map 
failed, ret=%d.\n", err); + goto free_vstream_info; + } + return 0; + +free_vstream_info: + vstream_free_info(vstreamId); + +free_vcqtable: + vstream_free_map_table(vcqId, vstreamId, sqcq_alloc_para->flag); + +free_vcqid: + free_vcqId(vcqId, sqcq_alloc_para->flag); + +free_vstreamIds: + free_vstreamId(vstreamId); + + return err; +} + +int ascend_vstream_free(struct vstream_args *arg) +{ + int err = 0; + struct vstream_info *vstreamInfo = NULL; + struct normal_free_sqcq_para *sqcq_free_para = &arg->vf_args.ascend; + uint32_t vstreamId = sqcq_free_para->sqId; + uint32_t vcqId = sqcq_free_para->cqId; + + if (vstreamId >= DEVDRV_MAX_SQ_NUM || vcqId >= DEVDRV_MAX_CQ_NUM) { + ucc_err("vstream index out-of-range, vstreamId=%d, vcqId=%d.\n", + vstreamId, vcqId); + return -EPERM; + } + + vstreamInfo = vstream_get_info(vstreamId); + if (!vstreamInfo) { + ucc_err("vstreamInfo get failed, vstreamId=%d.\n", vstreamId); + return -EPERM; + } + err = ucc_free_task(vstreamInfo, vstreamInfo->privdata); + + free_vcqId(vcqId, sqcq_free_para->flag); + vstream_free_map_table(vcqId, vstreamId, sqcq_free_para->flag); + + vstream_unmap_pfnaddr(vstreamInfo->privdata, sqcq_free_para); + + vstream_free_info(vstreamId); + free_vstreamId(vstreamId); + return err; +} + +int ascend_vstream_kick(struct vstream_args *arg) +{ + int err = 0; + struct tsdrv_sqcq_data_para *sqcq_data_para = &arg->vk_args.ascend; + int vstreamId = sqcq_data_para->id; + int tail = sqcq_data_para->val; + struct vstream_info *vstreamInfo = NULL; + int push_num; + + vstreamInfo = vstream_get_info(vstreamId); + if (!vstreamInfo) { + ucc_err("vstreamInfo get failed, vstreamId=%d.\n", vstreamId); + return -ENOMEM; + } + + vstreamInfo->p = current; + + push_num = (tail - vstreamInfo->vsqNode->tail + MAX_VSTREAM_SIZE) % + MAX_VSTREAM_SIZE; + + raw_spin_lock(&vstreamInfo->vsqNode->spin_lock); + err = queue_push_by_tail(vstreamInfo->vsqNode, tail); + if (err) { + raw_spin_unlock(&vstreamInfo->vsqNode->spin_lock); + 
ucc_err("queue_push_by_tail error, ret = %d\n", err); + return err; + } + raw_spin_unlock(&vstreamInfo->vsqNode->spin_lock); + + err = ucc_wake_up(&vstreamInfo->se); + return err; +} + +int ascend_callback_vstream_wait(struct vstream_args *arg) +{ + int err = 0; + int cqeNum = 0; + int cqeSum = 0; + struct vstream_info *vstreamInfo = NULL; + struct vcq_map_table *vcqTable = NULL; + struct vcq_map_table *waitTable = NULL; + struct vstream_id *vstreamIdNode = NULL; + struct devdrv_report_para *report_para = &arg->cvw_args; + uint32_t *sqlist; + uint32_t sqlist_num = 0; + uint32_t vstreamId, vcqId; + + sqlist = kmalloc_array(DEVDRV_MAX_SQ_NUM, sizeof(uint32_t), GFP_KERNEL); + if (!sqlist) + return -ENOMEM; + + vcqId = report_para->cq_id; + if (vcqId >= DEVDRV_MAX_CQ_NUM) { + ucc_err("vcqId out-of-range, vcqId=%d.\n", vcqId); + err = -EPERM; + goto out; + } + + mutex_lock(&vcqId_Bitmap_mutex); + waitTable = vstream_get_map_table(vcqId); + if (!waitTable) { + ucc_err("No map found for vcq:%d.\n", vcqId); + mutex_unlock(&vcqId_Bitmap_mutex); + err = -EPERM; + goto out; + } + + list_for_each_entry(vstreamIdNode, &waitTable->vstreamId_list, list) + sqlist[sqlist_num++] = vstreamIdNode->vstreamId; + mutex_unlock(&vcqId_Bitmap_mutex); + + //get sqInfo from hardware + for (vstreamId = 0; vstreamId < sqlist_num; vstreamId++) { + vstreamInfo = vstream_get_info(sqlist[vstreamId]); + if (!vstreamInfo) + continue; + err |= ucc_wait_cq(vstreamInfo, vstreamInfo->privdata, + report_para, &cqeNum); + cqeSum += cqeNum; + if (cqeNum) + break; + } + + //update cqInfo + mutex_lock(&vcqId_Bitmap_mutex); + vcqTable = vstream_get_map_table(vcqId); + if (!vcqTable) { + ucc_err("No map found for vcq:%d.\n", vcqId); + err = -EPERM; + goto out; + } + + err = queue_push_by_num(vcqTable->vcqNode, cqeSum); + if (err) { + mutex_unlock(&vcqId_Bitmap_mutex); + ucc_err("failed to queue_push_by_num, ret = %d.\n", err); + goto out; + } + report_para->cq_tail = vcqTable->vcqNode->tail; + 
mutex_unlock(&vcqId_Bitmap_mutex); + +out: + kfree(sqlist); + return err; +} + +int ascend_callback_vstream_kick(struct vstream_args *arg) +{ + u32 vcqId, release_head; + struct vstream_info *vstreamInfo = NULL; + int err = 0; + + vcqId = arg->cvk_args.id; + release_head = arg->cvk_args.val; + if (vcqId >= DEVDRV_MAX_CQ_NUM || release_head >= MAX_VSTREAM_SIZE) { + ucc_err("vstream index out-of-range, vcqId=%d, release_head=%d.\n", + vcqId, release_head); + return -EPERM; + } + + mutex_lock(&vcqId_Bitmap_mutex); + vstreamInfo = vstream_get_info(vcqId); + if (!vstreamInfo) { + err = -EPERM; + goto out; + } + + err = queue_pop_by_head(vstreamInfo->vcqNode, release_head); + +out: + mutex_unlock(&vcqId_Bitmap_mutex); + return err; +} + +int ascend_vstream_get_head(struct vstream_args *arg) +{ + u32 vstreamId = arg->vh_args.id; + struct vstream_info *vstreamInfo = NULL; + + if (vstreamId >= DEVDRV_MAX_SQ_NUM) { + ucc_err("vstreamId out-of-range, vstreamId=%d.\n", vstreamId); + return -EINVAL; + } + + vstreamInfo = vstream_get_info(vstreamId); + if (!vstreamInfo) { + ucc_err("vstreamInfo get failed, vstreamId=%d.\n", vstreamId); + return -EINVAL; + } + arg->vh_args.val = vstreamInfo->vsqNode->head; + + return 0; +} + diff --git a/kernel/ucc/ascend_vstream.h b/kernel/ucc/ascend_vstream.h new file mode 100644 index 000000000000..0cd200168495 --- /dev/null +++ b/kernel/ucc/ascend_vstream.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#ifndef _ASCEND_VSTREAM_H +#define _ASCEND_VSTREAM_H + +int ascend_vstream_alloc(struct vstream_args *arg); +int ascend_vstream_free(struct vstream_args *arg); +int ascend_vstream_kick(struct vstream_args *arg); +int ascend_callback_vstream_wait(struct vstream_args *arg); +int ascend_callback_vstream_kick(struct vstream_args *arg); +int ascend_vstream_get_head(struct vstream_args *arg); + +#endif /* _ASCEND_VSTREAM_H */ diff --git a/kernel/ucc/vstream.c b/kernel/ucc/vstream.c new file mode 100644 index 000000000000..d4705f285b89 
--- /dev/null +++ b/kernel/ucc/vstream.c @@ -0,0 +1,62 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/syscalls.h> +#include <linux/vstream.h> + +#include "ascend_vstream.h" + +static int amdgpu_vstream_alloc(struct vstream_args *arg) +{ + return 0; +} +static int amdgpu_vstream_free(struct vstream_args *arg) +{ + return 0; +} +static int amdgpu_vstream_kick(struct vstream_args *arg) +{ + return 0; +} +static int amdgpu_vstream_update(struct vstream_args *arg) +{ + return 0; +} + +/* + * vstream_manage_cmd table + */ +static vstream_manage_t (*vstream_command_table[AMDGPU_MAX_COMMAND + 1]) = { + ascend_vstream_alloc, // ASCEND_VSTREAM_ALLOC + ascend_vstream_free, // ASCEND_VSTREAM_FREE + ascend_vstream_kick, // ASCEND_VSTREAM_KICK + ascend_callback_vstream_wait, // ASCEND_CALLBACK_VSTREAM_WAIT + ascend_callback_vstream_kick, // ASCEND_CALLBACK_VSTREAM_KICK + ascend_vstream_get_head, // ASCEND_VSTREAM_GET_HEAD + NULL, // ASCEND_MAX_COMMAND + amdgpu_vstream_alloc, // AMDGPU_VSTREAM_ALLOC + amdgpu_vstream_free, // AMDGPU_VSTREAM_FREE + amdgpu_vstream_kick, // AMDGPU_VSTREAM_KICK + amdgpu_vstream_update, // AMDGPU_VSTREAM_UPDATE + NULL // AMDGPU_MAX_COMMAND +}; + +SYSCALL_DEFINE2(vstream_manage, struct vstream_args __user *, arg, int, cmd) +{ + int res = 0; + struct vstream_args vstream_arg; + + if (cmd < 0 || cmd > AMDGPU_MAX_COMMAND || !vstream_command_table[cmd]) + return -EINVAL; + + if (copy_from_user(&vstream_arg, arg, sizeof(struct vstream_args))) { + pr_err("copy_from_user failed\n"); + return -EFAULT; + } + res = vstream_command_table[cmd](&vstream_arg); + if (copy_to_user(arg, &vstream_arg, sizeof(struct vstream_args))) { + pr_err("copy_to_user failed\n"); + return -EFAULT; + } + + return res; +} diff --git a/kernel/ucc_sched/Makefile b/kernel/ucc_sched/Makefile new file mode 100644 index 000000000000..4a41f07d091c --- /dev/null +++ b/kernel/ucc_sched/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_XPU_SCHEDULE) += core.o diff --git a/kernel/ucc_sched/core.c b/kernel/ucc_sched/core.c new file mode 
100644 index 000000000000..4c7f1f59aeb9 --- /dev/null +++ b/kernel/ucc_sched/core.c @@ -0,0 +1,591 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved. + * Author: Huawei OS Kernel Lab + * Create: Tue Jan 17 22:19:17 2023 + */ + +#include <uapi/linux/sched/types.h> +#include <linux/kthread.h> +#include <linux/slab.h> +#include <linux/ucc_sched.h> + +#include "ucc_sched.h" +#include "../sched/sched.h" +#define CREATE_TRACE_POINTS +#include <trace/events/ucc_sched.h> + +#define MAX_XCU_NUM (100) +#define TS_SQ_TRANS_TASK_THRESHOLD (20) + +static struct xcu xcu_manager[MAX_XCU_NUM]; +static int num_active_xcu; +raw_spinlock_t xcu_mgr_lock; +int sysctl_ucc_sched_rcv_timeout_ms = 10; + +static struct task_struct vstream_idle_task; +static struct vstream_info vstream_idle = { + .vstreamId = UINT_MAX, + .p = &vstream_idle_task, +}; + +struct sched_args { + int cu_id; +}; + +static inline int is_xcu_offline(struct xcu *cu) +{ + return cu->state == XCU_INACTIVE; +} + +void ucc_set_vstream_state(struct vstream_info *vinfo, int state) +{ + vinfo->se.state = state; +} + +static inline int should_se_run(struct ucc_se *se) +{ + return se->state != SE_BLOCK && se->state != SE_DEAD; +} + +static inline void update_stats_run_start(struct xcu *cu, + struct ucc_se *se) +{ + u64 start; + + if (!schedstat_enabled()) + return; + + start = ktime_get_boot_ns(); + __schedstat_set(se->statistics.run_start, start); +} + +static inline void update_stats_run_end(struct xcu *cu, + struct ucc_se *se) +{ + + struct vstream_info *vinfo; + u64 delta; + + if (!schedstat_enabled()) + return; + + delta = ktime_get_boot_ns() - schedstat_val(se->statistics.run_start); + vinfo = container_of(se, struct vstream_info, se); + trace_ucc_sched_stat_run(vinfo, delta, se->is_timeout); + + __schedstat_set(se->statistics.run_max, + max(schedstat_val(se->statistics.run_max), delta)); + __schedstat_inc(se->statistics.run_count); + 
__schedstat_add(se->statistics.run_sum, delta); + __schedstat_set(se->statistics.run_start, 0); +} + +static inline void update_stats_preempt_start(struct xcu *cu, + struct ucc_se *se) +{ + u64 wait_start; + + if (!schedstat_enabled()) + return; + + wait_start = ktime_get_boot_ns(); + __schedstat_set(se->statistics.preempt_start, wait_start); +} + +static inline void update_stats_wait_start(struct xcu *cu, struct ucc_se *se) +{ + u64 wait_start; + + if (!schedstat_enabled()) + return; + + wait_start = ktime_get_boot_ns(); + __schedstat_set(se->statistics.wait_start, wait_start); +} + + +static inline void update_stats_wait_end(struct xcu *cu, struct ucc_se *se) +{ + struct vstream_info *vinfo; + u64 delta, preempt_delta; + + if (!schedstat_enabled()) + return; + + delta = ktime_get_boot_ns() - schedstat_val(se->statistics.wait_start); + vinfo = container_of(se, struct vstream_info, se); + trace_ucc_sched_stat_wait(vinfo, delta); + + __schedstat_set(se->statistics.wait_max, + max(schedstat_val(se->statistics.wait_max), delta)); + __schedstat_inc(se->statistics.wait_count); + __schedstat_add(se->statistics.wait_sum, delta); + __schedstat_set(se->statistics.wait_start, 0); + + if (se->statistics.preempt_start) { + preempt_delta = ktime_get_boot_ns() - + schedstat_val(se->statistics.preempt_start); + trace_ucc_sched_stat_preempt(vinfo, preempt_delta); + + __schedstat_set(se->statistics.preempt_max, + max(schedstat_val(se->statistics.preempt_max), + preempt_delta)); + __schedstat_inc(se->statistics.preempt_count); + __schedstat_add(se->statistics.preempt_sum, preempt_delta); + __schedstat_set(se->statistics.preempt_start, 0); + } +} + +void ucc_dump_statistics_info(struct ucc_se *se) +{ + struct vstream_info *vinfo = container_of(se, struct vstream_info, se); + + pr_info("comm %s pid %d vstreamId %d kernel_sum %llu wait_count %llu wait_max %llu[ns] wait_sum %llu[ns] preempt_count %llu preempt_max %llu[ns] preempt_sum %llu[ns]\n", + vinfo->p->comm, + vinfo->p->pid, + 
vinfo->vstreamId, + vinfo->se.statistics.kernel_sum, + vinfo->se.statistics.wait_count, + vinfo->se.statistics.wait_max, + vinfo->se.statistics.wait_sum, + vinfo->se.statistics.preempt_count, + vinfo->se.statistics.preempt_max, + vinfo->se.statistics.preempt_sum); +} + +static void put_prev_entity(struct xcu *cu, struct ucc_se *prev) +{ + if (!prev) + return; + + if (prev->on_cu) + update_stats_wait_start(cu, prev); + + prev->state = SE_READY; + cu->curr_se->state = SE_RUNNING; +} + +static void set_next_entity(struct xcu *cu, struct ucc_se *se) +{ + if (se->on_cu && se != cu->curr_se) + update_stats_wait_end(cu, se); + + cu->curr_se = se; +} + +static void dequeue_ucc_se(struct ucc_se *se, struct xcu *cu) +{ + raw_spin_lock(&cu->xcu_lock); + if (!se->on_cu) { + raw_spin_unlock(&cu->xcu_lock); + return; + } + + se->on_cu = 0; + + list_del_init(&se->run_list); + + if (list_empty(cu->queue + se->prio)) + __clear_bit(se->prio, cu->bitmap); + cu->rt_nr_running--; + + if (se != cu->curr_se) + update_stats_wait_end(cu, se); + + if (cu->curr_se == se) + cu->curr_se = NULL; + + raw_spin_unlock(&cu->xcu_lock); +} + +static void enqueue_ucc_se(struct ucc_se *se, struct xcu *cu) +{ + struct list_head *queue = cu->queue + se->prio; + + raw_spin_lock(&cu->xcu_lock); + if (se->on_cu) { + raw_spin_unlock(&cu->xcu_lock); + return; + } + se->on_cu = 1; + se->is_timeout = 0; + list_add_tail(&se->run_list, queue); + __set_bit(se->prio, cu->bitmap); + cu->rt_nr_running++; + + update_stats_wait_start(cu, se); + + raw_spin_unlock(&cu->xcu_lock); +} + +static struct xcu *ucc_select_cu(struct ucc_se *se) +{ + struct vstream_info *vstream_info; + int min_nr_running = INT_MAX; + struct xcu *cu; + int select_cu = 0; + int cu_id; + + vstream_info = container_of(se, struct vstream_info, se); + for (cu_id = 0; cu_id < num_active_xcu; cu_id++) { + cu = &xcu_manager[cu_id]; + + if (vstream_info->devId != cu->dev_id || + vstream_info->tsId != cu->ts_id) + continue; + + if (cu->rt_nr_running < 
min_nr_running) { + min_nr_running = cu->rt_nr_running; + select_cu = cu_id; + } + } + + vstream_info->cu_id = select_cu; + return &xcu_manager[select_cu]; +} + +static int ucc_check_preempt(struct ucc_se *se, struct xcu *cu) +{ + struct vstream_info *vinfo_curr, *vinfo; + struct ucc_se *curr_se; + + curr_se = cu->curr_se; + if (!curr_se) + return 1; + + vinfo = container_of(se, struct vstream_info, se); + vinfo_curr = container_of(curr_se, struct vstream_info, se); + if (vinfo_curr->p->ucc_priority > vinfo->p->ucc_priority) { + update_stats_preempt_start(cu, se); + curr_se->flag = UCC_TIF_PREEMPT; + return 1; + } + + return 0; +} + +static inline void ucc_wakeup_idle_worker(struct xcu *cu) +{ + wake_up_state(cu->worker, TASK_INTERRUPTIBLE); +} + +static inline void ucc_wakeup_running_worker(struct xcu *cu) +{ + wake_up_state(cu->worker, TASK_UNINTERRUPTIBLE); +} + +int ucc_schedule(int cu_id) +{ + struct xcu *cu; + + cu = &xcu_manager[cu_id]; + cu->is_wake = 1; + ucc_wakeup_running_worker(cu); + + return 0; +} +EXPORT_SYMBOL(ucc_schedule); + +int ucc_wake_up(struct ucc_se *se) +{ + struct xcu *cu; + + raw_spin_lock(&se->se_lock); + if (se->on_cu) { + raw_spin_unlock(&se->se_lock); + return 0; + } + + if (se->state == SE_BLOCK) + se->state = SE_READY; + + cu = ucc_select_cu(se); + if (!cu) { + raw_spin_unlock(&se->se_lock); + return -1; + } + + enqueue_ucc_se(se, cu); + if (ucc_check_preempt(se, cu)) + ucc_wakeup_idle_worker(cu); + + raw_spin_unlock(&se->se_lock); + + return 0; +} + +static struct ucc_se *pick_next_ucc_se(struct xcu *cu) +{ + struct ucc_se *se; + struct list_head *queue; + int idx; + + if (!cu->rt_nr_running) + return NULL; + + idx = sched_find_first_bit(cu->bitmap); + BUG_ON(idx >= MAX_UCC_PRIO); + + queue = cu->queue + idx; + se = list_entry(queue->next, struct ucc_se, run_list); + + return se; +} + +static int ucc_submit_kernel(struct xcu *cu, struct ucc_se *se) +{ + struct vstream_info *vstream_info; + struct xpu_group *group; + struct 
tsdrv_ctx *ctx; + int kernel_num, left; + + vstream_info = container_of(se, struct vstream_info, se); + ctx = vstream_info->privdata; + left = (vstream_info->vsqNode->tail - vstream_info->vsqNode->head + + MAX_VSTREAM_SIZE) % MAX_VSTREAM_SIZE; + + group = vstream_info->group; + + kernel_num = xpu_run(group, vstream_info, ctx); + if (kernel_num <= 0) + return kernel_num; + + //update vstream info head and tail; + update_vstream_head(vstream_info, kernel_num); + + left -= kernel_num; + + return kernel_num; +} + +static inline void ucc_wait_idle(struct xcu *cu) +{ + cu->state = XCU_IDLE; + + do { + schedule_timeout_interruptible(1); + } while (cu->rt_nr_running == 0); + + cu->state = XCU_BUSY; +} + +static inline void ucc_wait_running(struct xcu *cu, struct ucc_se *se) +{ + int cnt = 1; + + do { + schedule_timeout_uninterruptible( + msecs_to_jiffies(sysctl_ucc_sched_rcv_timeout_ms)); + } while (cu->is_wake == 0 && --cnt > 0); + + if (cnt == 0) { + __schedstat_inc(se->statistics.timeout_count); + se->is_timeout = 1; + } +} + +static inline void clear_se_flag(struct ucc_se *se) +{ + if (se) + se->flag = UCC_TIF_NONE; +} + +void ucc_dequeue_task(struct vstream_info *vInfo) +{ + struct xcu *cu = &xcu_manager[vInfo->cu_id]; + struct ucc_se *se = &vInfo->se; + + raw_spin_lock(&se->se_lock); + dequeue_ucc_se(se, cu); + raw_spin_unlock(&se->se_lock); +} + +/* + * dynamic padding: select kernels with no QoS confilcts to current ucc_se + * to fill cu; + */ +static void dynamic_padding(struct xcu *cu, struct ucc_se *se) +{ +} + +static int __ucc_schedule(void *args) +{ + struct sched_args *sargs = (struct sched_args *)args; + int cu_id = sargs->cu_id; + struct xcu *cu = &xcu_manager[cu_id]; + struct ucc_se *se = NULL, *curr_se = NULL; + struct ucc_se *prev_se = NULL; + struct vstream_info *vinfo; + int send_cnt = 0; + int kernel_num, preempt; + + while (!is_xcu_offline(cu)) { + raw_spin_lock(&cu->xcu_lock); + cu->is_sched = 0; + prev_se = cu->curr_se; + + preempt = 0; + if 
(prev_se) { + if (prev_se->flag != UCC_TIF_PREEMPT) + goto submit_kernel; + + vinfo = container_of(prev_se, struct vstream_info, se); + if (send_cnt < vinfo->p->ucc_step) + goto submit_kernel; + + preempt = 1; + } + + clear_se_flag(prev_se); + se = pick_next_ucc_se(cu); + if (!se) { + cu->is_sched = 1; + raw_spin_unlock(&cu->xcu_lock); + trace_ucc_sched_switch(0, &vstream_idle); + ucc_wait_idle(cu); + continue; + } + + set_next_entity(cu, se); + if (se != prev_se) { + put_prev_entity(cu, prev_se); + vinfo = container_of(se, struct vstream_info, se); + trace_ucc_sched_switch(preempt, vinfo); + } + send_cnt = 0; +submit_kernel: + curr_se = cu->curr_se; + dynamic_padding(cu, curr_se); + raw_spin_unlock(&cu->xcu_lock); + + curr_se->is_timeout = 0; + kernel_num = ucc_submit_kernel(cu, curr_se); + //has no more kernels to submit. + if (kernel_num <= 0 && !vstream_have_kernel(curr_se)) { + raw_spin_lock(&curr_se->se_lock); + curr_se->state = SE_BLOCK; + dequeue_ucc_se(curr_se, cu); + raw_spin_unlock(&curr_se->se_lock); + cu->is_sched = 1; + continue; + } + cu->is_sched = 1; + + vinfo = container_of(curr_se, struct vstream_info, se); + if (vinfo->send_cnt > TS_SQ_TRANS_TASK_THRESHOLD) { + update_stats_run_start(cu, curr_se); + /* kernel has not finish */ + if (!cu->is_wake) + ucc_wait_running(cu, curr_se); + + update_stats_run_end(cu, curr_se); + cu->is_wake = 0; + vinfo->send_cnt = 0; + } + + send_cnt += kernel_num; + schedstat_add(se->statistics.kernel_sum, kernel_num); + } + + return 0; +} + +static void init_xcu_rq(struct xcu *cu) +{ + int i; + + for (i = 0; i < MAX_UCC_PRIO; i++) { + INIT_LIST_HEAD(cu->queue + i); + __clear_bit(i, cu->bitmap); + } + + /* delimiter for bitsearch: */ + __set_bit(MAX_UCC_PRIO, cu->bitmap); + cu->rt_nr_running = 0; + raw_spin_lock_init(&cu->xcu_lock); +} + +static int alloc_cu_id(void) +{ + int cu_id = -1; + + raw_spin_lock(&xcu_mgr_lock); + if (num_active_xcu >= MAX_XCU_NUM) { + raw_spin_unlock(&xcu_mgr_lock); + return cu_id; + } + + 
cu_id = num_active_xcu; + num_active_xcu++; + raw_spin_unlock(&xcu_mgr_lock); + + return cu_id; +} + +int ucc_sched_register_xcu(int dev_id, int ts_id, int cu_num) +{ + int cu_id; + struct xcu *cu; + struct sched_args *args; + struct sched_param param = { .sched_priority = 1 }; + char id_buf[16]; + int i; + + for (i = 0; i < cu_num; i++) { + cu_id = alloc_cu_id(); + if (cu_id < 0) { + pr_err("alloc cu id failed\n"); + return -1; + } + + cu = &xcu_manager[cu_id]; + cu->cu_id = cu_id; + cu->state = XCU_IDLE; + cu->curr_se = NULL; + cu->dev_id = dev_id; + cu->ts_id = ts_id; + cu->is_wake = 0; + init_xcu_rq(cu); + + args = kzalloc(sizeof(struct sched_args), GFP_KERNEL); + if (!args) + return -1; + + args->cu_id = cu->cu_id; + snprintf(id_buf, sizeof(id_buf), "%d:%d:%d", + cu->cu_id, cu->dev_id, cu->ts_id); + cu->worker = kthread_create_on_node(__ucc_schedule, + (void *)args, NUMA_NO_NODE, + "u_sched/%s", id_buf); + sched_setscheduler_nocheck(cu->worker, SCHED_FIFO, &param); + wake_up_process(cu->worker); + } + + return 0; +} +EXPORT_SYMBOL(ucc_sched_register_xcu); + +int ucc_sched_init(void) +{ + raw_spin_lock_init(&xcu_mgr_lock); + return 0; +} + +int ucc_rt_nr_running(struct xcu *cu) +{ + return cu->rt_nr_running; +} +EXPORT_SYMBOL(ucc_rt_nr_running); + +struct xcu *ucc_get_xcu_by_id(int cu_id) +{ + return &xcu_manager[cu_id]; +} +EXPORT_SYMBOL(ucc_get_xcu_by_id); + +int ucc_xcu_is_sched(int cu_id) +{ + return xcu_manager[cu_id].is_sched; +} +EXPORT_SYMBOL(ucc_xcu_is_sched); diff --git a/kernel/ucc_sched/ucc_sched.h b/kernel/ucc_sched/ucc_sched.h new file mode 100644 index 000000000000..30e2aa10cf2f --- /dev/null +++ b/kernel/ucc_sched/ucc_sched.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) Huawei Technologies Co., Ltd. 2023. All rights reserved. 
+ * Author: Huawei OS Kernel Lab + * Create: Tue Jan 17 22:27:22 2023 + */ +#ifndef __UCC_SCHED_USCHED_H__ +#define __UCC_SCHED_USCHED_H__ + +#include <linux/sched.h> +#include <linux/spinlock_types.h> +#include <linux/types.h> +#include <linux/vstream.h> + +//For simplicity, we set this parameter to 2. +#define MAX_UCC_PRIO (2) + +enum xcu_state { + XCU_INACTIVE, + XCU_IDLE, + XCU_BUSY, + XCU_SUBMIT, +}; + +/* + * This is the abstraction object of the xpu computing unit. + */ +struct xcu { + int is_sched; + int cu_id; + int dev_id; + int ts_id; + int rt_nr_running; + int is_wake; + struct task_struct *worker; + DECLARE_BITMAP(bitmap, MAX_UCC_PRIO); + struct list_head queue[MAX_UCC_PRIO]; + enum xcu_state state; + struct ucc_se *curr_se; + raw_spinlock_t xcu_lock; +}; + +#endif -- 2.34.1
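A note on the index arithmetic used above: ucc_submit_kernel() derives the number of queued entries as (tail - head + MAX_VSTREAM_SIZE) % MAX_VSTREAM_SIZE, and ascend_vstream_kick() uses the same form for the distance between the old and new tail. The user-space sketch below shows why the added size term matters once the tail has wrapped. RING_SIZE and ring_pending() are illustrative names, and 1024 is an assumed capacity; the real MAX_VSTREAM_SIZE is not visible in this excerpt.

```c
/* Assumed ring capacity; the real MAX_VSTREAM_SIZE is not shown here. */
#define RING_SIZE 1024

/*
 * Number of entries between 'from' and 'to' on a ring of RING_SIZE slots.
 * Both indices are expected to lie in [0, RING_SIZE).
 * Adding RING_SIZE before the modulo keeps the intermediate value
 * non-negative when 'to' has wrapped around and is smaller than 'from'.
 */
static int ring_pending(int from, int to)
{
	return (to - from + RING_SIZE) % RING_SIZE;
}
```

With from = 1020 and to = 4 the function returns 8, because the tail wrapped past the end of the ring. Under this scheme equal indices always give 0, so a completely full ring cannot be told apart from an empty one without extra state.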


[PATCH openEuler-22.03-LTS-SP1] netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
by Guo Mengqi 13 Sep '23


From: Pablo Neira Ayuso <pablo(a)netfilter.org>

mainline inclusion
from mainline-v6.5-rc4
commit 0a771f7b266b02d262900c75f1e175c7fe76fec2
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I7YIXO
CVE: CVE-2022-40982

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

---------------------------

On error when building the rule, the immediate expression unbinds the
chain, hence objects can be deactivated by the transaction records.

Otherwise, it is possible to trigger the following warning:

 WARNING: CPU: 3 PID: 915 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
 CPU: 3 PID: 915 Comm: chain-bind-err- Not tainted 6.1.39 #1
 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Fixes: 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic")
Reported-by: Kevin Rich <kevinrich1337(a)gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Florian Westphal <fw(a)strlen.de>

conflict:
	net/netfilter/nft_immediate.c

Signed-off-by: Lu Wei <luwei32(a)huawei.com>
---
 net/netfilter/nft_immediate.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c
index 6b0efab4fad0..6bf1c852e8ea 100644
--- a/net/netfilter/nft_immediate.c
+++ b/net/netfilter/nft_immediate.c
@@ -125,15 +125,27 @@ static void nft_immediate_activate(const struct nft_ctx *ctx,
 	return nft_data_hold(&priv->data, nft_dreg_to_type(priv->dreg));
 }
 
+static void nft_immediate_chain_deactivate(const struct nft_ctx *ctx,
+					   struct nft_chain *chain,
+					   enum nft_trans_phase phase)
+{
+	struct nft_ctx chain_ctx;
+	struct nft_rule *rule;
+
+	chain_ctx = *ctx;
+	chain_ctx.chain = chain;
+
+	list_for_each_entry(rule, &chain->rules, list)
+		nft_rule_expr_deactivate(&chain_ctx, rule, phase);
+}
+
 static void nft_immediate_deactivate(const struct nft_ctx *ctx,
 				     const struct nft_expr *expr,
 				     enum nft_trans_phase phase)
 {
 	const struct nft_immediate_expr *priv = nft_expr_priv(expr);
 	const struct nft_data *data = &priv->data;
-	struct nft_ctx chain_ctx;
 	struct nft_chain *chain;
-	struct nft_rule *rule;
 
 	if (priv->dreg == NFT_REG_VERDICT) {
 		switch (data->verdict.code) {
@@ -143,20 +155,17 @@ static void nft_immediate_deactivate(const struct nft_ctx *ctx,
 			if (!nft_chain_binding(chain))
 				break;
 
-			chain_ctx = *ctx;
-			chain_ctx.chain = chain;
-
-			list_for_each_entry(rule, &chain->rules, list)
-				nft_rule_expr_deactivate(&chain_ctx, rule, phase);
-
 			switch (phase) {
 			case NFT_TRANS_PREPARE_ERROR:
 				nf_tables_unbind_chain(ctx, chain);
-				fallthrough;
+				nft_deactivate_next(ctx->net, chain);
+				break;
 			case NFT_TRANS_PREPARE:
+				nft_immediate_chain_deactivate(ctx, chain, phase);
 				nft_deactivate_next(ctx->net, chain);
 				break;
 			default:
+				nft_immediate_chain_deactivate(ctx, chain, phase);
 				nft_chain_del(chain);
 				chain->bound = false;
 				chain->table->use--;
--
2.17.1
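The phase handling in this patch can be summarised with a small user-space model. This is only a sketch of the control flow visible in the diff: the struct, the enum and every model_ name are hypothetical, and treating an unbind as clearing a flag is an assumption about nf_tables_unbind_chain(), whose body is not shown here.

```c
#include <stdbool.h>

/* Simplified stand-ins for the transaction phases named in the patch. */
enum model_phase {
	MODEL_PREPARE_ERROR,
	MODEL_PREPARE,
	MODEL_OTHER,	/* any remaining phase, handled by 'default' */
};

/* Hypothetical chain state; not the kernel's struct nft_chain. */
struct model_chain {
	bool bound;		/* still bound to the owning rule */
	bool active_next;	/* visible in the next generation */
	bool rules_deactivated;	/* rule expressions were deactivated */
	bool deleted;		/* removed from the table */
};

static void model_immediate_deactivate(struct model_chain *chain,
				       enum model_phase phase)
{
	switch (phase) {
	case MODEL_PREPARE_ERROR:
		/* Unbind only; rules are left to the transaction records. */
		chain->bound = false;
		chain->active_next = false;
		break;
	case MODEL_PREPARE:
		chain->rules_deactivated = true;
		chain->active_next = false;
		break;
	default:
		chain->rules_deactivated = true;
		chain->deleted = true;
		chain->bound = false;
		break;
	}
}
```

The point of the model is the first case: on a prepare error the chain is unbound and hidden from the next generation, but its rules are not deactivated a second time.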


[PATCH openEuler-23.09 0/5] Add cleanup and fixes for sharepool
by Wang Wensheng 13 Sep '23


Wang Wensheng (5):
  mm/mmap: Don't merge vma from sharepool
  mm/sharepool: Use mmap_write_[un]lock helper
  mm/sharepool: Return -ENOMEM when allocate hugepage failed
  mm/sharepool: Protect the va reserved for sharepool
  mm/sharepool: Mmap for the current process at first

 include/linux/share_pool.h | 31 +++++++++++++--------
 mm/mmap.c                  | 17 +++++++++---
 mm/mremap.c                |  4 +++
 mm/share_pool.c            | 56 ++++++++++++++++++++++++--------------
 4 files changed, 73 insertions(+), 35 deletions(-)

--
2.17.1


[PATCH openEuler-1.0-LTS v2] io_uring: ensure IOPOLL locks around deferred work
by Zhihao Cheng 13 Sep '23


From: Jens Axboe <axboe(a)kernel.dk>

stable inclusion
from stable-v5.10.188
commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7KXLN
CVE: CVE-2023-21400

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

No direct upstream commit exists for this issue. It was fixed in
5.18 as part of a larger rework of the completion side.

io_commit_cqring() writes the CQ ring tail to make it visible, but it
also kicks off any deferred work we have. A ring setup with IOPOLL
does not need any locking around the CQ ring updates, as we're always
under the ctx uring_lock. But if we have deferred work that needs
processing, then io_queue_deferred() assumes that the completion_lock
is held, as it is for !IOPOLL.

Add a lockdep assertion to check and document this fact, and have
io_iopoll_complete() check if we have deferred work and run that
separately with the appropriate lock grabbed.

Cc: stable(a)vger.kernel.org # 5.10, 5.15
Reported-by: dghost david <daviduniverse18(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lin Yujun <linyujun809(a)huawei.com>

Conflicts:
	fs/io_uring.c

Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
---
v1->v2: Add completion_lock for whole io_commit_cqring in iopoll completion

 fs/io_uring.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ce60df5e4d91..88eca93c55b7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1310,6 +1310,8 @@ static void io_kill_timeouts(struct io_ring_ctx *ctx)
 
 static void __io_queue_deferred(struct io_ring_ctx *ctx)
 {
+	lockdep_assert_held(&ctx->completion_lock);
+
 	do {
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
 						struct io_defer_entry, list);
@@ -2154,6 +2156,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 	struct req_batch rb;
 	struct io_kiocb *req;
 	LIST_HEAD(again);
+	unsigned long flags;
 
 	/* order with ->result store in io_complete_rw_iopoll() */
 	smp_rmb();
@@ -2181,7 +2184,10 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 			io_req_free_batch(&rb, req);
 	}
 
+	spin_lock_irqsave(&ctx->completion_lock, flags);
 	io_commit_cqring(ctx);
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
 	if (ctx->flags & IORING_SETUP_SQPOLL)
 		io_cqring_ev_posted(ctx);
 	io_req_free_batch_finish(ctx, &rb);
--
2.31.1
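The locking rule described in the commit message can be illustrated in user space. The sketch below is a model only: struct model_ctx and all model_ functions are hypothetical, the lock is a plain flag rather than a spinlock, and an assert() stands in for lockdep_assert_held().

```c
#include <assert.h>

/* Hypothetical stand-in for the ring context; not struct io_ring_ctx. */
struct model_ctx {
	int completion_lock_held;	/* 1 while the model lock is taken */
	int deferred;			/* pending deferred entries */
	int cq_tail;			/* published completion tail */
	int cached_cq_tail;		/* completions not yet published */
};

static void model_lock(struct model_ctx *ctx)
{
	assert(!ctx->completion_lock_held);
	ctx->completion_lock_held = 1;
}

static void model_unlock(struct model_ctx *ctx)
{
	assert(ctx->completion_lock_held);
	ctx->completion_lock_held = 0;
}

/* Plays the role of __io_queue_deferred(): the caller must hold the lock. */
static void model_queue_deferred(struct model_ctx *ctx)
{
	assert(ctx->completion_lock_held);	/* lockdep_assert_held() analogue */
	ctx->deferred = 0;
}

/* Plays the role of io_commit_cqring(): publish the tail, run deferred work. */
static void model_commit_cqring(struct model_ctx *ctx)
{
	ctx->cq_tail = ctx->cached_cq_tail;
	if (ctx->deferred)
		model_queue_deferred(ctx);
}

/* Plays the role of the patched tail of io_iopoll_complete(). */
static void model_iopoll_complete(struct model_ctx *ctx)
{
	model_lock(ctx);
	model_commit_cqring(ctx);
	model_unlock(ctx);
}
```

Calling model_commit_cqring() directly with deferred work pending trips the assertion, which is the situation the added lock in the IOPOLL path avoids.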


[PATCH openEuler-22.03-LTS-SP1] net/sched: sch_hfsc: Ensure inner classes have fsc curve
by Zhengchao Shao 13 Sep '23


From: Budimir Markovic <markovicbudimir(a)gmail.com>

mainline inclusion
from mainline-v6.5-rc7
commit b3d26c5702c7d6c45456326e56d2ccf3f103e60f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I7Z7CD
CVE: CVE-2023-4623

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

HFSC assumes that inner classes have an fsc curve, but it is currently
possible for classes without an fsc curve to become parents. This leads
to bugs including a use-after-free.

Don't allow non-root classes without HFSC_FSC to become parents.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Budimir Markovic <markovicbudimir(a)gmail.com>
Signed-off-by: Budimir Markovic <markovicbudimir(a)gmail.com>
Acked-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Link: https://lore.kernel.org/r/20230824084905.422-1-markovicbudimir@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/sched/sch_hfsc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index cdc43a06aa9b..6076294a632c 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1012,6 +1012,10 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		if (parent == NULL)
 			return -ENOENT;
 	}
+	if (!(parent->cl_flags & HFSC_FSC) && parent != &q->root) {
+		NL_SET_ERR_MSG(extack, "Invalid parent - parent class must have FSC");
+		return -EINVAL;
+	}
 
 	if (classid == 0 || TC_H_MAJ(classid ^ sch->handle) != 0)
 		return -EINVAL;
--
2.34.1
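The parent check added by this patch can be exercised in isolation with a small user-space model. The types and the model_check_parent() helper are hypothetical, and the flag value 0x2 is an illustrative assumption rather than a quote of the kernel's HFSC_FSC definition.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative flag value; the kernel's HFSC_FSC definition is not shown here. */
#define MODEL_HFSC_FSC 0x2u

/* Hypothetical stand-ins for the HFSC class and scheduler structures. */
struct model_class {
	unsigned int cl_flags;
};

struct model_sched {
	struct model_class root;
};

/*
 * Mirrors the shape of the check in hfsc_change_class(): a missing parent
 * is -ENOENT, and a non-root parent without an fsc curve is -EINVAL.
 */
static int model_check_parent(const struct model_sched *q,
			      const struct model_class *parent)
{
	if (parent == NULL)
		return -ENOENT;
	if (!(parent->cl_flags & MODEL_HFSC_FSC) && parent != &q->root)
		return -EINVAL;
	return 0;
}
```

The root class is exempt from the flag test, matching the commit message's wording that only non-root classes without HFSC_FSC are refused as parents.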


[PATCH openEuler-22.03-LTS-SP1] netfilter: nf_tables: skip bound chain on rule flush
by Zhengchao Shao 13 Sep '23


From: Pablo Neira Ayuso <pablo(a)netfilter.org>

stable inclusion
from stable-v5.10.188
commit 30e5460d69e631c0e84db37dba2d8f98648778d4
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I7YIXI
CVE: CVE-2023-3777

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 6eaf41e87a223ae6f8e7a28d6e78384ad7e407f8 ]

Skip bound chain when flushing table rules, the rule that owns this
chain releases these objects.

Otherwise, the following warning is triggered:

 WARNING: CPU: 2 PID: 1217 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
 CPU: 2 PID: 1217 Comm: chain-flush Not tainted 6.1.39 #1
 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337(a)gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/netfilter/nf_tables_api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 3b2275b151a2..bbe6e7023683 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3516,6 +3516,8 @@ static int nf_tables_delrule(struct net *net, struct sock *nlsk,
 		list_for_each_entry(chain, &table->chains, list) {
 			if (!nft_is_active_next(net, chain))
 				continue;
+			if (nft_chain_is_bound(chain))
+				continue;
 
 			ctx.chain = chain;
 			err = nft_delrule_by_chain(&ctx);
--
2.34.1
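The effect of the added skip can be shown with a user-space model of the flush loop. Everything here is hypothetical (struct model_chain, model_flush_table(), and the use of an array in place of the kernel's linked list); it only mirrors the two continue conditions visible in the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain record; not the kernel's struct nft_chain. */
struct model_chain {
	bool active_next;	/* chain is active in the next generation */
	bool bound;		/* chain is owned by a rule (bound chain) */
	int nrules;		/* rules currently in the chain */
};

/*
 * Flush every rule in the table, skipping inactive chains and, as in the
 * patch, chains that are bound to a rule. Returns the number of rules
 * removed.
 */
static int model_flush_table(struct model_chain *chains, size_t n)
{
	int removed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!chains[i].active_next)
			continue;
		if (chains[i].bound)
			continue;	/* the owning rule releases these */

		removed += chains[i].nrules;
		chains[i].nrules = 0;
	}

	return removed;
}
```

In this model a bound chain keeps its rules across a flush, so the rule that owns it remains the only place where they are released.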
