Alex Shi (1):
      mm: add VM_WARN_ON_ONCE_PAGE() macro

Alper Gun (1):
      KVM: SVM: Call SEV Guest Decommission if ASID binding fails

Anson Huang (1):
      ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment

Christian König (1):
      drm/nouveau: fix dma_address check for CPU/GPU sync

Greg Kroah-Hartman (1):
      Linux 4.19.197

Hugh Dickins (16):
      mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
      mm/thp: make is_huge_zero_pmd() safe and quicker
      mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
      mm/thp: fix vma_address() if virtual address below file offset
      mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
      mm: page_vma_mapped_walk(): use page for pvmw->page
      mm: page_vma_mapped_walk(): settle PageHuge on entry
      mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
      mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
      mm: page_vma_mapped_walk(): crossing page table boundary
      mm: page_vma_mapped_walk(): add a level of indentation
      mm: page_vma_mapped_walk(): use goto instead of while (1)
      mm: page_vma_mapped_walk(): get vma_address_end() earlier
      mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
      mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
      mm, futex: fix shared futex pgoff on shmem huge page

Jue Wang (1):
      mm/thp: fix page_address_in_vma() on file THP tails

Juergen Gross (1):
      xen/events: reset active flag for lateeoi events later

ManYi Li (1):
      scsi: sr: Return appropriate error code when disk is ejected

Miaohe Lin (2):
      mm/rmap: remove unneeded semicolon in page_not_mapped()
      mm/rmap: use page_not_mapped in try_to_unmap()

Petr Mladek (2):
      kthread_worker: split code for canceling the delayed work timer
      kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()

Tony Lindgren (3):
      clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
      clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
      clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940

Yang Shi (1):
      mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split

afzal mohammed (1):
      ARM: OMAP: replace setup_irq() by request_irq()
 Makefile                               |   2 +-
 arch/arm/boot/dts/dra7.dtsi            |  11 ++
 arch/arm/boot/dts/imx6qdl-sabresd.dtsi |   4 -
 arch/arm/mach-omap1/pm.c               |  13 +-
 arch/arm/mach-omap1/time.c             |  10 +-
 arch/arm/mach-omap1/timer32k.c         |  10 +-
 arch/arm/mach-omap2/board-generic.c    |   4 +-
 arch/arm/mach-omap2/timer.c            | 181 +++++++++++++++++--------
 arch/x86/kvm/svm.c                     |  32 +++--
 drivers/clk/ti/clk-7xx.c               |   1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c   |   4 +-
 drivers/scsi/sr.c                      |   2 +
 drivers/xen/events/events_base.c       |  23 +++-
 include/linux/cpuhotplug.h             |   1 +
 include/linux/huge_mm.h                |   8 +-
 include/linux/hugetlb.h                |  16 ---
 include/linux/mm.h                     |   3 +
 include/linux/mmdebug.h                |  13 ++
 include/linux/pagemap.h                |  13 +-
 include/linux/rmap.h                   |   3 +-
 kernel/futex.c                         |   2 +-
 kernel/kthread.c                       |  77 +++++++----
 mm/huge_memory.c                       |  56 ++++----
 mm/hugetlb.c                           |   5 +-
 mm/internal.h                          |  53 ++++++--
 mm/memory.c                            |  41 ++++++
 mm/page_vma_mapped.c                   | 160 +++++++++++++---------
 mm/pgtable-generic.c                   |   4 +-
 mm/rmap.c                              |  48 ++++---
 mm/truncate.c                          |  43 +++---

 30 files changed, 541 insertions(+), 302 deletions(-)
From: Alex Shi <alex.shi@linux.alibaba.com>
stable inclusion from linux-4.19.197 commit 131cb7a08c767e2d082ff5de6d384d8d935fb6f1
--------------------------------
[ Upstream commit a4055888629bc0467d12d912cd7c90acdf3d9b12 part ]
Add VM_WARN_ON_ONCE_PAGE() macro.
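For illustration only, a hypothetical call site, modelled on the
mm/memcontrol.c uses mentioned in the backport note below (which this
backport omits):

	/* Warn once, with a full dump_page(), if a charged page
	 * unexpectedly has no memcg set; illustrative use only. */
	VM_WARN_ON_ONCE_PAGE(!page->mem_cgroup, page);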
Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.a...
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Note on stable backport: the original commit was titled "mm/memcg: warning on !memcg after readahead page charged", and included uses of this macro in mm/memcontrol.c: those are omitted here.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/linux/mmdebug.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 2ad72d2c8cc52..5d0767cb424aa 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -37,6 +37,18 @@ void dump_mm(const struct mm_struct *mm);
 			BUG();						\
 		}							\
 	} while (0)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
+	static bool __section(".data.once") __warned;			\
+	int __ret_warn_once = !!(cond);					\
+									\
+	if (unlikely(__ret_warn_once && !__warned)) {			\
+		dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
+		__warned = true;					\
+		WARN_ON(1);						\
+	}								\
+	unlikely(__ret_warn_once);					\
+})
+
 #define VM_WARN_ON(cond) (void)WARN_ON(cond)
 #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
 #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
@@ -48,6 +60,7 @@ void dump_mm(const struct mm_struct *mm);
 #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
 #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
From: Miaohe Lin <linmiaohe@huawei.com>
stable inclusion from linux-4.19.197 commit 784445344c9e7cdef2a2a724fab34243933a91dc
--------------------------------
[ Upstream commit e0af87ff7afcde2660be44302836d2d5618185af ]
Remove extra semicolon without any functional change intended.
Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index aabd094d310f1..924e14bc8da8c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1740,7 +1740,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 static int page_not_mapped(struct page *page)
 {
 	return !page_mapped(page);
-};
+}
 
 /**
  * try_to_munlock - try to munlock a page
From: Miaohe Lin <linmiaohe@huawei.com>
stable inclusion from linux-4.19.197 commit 420afb489a1f6981cbe2fb9ee10007dba878e223
--------------------------------
[ Upstream commit b7e188ec98b1644ff70a6d3624ea16aadc39f5e0 ]
page_mapcount_is_zero() accurately calculates how many mappings a hugepage has, only to check that count against 0. That is a waste of cpu time: page_not_mapped() can answer the same question while saving some atomic_read cycles. Remove page_mapcount_is_zero() as it is no longer used, and move page_not_mapped() above try_to_unmap() to avoid an "identifier undeclared" compilation error.
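For context, a simplified sketch of why the new .done check is cheaper
(modelled on the 4.19-era helpers; details elided):

	/* total_mapcount(): one atomic_read per subpage, all summed,
	 * even when the first read already proves the page is mapped. */

	/* page_mapped(): may return early, roughly like this: */
	if (atomic_read(compound_mapcount_ptr(page)) >= 0)
		return true;
	for (i = 0; i < hpage_nr_pages(page); i++)
		if (atomic_read(&page[i]._mapcount) >= 0)
			return true;	/* early exit, no summing */
	return false;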
Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/rmap.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 924e14bc8da8c..ed6ad441a50c2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1693,9 +1693,9 @@ static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg)
 	return is_vma_temporary_stack(vma);
 }
 
-static int page_mapcount_is_zero(struct page *page)
+static int page_not_mapped(struct page *page)
 {
-	return !total_mapcount(page);
+	return !page_mapped(page);
 }
 
 /**
@@ -1713,7 +1713,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 	struct rmap_walk_control rwc = {
 		.rmap_one = try_to_unmap_one,
 		.arg = (void *)flags,
-		.done = page_mapcount_is_zero,
+		.done = page_not_mapped,
 		.anon_lock = page_lock_anon_vma_read,
 	};
 
@@ -1737,11 +1737,6 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 	return !page_mapcount(page) ? true : false;
 }
 
-static int page_not_mapped(struct page *page)
-{
-	return !page_mapped(page);
-}
-
 /**
  * try_to_munlock - try to munlock a page
  * @page: the page to be munlocked
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit 629ee482e0966c23ddc1abeebf71aa84ff8d3673
--------------------------------
[ Upstream commit 99fa8a48203d62b3743d866fc48ef6abaee682be ]
Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.
Here is a v2 batch of long-standing THP bug fixes that I had not got around to sending before, but prompted now by Wang Yugui's report https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and they have done no harm, but have *not* fixed that issue: something more is needed and I have no idea of what.
This patch (of 7):
Stressing huge tmpfs page migration racing hole punch often crashed on the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y kernel; or shortly afterwards, on a bad dereference in __split_huge_pmd_locked() when DEBUG_VM=n. They forgot to allow for pmd migration entries in the non-anonymous case.
Full disclosure: those particular experiments were on a kernel with more relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the vanilla kernel: it is conceivable that stricter locking happens to avoid those cases, or makes them less likely; but __split_huge_pmd_locked() already allowed for pmd migration entries when handling anonymous THPs, so this commit brings the shmem and file THP handling into line.
And while there: use old_pmd rather than _pmd, as in the following blocks; and make it clearer to the eye that the !vma_is_anonymous() block is self-contained, making an early return after accounting for unmapping.
Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jue Wang <juew@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Note on stable backport: this commit made intervening cleanups in pmdp_huge_clear_flush() redundant: here it's rediffed to skip them.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/huge_memory.c     | 27 ++++++++++++++++++---------
 mm/pgtable-generic.c |  4 ++--
 2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 31628c65642a6..0c70d06d97a4c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2153,7 +2153,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	count_vm_event(THP_SPLIT_PMD);
 
 	if (!vma_is_anonymous(vma)) {
-		_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
+		old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
 		/*
 		 * We are going to unmap this huge page. So
 		 * just go ahead and zap it
@@ -2162,16 +2162,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			zap_deposited_table(mm, pmd);
 		if (vma_is_dax(vma))
 			return;
-		page = pmd_page(_pmd);
-		if (!PageDirty(page) && pmd_dirty(_pmd))
-			set_page_dirty(page);
-		if (!PageReferenced(page) && pmd_young(_pmd))
-			SetPageReferenced(page);
-		page_remove_rmap(page, true);
-		put_page(page);
+		if (unlikely(is_pmd_migration_entry(old_pmd))) {
+			swp_entry_t entry;
+
+			entry = pmd_to_swp_entry(old_pmd);
+			page = migration_entry_to_page(entry);
+		} else {
+			page = pmd_page(old_pmd);
+			if (!PageDirty(page) && pmd_dirty(old_pmd))
+				set_page_dirty(page);
+			if (!PageReferenced(page) && pmd_young(old_pmd))
+				SetPageReferenced(page);
+			page_remove_rmap(page, true);
+			put_page(page);
+		}
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
-	} else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+	}
+
+	if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
 		/*
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index cf2af04b34b97..36770fcdc3582 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -125,8 +125,8 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 {
 	pmd_t pmd;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON((pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
-		   !pmd_devmap(*pmdp)) || !pmd_present(*pmdp));
+	VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
+		  !pmd_devmap(*pmdp));
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return pmd;
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit fc1fbc5b017b6f5ef24a4a93f33cd022225e01c8
--------------------------------
[ Upstream commit 3b77e8c8cde581dadab9a0f1543a347e24315f11 ]
Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry.
Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.
__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present.
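In miniature, the hazard and the fix (a sketch using the helpers from the
diff below):

	/* Before: on a pmd migration entry, pmd_page() decodes pfn bits
	 * that are laid out differently - possibly a bogus struct page. */
	return is_huge_zero_page(pmd_page(pmd));

	/* After: compare the cached pfn, and insist on pmd_present(). */
	return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd);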
Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/linux/huge_mm.h | 8 +++++++-
 mm/huge_memory.c        | 5 ++++-
 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7a447e8e84811..00f474e1c64b3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -224,6 +224,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 extern vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
 extern struct page *huge_zero_page;
+extern unsigned long huge_zero_pfn;
 
 static inline bool is_huge_zero_page(struct page *page)
 {
@@ -232,7 +233,7 @@ static inline bool is_huge_zero_page(struct page *page)
 
 static inline bool is_huge_zero_pmd(pmd_t pmd)
 {
-	return is_huge_zero_page(pmd_page(pmd));
+	return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd);
 }
 
 static inline bool is_huge_zero_pud(pud_t pud)
@@ -351,6 +352,11 @@ static inline bool is_huge_zero_page(struct page *page)
 	return false;
 }
 
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+	return false;
+}
+
 static inline bool is_huge_zero_pud(pud_t pud)
 {
 	return false;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0c70d06d97a4c..073e5f95e3b8f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -62,6 +62,7 @@ static struct shrinker deferred_split_shrinker;
 
 static atomic_t huge_zero_refcount;
 struct page *huge_zero_page __read_mostly;
+unsigned long huge_zero_pfn __read_mostly = ~0UL;
 
 bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
@@ -93,6 +94,7 @@ static struct page *get_huge_zero_page(void)
 		__free_pages(zero_page, compound_order(zero_page));
 		goto retry;
 	}
+	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
 
 	/* We take additional reference here. It will be put back by shrinker */
 	atomic_set(&huge_zero_refcount, 2);
@@ -142,6 +144,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
 	if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
 		struct page *zero_page = xchg(&huge_zero_page, NULL);
 		BUG_ON(zero_page == NULL);
+		WRITE_ONCE(huge_zero_pfn, ~0UL);
 		__free_pages(zero_page, compound_order(zero_page));
 		return HPAGE_PMD_NR;
 	}
@@ -2180,7 +2183,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		return;
 	}
 
-	if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+	if (is_huge_zero_pmd(*pmd)) {
 		/*
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit 205899d6be9a1d5494318920a211852a32886e46
--------------------------------
[ Upstream commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 ]
Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0.
And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount.
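An illustrative interleaving of the suspected race (a reconstruction, not
taken from the report):

	CPU0: zap_pte_range()           CPU1: try_to_unmap() for split
	  ptep_get_and_clear_full()
	                                  page_vma_mapped_walk() sees the
	                                  cleared pte, skips the pte lock
	                                  try_to_unmap() returns, with the
	                                  mapcount not yet decremented
	  page_remove_rmap()
	                                  ...so VM_BUG_ON_PAGE(!unmap_success)
	                                  fired on a transient state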
Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page().
When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.
Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Note on stable backport: upstream TTU_SYNC 0x10 takes the value which 5.11 commit 013339df116c ("mm/rmap: always do TTU_IGNORE_ACCESS") freed. It is very tempting to backport that commit (as 5.10 already did) and make no change here; but on reflection, good as that commit is, I'm reluctant to include any possible side-effect of it in this series.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/linux/rmap.h |  3 ++-
 mm/huge_memory.c     |  2 +-
 mm/page_vma_mapped.c | 11 +++++++++++
 mm/rmap.c            | 17 ++++++++++++++++-
 4 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d7d6d4eb17949..91ccae9467164 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -98,7 +98,8 @@ enum ttu_flags {
 					 * do a final flush if necessary */
 	TTU_RMAP_LOCKED		= 0x80,	/* do not grab rmap lock:
 					 * caller holds it */
-	TTU_SPLIT_FREEZE	= 0x100,	/* freeze pte under splitting thp */
+	TTU_SPLIT_FREEZE	= 0x100, /* freeze pte under splitting thp */
+	TTU_SYNC		= 0x200, /* avoid racy checks with PVMW_SYNC */
 };
 
 #ifdef CONFIG_MMU
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 073e5f95e3b8f..2489caa8a0c29 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2458,7 +2458,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 static void unmap_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
-		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC;
 	bool unmap_success;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 11df03e71288c..08e283ad46606 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -208,6 +208,17 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 				pvmw->ptl = NULL;
 			}
 		} else if (!pmd_present(pmde)) {
+			/*
+			 * If PVMW_SYNC, take and drop THP pmd lock so that we
+			 * cannot return prematurely, while zap_huge_pmd() has
+			 * cleared *pmd but not decremented compound_mapcount().
+			 */
+			if ((pvmw->flags & PVMW_SYNC) &&
+			    PageTransCompound(pvmw->page)) {
+				spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+				spin_unlock(ptl);
+			}
 			return false;
 		}
 		if (!map_pte(pvmw))
diff --git a/mm/rmap.c b/mm/rmap.c
index ed6ad441a50c2..4f82fc2fbc9d1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1354,6 +1354,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	unsigned long start = address, end;
 	enum ttu_flags flags = (enum ttu_flags)arg;
 
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	if (flags & TTU_SYNC)
+		pvmw.flags = PVMW_SYNC;
+
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		return true;
@@ -1734,7 +1743,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 	else
 		rmap_walk(page, &rwc);
 
-	return !page_mapcount(page) ? true : false;
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	return !page_mapcount(page);
 }
 
 /**
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit f38722557339bd22a5f1e8bae013f81b69309e9a
--------------------------------
[ Upstream commit 494334e43c16d63b878536a26505397fce6ff3a2 ]
Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
When that BUG() was changed to a WARN(), it would later crash on the VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in mm/internal.h:vma_address(), used by rmap_walk_file() for try_to_unmap().
vma_address() is usually correct, but there's a wraparound case when the vm_start address is unusually low, but vm_pgoff not so low: vma_address() chooses max(start, vma->vm_start), but that decides on the wrong address, because start has become almost ULONG_MAX.
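A worked example with made-up numbers: suppose vm_start = 0x1000,
vm_pgoff = 0x200, and page_to_pgoff(page) = 0x10. Then

	start = 0x1000 + ((0x10 - 0x200) << PAGE_SHIFT)
	      = 0x1000 - 0x1f0000	/* unsigned: wraps close to ULONG_MAX */

so max(start, vma->vm_start) picks the huge wrapped value, and the
DEBUG_VM assertion start >= vma->vm_end fires; without DEBUG_VM, the walk
would begin at a bogus address instead of failing with -EFAULT.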
Rewrite vma_address() to be more careful about vm_pgoff; move the VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can be safely used from page_mapped_in_vma() and page_address_in_vma() too.
Add vma_address_end() to apply similar care to end address calculation, in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one(); though it raises a question of whether callers would do better to supply pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.
An irritation is that their apparent generality breaks down on KSM pages, which cannot be located by the page->index that page_to_pgoff() uses: as commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte() for ksm pages") once discovered. I dithered over the best thing to do about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both vma_address() and vma_address_end(); though the only place in danger of using it on them was try_to_unmap_one().
Sidenote: vma_address() and vma_address_end() now use compound_nr() on a head page, instead of thp_size(): to make the right calculation on a hugetlbfs page, whether or not THPs are configured. try_to_unmap() is used on hugetlbfs pages, but perhaps the wrong calculation never mattered.
Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Note on stable backport: fixed up conflicts on intervening thp_size(), and mmu_notifier_range initializations; substitute for compound_nr().
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/internal.h        | 53 ++++++++++++++++++++++++++++++++------------
 mm/page_vma_mapped.c | 16 +++++--------
 mm/rmap.c            | 14 ++++++------
 3 files changed, 52 insertions(+), 31 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 91ffb1dced61d..bfa97c43a3fc5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -331,27 +331,52 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page)
 extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
 
 /*
- * At what user virtual address is page expected in @vma?
+ * At what user virtual address is page expected in vma?
+ * Returns -EFAULT if all of the page is outside the range of vma.
+ * If page is a compound head, the entire compound page is considered.
  */
 static inline unsigned long
-__vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page_to_pgoff(page);
-	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	pgoff_t pgoff;
+	unsigned long address;
+
+	VM_BUG_ON_PAGE(PageKsm(page), page);	/* KSM page->index unusable */
+	pgoff = page_to_pgoff(page);
+	if (pgoff >= vma->vm_pgoff) {
+		address = vma->vm_start +
+			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+		/* Check for address beyond vma (or wrapped through 0?) */
+		if (address < vma->vm_start || address >= vma->vm_end)
+			address = -EFAULT;
+	} else if (PageHead(page) &&
+		   pgoff + (1UL << compound_order(page)) - 1 >= vma->vm_pgoff) {
+		/* Test above avoids possibility of wrap to 0 on 32-bit */
+		address = vma->vm_start;
+	} else {
+		address = -EFAULT;
+	}
+	return address;
 }
 
+/*
+ * Then at what user virtual address will none of the page be found in vma?
+ * Assumes that vma_address() already returned a good starting address.
+ * If page is a compound head, the entire compound page is considered.
+ */
 static inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address_end(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long start, end;
-
-	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
-
-	/* page should be within @vma mapping range */
-	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
-
-	return max(start, vma->vm_start);
+	pgoff_t pgoff;
+	unsigned long address;
+
+	VM_BUG_ON_PAGE(PageKsm(page), page);	/* KSM page->index unusable */
+	pgoff = page_to_pgoff(page) + (1UL << compound_order(page));
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	/* Check for address beyond vma (or wrapped through 0?) */
+	if (address < vma->vm_start || address > vma->vm_end)
+		address = vma->vm_end;
+	return address;
 }
 
 static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 08e283ad46606..bb7488e86bea3 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -224,18 +224,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	if (!map_pte(pvmw))
 		goto next_pte;
 	while (1) {
+		unsigned long end;
+
 		if (check_pte(pvmw))
 			return true;
 next_pte:
 		/* Seek to next pte only makes sense for THP */
 		if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
 			return not_found(pvmw);
+		end = vma_address_end(pvmw->page, pvmw->vma);
 		do {
 			pvmw->address += PAGE_SIZE;
-			if (pvmw->address >= pvmw->vma->vm_end ||
-			    pvmw->address >=
-					__vma_address(pvmw->page, pvmw->vma) +
-					hpage_nr_pages(pvmw->page) * PAGE_SIZE)
+			if (pvmw->address >= end)
 				return not_found(pvmw);
 			/* Did we cross page table boundary? */
 			if (pvmw->address % PMD_SIZE == 0) {
@@ -273,14 +273,10 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
 		.vma = vma,
 		.flags = PVMW_SYNC,
 	};
-	unsigned long start, end;
-
-	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
 
-	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+	pvmw.address = vma_address(page, vma);
+	if (pvmw.address == -EFAULT)
 		return 0;
-	pvmw.address = max(start, vma->vm_start);
 	if (!page_vma_mapped_walk(&pvmw))
 		return 0;
 	page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/rmap.c b/mm/rmap.c
index 4f82fc2fbc9d1..beb191abde219 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -692,7 +692,6 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
  */
 unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long address;
 	if (PageAnon(page)) {
 		struct anon_vma *page__anon_vma = page_anon_vma(page);
 		/*
@@ -707,10 +706,8 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 			return -EFAULT;
 	} else
 		return -EFAULT;
-	address = __vma_address(page, vma);
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
-		return -EFAULT;
-	return address;
+
+	return vma_address(page, vma);
 }
 
 pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
@@ -902,7 +899,7 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 	 * We have to assume the worse case ie pmd for invalidation. Note that
 	 * the page can not be free from this function.
 	 */
-	end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
+	end = vma_address_end(page, vma);
 	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
 
 	while (page_vma_mapped_walk(&pvmw)) {
@@ -1384,7 +1381,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
	 */
-	end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
+	end = PageKsm(page) ?
+	      address + PAGE_SIZE : vma_address_end(page, vma);
 	if (PageHuge(page)) {
 		/*
 		 * If sharing is possible, start and end will be adjusted
@@ -1846,6 +1844,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);
 
+		VM_BUG_ON_VMA(address == -EFAULT, vma);
 		cond_resched();
 
 		if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
@@ -1900,6 +1899,7 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
 			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(page, vma);
 
+		VM_BUG_ON_VMA(address == -EFAULT, vma);
 		cond_resched();
 
 		if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
From: Jue Wang <juew@google.com>
stable inclusion from linux-4.19.197 commit 2c595b67fceda4a345ac4dcbd0caa2049fd17cb7
--------------------------------
[ Upstream commit 31657170deaf1d8d2f6a1955fbc6fa9d228be036 ]
Anon THP tails were already supported, but memory-failure may need to use page_address_in_vma() on file THP tails, which its page->mapping check did not permit: fix it.
hughd adds: no current usage is known to hit the issue, but this does fix a subtle trap in a general helper: best fixed in stable sooner than later.
Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/rmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index beb191abde219..738e07ee35345 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -701,11 +701,11 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 		if (!vma->anon_vma || !page__anon_vma ||
 		    vma->anon_vma->root != page__anon_vma->root)
 			return -EFAULT;
-	} else if (page->mapping) {
-		if (!vma->vm_file ||
-		    vma->vm_file->f_mapping != page->mapping)
-			return -EFAULT;
-	} else
+	} else if (!vma->vm_file) {
+		return -EFAULT;
+	} else if (vma->vm_file->f_mapping != compound_head(page)->mapping) {
 		return -EFAULT;
+	}
 
 	return vma_address(page, vma);
 }
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit d5cd96a7880322692d64fbe75d321ccd39392537
--------------------------------
[ Upstream commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc ]
There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped.
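Schematically, the window looks like this (illustrative, not from the
commit):

	CPU0: munmap -> zap_huge_pmd()    CPU1: truncate/hole-punch
	  clears *pmd
	                                    sees pmd_none(), skips the range
	                                    deletes page from page cache
	                                    page_mapped() still true here:
	                                    "BUG: Bad page cache" report
	  page_remove_rmap()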
The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but there apply it to a single locked page. try_to_unmap() looks more suitable for a single locked page.
However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside.
The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside.
Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance of whether we had identified the problem correctly.
This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to details. Then zap_pmd_range() watches for this case, and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe.
Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped.
Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes: fc127da085c2 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Note on stable backport: fixed up call to truncate_cleanup_page() in truncate_inode_pages_range(). Use hpage_nr_pages() in unmap_mapping_page().
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/linux/mm.h |  3 +++
 mm/memory.c        | 41 +++++++++++++++++++++++++++++++++++++++++
 mm/truncate.c      | 43 +++++++++++++++++++------------------------
 3 files changed, 63 insertions(+), 24 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e02b6fdf4da5..4ee2f01d9ea52 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1425,6 +1425,7 @@ struct zap_details {
 	struct address_space *check_mapping;	/* Check page->mapping if set */
 	pgoff_t	first_index;			/* Lowest page->index to unmap */
 	pgoff_t	last_index;			/* Highest page->index to unmap */
+	struct page *single_page;		/* Locked page to be unmapped */
 };
 
 struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
@@ -1515,6 +1516,7 @@ extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long address, unsigned int fault_flags,
 			    bool *unlocked);
+void unmap_mapping_page(struct page *page);
 void unmap_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t nr, bool even_cows);
 void unmap_mapping_range(struct address_space *mapping,
@@ -1535,6 +1537,7 @@ static inline int fixup_user_fault(struct task_struct *tsk,
 	BUG();
 	return -EFAULT;
 }
+static inline void unmap_mapping_page(struct page *page) { }
 static inline void unmap_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t nr, bool even_cows) { }
 static inline void unmap_mapping_range(struct address_space *mapping,
diff --git a/mm/memory.c b/mm/memory.c
index a2b34d20a1591..05987251c1c75 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1194,7 +1194,18 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 			else if (zap_huge_pmd(tlb, vma, pmd, addr))
 				goto next;
 			/* fall through */
+		} else if (details && details->single_page &&
+			   PageTransCompound(details->single_page) &&
+			   next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
+			spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
+			/*
+			 * Take and drop THP pmd lock so that we cannot return
+			 * prematurely, while zap_huge_pmd() has cleared *pmd,
+			 * but not yet decremented compound_mapcount().
+			 */
+			spin_unlock(ptl);
 		}
+
 		/*
 		 * Here there can be other concurrent MADV_DONTNEED or
 		 * trans huge page faults running, and if the pmd is
@@ -2768,6 +2779,36 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
 	}
 }
 
+/**
+ * unmap_mapping_page() - Unmap single page from processes.
+ * @page: The locked page to be unmapped.
+ *
+ * Unmap this page from any userspace process which still has it mmaped.
+ * Typically, for efficiency, the range of nearby pages has already been
+ * unmapped by unmap_mapping_pages() or unmap_mapping_range().  But once
+ * truncation or invalidation holds the lock on a page, it may find that
+ * the page has been remapped again: and then uses unmap_mapping_page()
+ * to unmap it finally.
+ */
+void unmap_mapping_page(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	struct zap_details details = { };
+
+	VM_BUG_ON(!PageLocked(page));
+	VM_BUG_ON(PageTail(page));
+
+	details.check_mapping = mapping;
+	details.first_index = page->index;
+	details.last_index = page->index + hpage_nr_pages(page) - 1;
+	details.single_page = page;
+
+	i_mmap_lock_write(mapping);
+	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
+		unmap_mapping_range_tree(&mapping->i_mmap, &details);
+	i_mmap_unlock_write(mapping);
+}
+
 /**
  * unmap_mapping_pages() - Unmap pages from processes.
  * @mapping: The address space containing pages to be unmapped.
diff --git a/mm/truncate.c b/mm/truncate.c
index 71b65aab80775..43c73db17a0a6 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -175,13 +175,10 @@ void do_invalidatepage(struct page *page, unsigned int offset,
 * its lock, b) when a concurrent invalidate_mapping_pages got there first and
 * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
 */
-static void
-truncate_cleanup_page(struct address_space *mapping, struct page *page)
+static void truncate_cleanup_page(struct page *page)
 {
-	if (page_mapped(page)) {
-		pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
-		unmap_mapping_pages(mapping, page->index, nr, false);
-	}
+	if (page_mapped(page))
+		unmap_mapping_page(page);
 
 	if (page_has_private(page))
 		do_invalidatepage(page, 0, PAGE_SIZE);
@@ -226,7 +223,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
 	if (page->mapping != mapping)
 		return -EIO;
 
-	truncate_cleanup_page(mapping, page);
+	truncate_cleanup_page(page);
 	delete_from_page_cache(page);
 	return 0;
 }
@@ -364,7 +361,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			pagevec_add(&locked_pvec, page);
 		}
 		for (i = 0; i < pagevec_count(&locked_pvec); i++)
-			truncate_cleanup_page(mapping, locked_pvec.pages[i]);
+			truncate_cleanup_page(locked_pvec.pages[i]);
 		delete_from_page_cache_batch(mapping, &locked_pvec);
 		for (i = 0; i < pagevec_count(&locked_pvec); i++)
 			unlock_page(locked_pvec.pages[i]);
@@ -703,6 +700,16 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 				continue;
 			}
 
+			if (!did_range_unmap && page_mapped(page)) {
+				/*
+				 * If page is mapped, before taking its lock,
+				 * zap the rest of the file in one hit.
+				 */
+				unmap_mapping_pages(mapping, index,
+						(1 + end - index), false);
+				did_range_unmap = 1;
+			}
+
 			lock_page(page);
 			WARN_ON(page_to_index(page) != index);
 			if (page->mapping != mapping) {
@@ -710,23 +717,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 				continue;
 			}
 			wait_on_page_writeback(page);
-			if (page_mapped(page)) {
-				if (!did_range_unmap) {
-					/*
-					 * Zap the rest of the file in one hit.
-					 */
-					unmap_mapping_pages(mapping, index,
-							(1 + end - index), false);
-					did_range_unmap = 1;
-				} else {
-					/*
-					 * Just zap this page
-					 */
-					unmap_mapping_pages(mapping, index,
-							1, false);
-				}
-			}
+
+			if (page_mapped(page))
+				unmap_mapping_page(page);
 			BUG_ON(page_mapped(page));
+
 			ret2 = do_launder_page(mapping, page);
 			if (ret2 == 0) {
 				if (!invalidate_complete_page2(mapping, page))
From: Yang Shi <shy828301@gmail.com>
stable inclusion from linux-4.19.197 commit e17afb6d0b7c74f316dbff33588210190600efc7
--------------------------------
[ Upstream commit 504e070dc08f757bccaed6d05c0f53ecbfac8a23 ]
When debugging the bug reported by Wang Yugui [1], try_to_unmap() may fail, but the first VM_BUG_ON_PAGE() only checks page_mapcount(), so it can miss the failure when the head page is unmapped but another subpage is still mapped. The second DEBUG_VM BUG(), which checks total mapcount, would then catch it. This can cause confusion.

As this is not a fatal issue, consolidate the two DEBUG_VM checks into one VM_WARN_ON_ONCE_PAGE().
[1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Note on stable backport: fixed up variables and split_queue_lock in split_huge_page_to_list(), and conflict on ttu_flags in unmap_page().
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Conflicts:
	mm/huge_memory.c
[yyl: keep it same as mainline]
---
 mm/huge_memory.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2489caa8a0c29..a1261be638356 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2459,15 +2459,15 @@ static void unmap_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
 		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC;
-	bool unmap_success;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
 	if (PageAnon(page))
 		ttu_flags |= TTU_SPLIT_FREEZE;
 
-	unmap_success = try_to_unmap(page, ttu_flags);
-	VM_BUG_ON_PAGE(!unmap_success, page);
+	try_to_unmap(page, ttu_flags);
+
+	VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);
 }
 
 static void remap_page(struct page *page)
@@ -2733,7 +2733,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	struct deferred_split *ds_queue = get_deferred_split_queue(page);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
-	int count, mapcount, extra_pins, ret;
+	int extra_pins, ret;
 	bool mlocked;
 	unsigned long flags;
 	pgoff_t end;
@@ -2795,7 +2795,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 
 	mlocked = PageMlocked(page);
 	unmap_page(head);
-	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
 	/* Make sure the page is not on per-CPU pagevec as it takes pin */
 	if (mlocked)
@@ -2821,9 +2820,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 
 	/* Prevent deferred_split_scan() touching ->_refcount */
 	spin_lock(&ds_queue->split_queue_lock);
-	count = page_count(head);
-	mapcount = total_mapcount(head);
-	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
+	if (page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
 			list_del(page_deferred_list(head));
@@ -2834,16 +2831,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		__split_huge_page(page, list, end, flags);
 		ret = 0;
 	} else {
-		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
-			pr_alert("total_mapcount: %u, page_count(): %u\n",
-					mapcount, count);
-			if (PageTail(page))
-				dump_page(head, NULL);
-			dump_page(page, "total_mapcount(head) > 0");
-			BUG();
-		}
 		spin_unlock(&ds_queue->split_queue_lock);
-fail:		if (mapping)
+fail:
+		if (mapping)
 			xa_unlock(&mapping->i_pages);
 		spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
 		remap_page(head);
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit be9ab2d00d071f930cc0389a8ce1424fd84d4c6b
--------------------------------
[ Upstream commit f003c03bd29e6f46fef1b9a8e8d636ac732286d5 ]
Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".
I've marked all of these for stable: many are merely cleanups, but I think they are much better before the main fix than after.
This patch (of 11):
page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page was used, sometimes pvmw->page itself: use the local copy "page" throughout.
Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com
Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/page_vma_mapped.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index bb7488e86bea3..d146f6d31a624 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -151,7 +151,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	if (pvmw->pte)
 		goto next_pte;
 
-	if (unlikely(PageHuge(pvmw->page))) {
+	if (unlikely(PageHuge(page))) {
 		/* when pud is not present, pte will be NULL */
 		pvmw->pte = huge_pte_offset(mm, pvmw->address,
 					    PAGE_SIZE << compound_order(page));
@@ -213,8 +213,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			 * cannot return prematurely, while zap_huge_pmd() has
 			 * cleared *pmd but not decremented compound_mapcount().
 			 */
-			if ((pvmw->flags & PVMW_SYNC) &&
-			    PageTransCompound(pvmw->page)) {
+			if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
 				spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
 
 				spin_unlock(ptl);
@@ -230,9 +229,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			return true;
 next_pte:
 		/* Seek to next pte only makes sense for THP */
-		if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
+		if (!PageTransHuge(page) || PageHuge(page))
 			return not_found(pvmw);
-		end = vma_address_end(pvmw->page, pvmw->vma);
+		end = vma_address_end(page, pvmw->vma);
 		do {
 			pvmw->address += PAGE_SIZE;
 			if (pvmw->address >= end)
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit a027b699161023c017b816b61be9990e3f9286f4
--------------------------------
[ Upstream commit 6d0fd5987657cb0c9756ce684e3a74c0f6351728 ]
page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of the way at the start, so no need to worry about it later.
Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/page_vma_mapped.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index d146f6d31a624..036ce20bb154c 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -148,10 +148,11 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	if (pvmw->pmd && !pvmw->pte)
 		return not_found(pvmw);
 
-	if (pvmw->pte)
-		goto next_pte;
-
 	if (unlikely(PageHuge(page))) {
+		/* The only possible mapping was handled on last iteration */
+		if (pvmw->pte)
+			return not_found(pvmw);
+
 		/* when pud is not present, pte will be NULL */
 		pvmw->pte = huge_pte_offset(mm, pvmw->address,
 					    PAGE_SIZE << compound_order(page));
@@ -164,6 +165,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			return not_found(pvmw);
 		return true;
 	}
+
+	if (pvmw->pte)
+		goto next_pte;
 restart:
 	pgd = pgd_offset(mm, pvmw->address);
 	if (!pgd_present(*pgd))
@@ -229,7 +233,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		return true;
 next_pte:
 		/* Seek to next pte only makes sense for THP */
-		if (!PageTransHuge(page) || PageHuge(page))
+		if (!PageTransHuge(page))
 			return not_found(pvmw);
 		end = vma_address_end(page, pvmw->vma);
 		do {
From: Hugh Dickins <hughd@google.com>
stable inclusion from linux-4.19.197 commit 9fbb45c5d59d8e4c81c90cc418bed167503db227
--------------------------------
[ Upstream commit 3306d3119ceacc43ea8b141a73e21fea68eec30c ]
page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then use it in subsequent tests, instead of repeatedly dereferencing pointer.
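The pattern in miniature (a sketch of what the diff below establishes):

	pmde = READ_ONCE(*pvmw->pmd);	/* lockless peek: may be stale */
	if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
		pmde = *pvmw->pmd;	/* re-read: now stable under the lock */
		/* every later test uses pmde, never *pvmw->pmd again */
	}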
Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/page_vma_mapped.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 036ce20bb154c..7e1db77b096a1 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -187,18 +187,19 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	pmde = READ_ONCE(*pvmw->pmd);
 	if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
 		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
-		if (likely(pmd_trans_huge(*pvmw->pmd))) {
+		pmde = *pvmw->pmd;
+		if (likely(pmd_trans_huge(pmde))) {
 			if (pvmw->flags & PVMW_MIGRATION)
 				return not_found(pvmw);
-			if (pmd_page(*pvmw->pmd) != page)
+			if (pmd_page(pmde) != page)
 				return not_found(pvmw);
 			return true;
-		} else if (!pmd_present(*pvmw->pmd)) {
+		} else if (!pmd_present(pmde)) {
 			if (thp_migration_supported()) {
 				if (!(pvmw->flags & PVMW_MIGRATION))
 					return not_found(pvmw);
-				if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
-					swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
+				if (is_migration_entry(pmd_to_swp_entry(pmde))) {
+					swp_entry_t entry = pmd_to_swp_entry(pmde);
 
 					if (migration_entry_to_page(entry) != page)
 						return not_found(pvmw);
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit d4b99cf445d24089255a49ff927ed8884ca2f7f5
--------------------------------
[ Upstream commit e2e1d4076c77b3671cf8ce702535ae7dee3acf89 ]
page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to follow the same "return not_found, return not_found, return true" pattern as the block above it (note: returning not_found there is never premature, since existence or prior existence of huge pmd guarantees good alignment).
Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Peter Xu peterx@redhat.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/page_vma_mapped.c | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 7e1db77b096a1..e32ed79129239 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -194,24 +194,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pmd_page(pmde) != page) return not_found(pvmw); return true; - } else if (!pmd_present(pmde)) { - if (thp_migration_supported()) { - if (!(pvmw->flags & PVMW_MIGRATION)) - return not_found(pvmw); - if (is_migration_entry(pmd_to_swp_entry(pmde))) { - swp_entry_t entry = pmd_to_swp_entry(pmde); + } + if (!pmd_present(pmde)) { + swp_entry_t entry;
- if (migration_entry_to_page(entry) != page) - return not_found(pvmw); - return true; - } - } - return not_found(pvmw); - } else { - /* THP pmd was split under us: handle on pte level */ - spin_unlock(pvmw->ptl); - pvmw->ptl = NULL; + if (!thp_migration_supported() || + !(pvmw->flags & PVMW_MIGRATION)) + return not_found(pvmw); + entry = pmd_to_swp_entry(pmde); + if (!is_migration_entry(entry) || + migration_entry_to_page(entry) != page) + return not_found(pvmw); + return true; } + /* THP pmd was split under us: handle on pte level */ + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; } else if (!pmd_present(pmde)) { /* * If PVMW_SYNC, take and drop THP pmd lock so that we
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit 97a79b7896d6b8c009561f695649b31c1b35437c
--------------------------------
[ Upstream commit 448282487483d6fa5b2eeeafaa0acc681e544a9c ]
page_vma_mapped_walk() cleanup: adjust the test for crossing page table boundary - I believe pvmw->address is always page-aligned, but nothing else here assumed that; and remember to reset pvmw->pte to NULL after unmapping the page table, though I never saw any bug from that.
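For illustration only (not part of the patch), a standalone sketch contrasting the old modulo test with the new mask test for "did we cross into a new page table?" — assuming 4 KiB pages and 2 MiB PMDs as on x86-64. The two predicates agree for page-aligned addresses, but only the new test tolerates an address with sub-page bits set:

    #include <stdio.h>

    #define PAGE_SIZE 0x1000UL      /* 4 KiB, assumed */
    #define PMD_SIZE  0x200000UL    /* 2 MiB, assumed */

    static int old_test(unsigned long addr) { return addr % PMD_SIZE == 0; }
    static int new_test(unsigned long addr) { return (addr & (PMD_SIZE - PAGE_SIZE)) == 0; }

    int main(void)
    {
            unsigned long addrs[] = { 0x200000, 0x201000, 0x200010 };

            for (int i = 0; i < 3; i++)
                    printf("%#lx: old=%d new=%d\n", addrs[i],
                           old_test(addrs[i]), new_test(addrs[i]));
            /* 0x200000: both fire; 0x201000: neither; 0x200010: only the
               new test fires, despite the unaligned low bits. */
            return 0;
    }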
Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/page_vma_mapped.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index e32ed79129239..1fccadd0eb6d9 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -240,16 +240,16 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->address >= end) return not_found(pvmw); /* Did we cross page table boundary? */ - if (pvmw->address % PMD_SIZE == 0) { - pte_unmap(pvmw->pte); + if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) { if (pvmw->ptl) { spin_unlock(pvmw->ptl); pvmw->ptl = NULL; } + pte_unmap(pvmw->pte); + pvmw->pte = NULL; goto restart; - } else { - pvmw->pte++; } + pvmw->pte++; } while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit ac0324b14dae4dd664fd94d9ada161089f4d2b8b
--------------------------------
[ Upstream commit b3807a91aca7d21c05d5790612e49969117a72b9 ]
page_vma_mapped_walk() cleanup: add a level of indentation to much of the body, making no functional change in this commit, but reducing the later diff when this is all converted to a loop.
[hughd@google.com: page_vma_mapped_walk(): add a level of indentation fix] Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com
Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/page_vma_mapped.c | 105 ++++++++++++++++++++++--------------------- 1 file changed, 55 insertions(+), 50 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 1fccadd0eb6d9..8199456782640 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -169,62 +169,67 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->pte) goto next_pte; restart: - pgd = pgd_offset(mm, pvmw->address); - if (!pgd_present(*pgd)) - return false; - p4d = p4d_offset(pgd, pvmw->address); - if (!p4d_present(*p4d)) - return false; - pud = pud_offset(p4d, pvmw->address); - if (!pud_present(*pud)) - return false; - pvmw->pmd = pmd_offset(pud, pvmw->address); - /* - * Make sure the pmd value isn't cached in a register by the - * compiler and used as a stale value after we've observed a - * subsequent update. - */ - pmde = READ_ONCE(*pvmw->pmd); - if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { - pvmw->ptl = pmd_lock(mm, pvmw->pmd); - pmde = *pvmw->pmd; - if (likely(pmd_trans_huge(pmde))) { - if (pvmw->flags & PVMW_MIGRATION) - return not_found(pvmw); - if (pmd_page(pmde) != page) - return not_found(pvmw); - return true; - } - if (!pmd_present(pmde)) { - swp_entry_t entry; + { + pgd = pgd_offset(mm, pvmw->address); + if (!pgd_present(*pgd)) + return false; + p4d = p4d_offset(pgd, pvmw->address); + if (!p4d_present(*p4d)) + return false; + pud = pud_offset(p4d, pvmw->address); + if (!pud_present(*pud)) + return false;
- if (!thp_migration_supported() || - !(pvmw->flags & PVMW_MIGRATION)) - return not_found(pvmw); - entry = pmd_to_swp_entry(pmde); - if (!is_migration_entry(entry) || - migration_entry_to_page(entry) != page) - return not_found(pvmw); - return true; - } - /* THP pmd was split under us: handle on pte level */ - spin_unlock(pvmw->ptl); - pvmw->ptl = NULL; - } else if (!pmd_present(pmde)) { + pvmw->pmd = pmd_offset(pud, pvmw->address); /* - * If PVMW_SYNC, take and drop THP pmd lock so that we - * cannot return prematurely, while zap_huge_pmd() has - * cleared *pmd but not decremented compound_mapcount(). + * Make sure the pmd value isn't cached in a register by the + * compiler and used as a stale value after we've observed a + * subsequent update. */ - if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) { - spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); + pmde = READ_ONCE(*pvmw->pmd);
- spin_unlock(ptl); + if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { + pvmw->ptl = pmd_lock(mm, pvmw->pmd); + pmde = *pvmw->pmd; + if (likely(pmd_trans_huge(pmde))) { + if (pvmw->flags & PVMW_MIGRATION) + return not_found(pvmw); + if (pmd_page(pmde) != page) + return not_found(pvmw); + return true; + } + if (!pmd_present(pmde)) { + swp_entry_t entry; + + if (!thp_migration_supported() || + !(pvmw->flags & PVMW_MIGRATION)) + return not_found(pvmw); + entry = pmd_to_swp_entry(pmde); + if (!is_migration_entry(entry) || + migration_entry_to_page(entry) != page) + return not_found(pvmw); + return true; + } + /* THP pmd was split under us: handle on pte level */ + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; + } else if (!pmd_present(pmde)) { + /* + * If PVMW_SYNC, take and drop THP pmd lock so that we + * cannot return prematurely, while zap_huge_pmd() has + * cleared *pmd but not decremented compound_mapcount(). + */ + if ((pvmw->flags & PVMW_SYNC) && + PageTransCompound(page)) { + spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); + + spin_unlock(ptl); + } + return false; } - return false; + if (!map_pte(pvmw)) + goto next_pte; } - if (!map_pte(pvmw)) - goto next_pte; while (1) { unsigned long end;
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit 7d82908ba4fb6ae809db8a5d63f85f1acc6d2946
--------------------------------
[ Upstream commit 474466301dfd8b39a10c01db740645f3f7ae9a28 ]
page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte, and use "goto this_pte", in place of the "while (1)" loop at the end.
Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/page_vma_mapped.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 8199456782640..470f19c917ad5 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -139,6 +139,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) { struct mm_struct *mm = pvmw->vma->vm_mm; struct page *page = pvmw->page; + unsigned long end; pgd_t *pgd; p4d_t *p4d; pud_t *pud; @@ -229,10 +230,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) } if (!map_pte(pvmw)) goto next_pte; - } - while (1) { - unsigned long end; - +this_pte: if (check_pte(pvmw)) return true; next_pte: @@ -261,6 +259,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) pvmw->ptl = pte_lockptr(mm, pvmw->pmd); spin_lock(pvmw->ptl); } + goto this_pte; } }
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit b114408b1aa2b0957877632113009bdf5efc5da2
--------------------------------
[ Upstream commit a765c417d876cc635f628365ec9aa6f09470069a ]
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the start, rather than later at next_pte.
It's a little unnecessary overhead on the first call, but makes for a simpler loop in the following commit.
Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/page_vma_mapped.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 470f19c917ad5..7239c38e99e25 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -167,6 +167,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return true; }
+ /* + * Seek to next pte only makes sense for THP. + * But more important than that optimization, is to filter out + * any PageKsm page: whose page->index misleads vma_address() + * and vma_address_end() to disaster. + */ + end = PageTransCompound(page) ? + vma_address_end(page, pvmw->vma) : + pvmw->address + PAGE_SIZE; if (pvmw->pte) goto next_pte; restart: @@ -234,10 +243,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (check_pte(pvmw)) return true; next_pte: - /* Seek to next pte only makes sense for THP */ - if (!PageTransHuge(page)) - return not_found(pvmw); - end = vma_address_end(page, pvmw->vma); do { pvmw->address += PAGE_SIZE; if (pvmw->address >= end)
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit 69784c9d5cb080a57062d944acb34c334758ff0a
--------------------------------
[ Upstream commit a9a7504d9beaf395481faa91e70e2fd08f7a3dde ]
Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
Crash dumps showed two tail pages of a shmem huge page remained mapped by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end of a long unmapped range; and no page table had yet been allocated for the head of the huge page to be mapped into.
Although designed to handle these odd misaligned huge-page-mapped-by-pte cases, page_vma_mapped_walk() falls short by returning false prematurely when !pmd_present or !pud_present or !p4d_present or !pgd_present: there are cases when a huge page may span the boundary, with ptes present in the next.
Restructure page_vma_mapped_walk() as a loop to continue in these cases, while keeping its layout much as before. Add a step_forward() helper to advance pvmw->address across those boundaries: originally I tried to use mm's standard p?d_addr_end() macros, but hit the same crash 512 times less often: because of the way redundant levels are folded together, but folded differently in different configurations, it was just too difficult to use them correctly; and step_forward() is simpler anyway.
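A minimal standalone sketch (userspace demo, not kernel code; the 2 MiB PMD size is assumed) of the rounding step_forward() performs, including the clamp that keeps the caller's "address < end" loop terminating when the addition wraps past the top of the address space:

    #include <stdio.h>
    #include <limits.h>

    static unsigned long step_forward(unsigned long address, unsigned long size)
    {
            /* Round up to the start of the next size-aligned region. */
            address = (address + size) & ~(size - 1);
            if (!address)                   /* wrapped to 0: saturate instead */
                    address = ULONG_MAX;
            return address;
    }

    int main(void)
    {
            /* 0x1234 stepped by a 2 MiB PMD region lands on 0x200000. */
            printf("%#lx\n", step_forward(0x1234, 0x200000));
            /* Stepping past the last page of the address space wraps, hence the clamp. */
            printf("%#lx\n", step_forward(ULONG_MAX & ~0xfffUL, 0x1000));
            return 0;
    }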
Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/page_vma_mapped.c | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 7239c38e99e25..b7a8009c1549f 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -111,6 +111,13 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) return pfn_in_hpage(pvmw->page, pfn); }
+static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size) +{ + pvmw->address = (pvmw->address + size) & ~(size - 1); + if (!pvmw->address) + pvmw->address = ULONG_MAX; +} + /** * page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at * @pvmw->address @@ -179,16 +186,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->pte) goto next_pte; restart: - { + do { pgd = pgd_offset(mm, pvmw->address); - if (!pgd_present(*pgd)) - return false; + if (!pgd_present(*pgd)) { + step_forward(pvmw, PGDIR_SIZE); + continue; + } p4d = p4d_offset(pgd, pvmw->address); - if (!p4d_present(*p4d)) - return false; + if (!p4d_present(*p4d)) { + step_forward(pvmw, P4D_SIZE); + continue; + } pud = pud_offset(p4d, pvmw->address); - if (!pud_present(*pud)) - return false; + if (!pud_present(*pud)) { + step_forward(pvmw, PUD_SIZE); + continue; + }
pvmw->pmd = pmd_offset(pud, pvmw->address); /* @@ -235,7 +248,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
spin_unlock(ptl); } - return false; + step_forward(pvmw, PMD_SIZE); + continue; } if (!map_pte(pvmw)) goto next_pte; @@ -265,7 +279,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) spin_lock(pvmw->ptl); } goto this_pte; - } + } while (pvmw->address < end); + + return false; }
/**
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit e943b4373cf706ee8ee433988bc0c4d6e3ea5907
--------------------------------
[ Upstream commit a7a69d8ba88d8dcee7ef00e91d413a4bd003a814 ]
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds ptlock in the PVMW_SYNC case? That too might have been responsible for BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though I've never seen any.
Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/ Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Tested-by: Wang Yugui wangyugui@e16-tech.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/page_vma_mapped.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index b7a8009c1549f..edca786093187 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -272,6 +272,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) goto restart; } pvmw->pte++; + if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) { + pvmw->ptl = pte_lockptr(mm, pvmw->pmd); + spin_lock(pvmw->ptl); + } } while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
From: Hugh Dickins hughd@google.com
stable inclusion from linux-4.19.197 commit 2445837e9cd084ba849a7c1c70086a6cdc608f48
--------------------------------
[ Upstream commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 ]
If more than one futex is placed on a shmem huge page, it can happen that waking the second wakes the first instead, and leaves the second waiting: the key's shared.pgoff is wrong.
When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when generating futex_key") went in, the only shared huge pages came from hugetlbfs, and the code added to deal with its exceptional page->index was put into hugetlb source. Then that was missed when 4.8 added shmem huge pages.
page_to_pgoff() is what others use for this nowadays: except that, as currently written, it gives the right answer on hugetlbfs head, but nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific hugetlb_basepage_index() on PageHuge tails as well as on head.
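As a worked example (all values assumed) of the arithmetic hugetlb_basepage_index() performs for a hugetlbfs tail page, scaling the head's huge-page index to PAGE_SIZE units and adding the tail's offset:

    #include <stdio.h>

    int main(void)
    {
            unsigned long head_index = 3;   /* head page's index, in huge-page units */
            unsigned int order = 9;         /* 2 MiB huge page = 512 base pages */
            unsigned long compound_idx = 5; /* tail's offset within the compound page */

            /* pgoff in PAGE_SIZE units, as hugetlb_basepage_index() returns */
            printf("%lu\n", (head_index << order) + compound_idx); /* 3*512 + 5 = 1541 */
            return 0;
    }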
Yes, it's unconventional to declare hugetlb_basepage_index() there in pagemap.h, rather than in hugetlb.h; but I do not expect anything but page_to_pgoff() ever to need it.
[akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]
Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Reported-by: Neel Natu neelnatu@google.com Signed-off-by: Hugh Dickins hughd@google.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Acked-by: Thomas Gleixner tglx@linutronix.de Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Zhang Yi wetpzy@gmail.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Darren Hart dvhart@infradead.org Cc: Davidlohr Bueso dave@stgolabs.net Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: leave redundant #include <linux/hugetlb.h> in kernel/futex.c, to avoid conflict over the header files included.
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/linux/hugetlb.h | 16 ---------------- include/linux/pagemap.h | 13 +++++++------ kernel/futex.c | 2 +- mm/hugetlb.c | 5 +---- 4 files changed, 9 insertions(+), 27 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index affc3fb925860..407fc22a11d9e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -513,17 +513,6 @@ static inline int hstate_index(struct hstate *h) return h - hstates; }
-pgoff_t __basepage_index(struct page *page); - -/* Return page->index in PAGE_SIZE units */ -static inline pgoff_t basepage_index(struct page *page) -{ - if (!PageCompound(page)) - return page->index; - - return __basepage_index(page); -} - extern int dissolve_free_huge_page(struct page *page); extern int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn); @@ -618,11 +607,6 @@ static inline int hstate_index(struct hstate *h) return 0; }
-static inline pgoff_t basepage_index(struct page *page) -{ - return page->index; -} - static inline int dissolve_free_huge_page(struct page *page) { return 0; diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index e889d99296159..085aed892ce58 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -427,7 +427,7 @@ static inline struct page *read_mapping_page(struct address_space *mapping, }
/* - * Get index of the page with in radix-tree + * Get index of the page within radix-tree (but not for hugetlb pages). * (TODO: remove once hugetlb pages will have ->index in PAGE_SIZE) */ static inline pgoff_t page_to_index(struct page *page) @@ -446,15 +446,16 @@ static inline pgoff_t page_to_index(struct page *page) return pgoff; }
+extern pgoff_t hugetlb_basepage_index(struct page *page); + /* - * Get the offset in PAGE_SIZE. - * (TODO: hugepage should have ->index in PAGE_SIZE) + * Get the offset in PAGE_SIZE (even for hugetlb pages). + * (TODO: hugetlb pages should have ->index in PAGE_SIZE) */ static inline pgoff_t page_to_pgoff(struct page *page) { - if (unlikely(PageHeadHuge(page))) - return page->index << compound_order(page); - + if (unlikely(PageHuge(page))) + return hugetlb_basepage_index(page); return page_to_index(page); }
diff --git a/kernel/futex.c b/kernel/futex.c index a88c2cb759cd0..8bdab9c3f1954 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -724,7 +724,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
key->both.offset |= FUT_OFF_INODE; /* inode-based key */ key->shared.i_seq = get_inode_sequence_number(inode); - key->shared.pgoff = basepage_index(tail); + key->shared.pgoff = page_to_pgoff(tail); rcu_read_unlock(); }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a30922c297aeb..8ab6a9903ec9a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1410,15 +1410,12 @@ int PageHeadHuge(struct page *page_head) return get_compound_page_dtor(page_head) == free_huge_page; }
-pgoff_t __basepage_index(struct page *page) +pgoff_t hugetlb_basepage_index(struct page *page) { struct page *page_head = compound_head(page); pgoff_t index = page_index(page_head); unsigned long compound_idx;
- if (!PageHuge(page_head)) - return page_index(page); - if (compound_order(page_head) >= MAX_ORDER) compound_idx = page_to_pfn(page) - page_to_pfn(page_head); else
From: ManYi Li limanyi@uniontech.com
stable inclusion from linux-4.19.197 commit b52a404434a93d83b243187c6cf135dd6ba529c8
--------------------------------
[ Upstream commit 7dd753ca59d6c8cc09aa1ed24f7657524803c7f3 ]
Handle a reported media event code of 3. This indicates that the media has been removed from the drive and user intervention is required to proceed. Return DISK_EVENT_EJECT_REQUEST in that case.
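For reference, a sketch of the mapping sr_get_events() ends up implementing for media-class event codes (the enum values here are illustrative, not the kernel's DISK_EVENT_* flags; code meanings are per the MMC GET EVENT STATUS NOTIFICATION definitions):

    enum disk_event { NONE, MEDIA_CHANGE, EJECT_REQUEST };

    static enum disk_event media_event_to_disk_event(unsigned int code)
    {
            switch (code) {
            case 1:                 /* EjectRequest: user pressed eject */
            case 3:                 /* MediaRemoval: media gone, user must act */
                    return EJECT_REQUEST;
            case 2:                 /* NewMedia: media inserted */
                    return MEDIA_CHANGE;
            default:
                    return NONE;
            }
    }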
Link: https://lore.kernel.org/r/20210611094402.23884-1-limanyi@uniontech.com Signed-off-by: ManYi Li limanyi@uniontech.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/scsi/sr.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 45c8bf39ad238..acf0c244141f4 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -216,6 +216,8 @@ static unsigned int sr_get_events(struct scsi_device *sdev) return DISK_EVENT_EJECT_REQUEST; else if (med->media_event_code == 2) return DISK_EVENT_MEDIA_CHANGE; + else if (med->media_event_code == 3) + return DISK_EVENT_EJECT_REQUEST; return 0; }
From: Christian König christian.koenig@amd.com
stable inclusion from linux-4.19.197 commit 6dcca74b358a849c55ab2db41ae8f97e08ffb8e0
--------------------------------
[ Upstream commit d330099115597bbc238d6758a4930e72b49ea9ba ]
AGP for example doesn't have a dma_address array.
Signed-off-by: Christian König christian.koenig@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20210614110517.1624-1-christia... Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7214022dfb911..d230536e7086d 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -512,7 +512,7 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo) struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm; int i;
- if (!ttm_dma) + if (!ttm_dma || !ttm_dma->dma_address) return;
/* Don't waste time looping if the object is coherent */ @@ -532,7 +532,7 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo) struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm; int i;
- if (!ttm_dma) + if (!ttm_dma || !ttm_dma->dma_address) return;
/* Don't waste time looping if the object is coherent */
From: Anson Huang Anson.Huang@nxp.com
stable inclusion from linux-4.19.197 commit 4ca30ef6257a729fdb9d42a80d72984dd332bfd6
--------------------------------
commit 4521de30fbb3f5be0db58de93582ebce72c9d44f upstream.
The vdd3p0 LDO's input should come directly from the external USB VBUS, NOT from the PMIC's power supply. The vdd3p0 LDO's target output voltage can be controlled by SW, and it requires its input voltage to be high enough; with an incorrect power supply assigned, if that supply's voltage is lower than the LDO's target output voltage, the voltage adjustment will fail and be skipped. So remove the power supply assignment for vdd3p0 to avoid this scenario.
Fixes: 93385546ba36 ("ARM: dts: imx6qdl-sabresd: Assign corresponding power supply for LDOs") Signed-off-by: Anson Huang Anson.Huang@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Nobuhiro Iwamatsu (CIP) nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi index 41384bbd2f60c..03357d39870ee 100644 --- a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi +++ b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi @@ -675,10 +675,6 @@ vin-supply = <&vgen5_reg>; };
-®_vdd3p0 { - vin-supply = <&sw2_reg>; -}; - ®_vdd2p5 { vin-supply = <&vgen5_reg>; };
From: Petr Mladek pmladek@suse.com
stable inclusion from linux-4.19.197 commit 13bcf5aeb33caa283c6b03e14b7a254e1223c0d7
--------------------------------
commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream.
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync()".
This patchset fixes the race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync() including proper return value handling.
This patch (of 2):
Simple code refactoring as a preparation step for fixing a race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().
It does not modify the existing behavior.
Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com Signed-off-by: Petr Mladek pmladek@suse.com Cc: jenhaochen@google.com Cc: Martin Liu liumartin@google.com Cc: Minchan Kim minchan@google.com Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Oleg Nesterov oleg@redhat.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/kthread.c | 46 +++++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 17 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c index 80099238f21da..1b7fdfdb096cc 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -1020,6 +1020,33 @@ void kthread_flush_work(struct kthread_work *work) } EXPORT_SYMBOL_GPL(kthread_flush_work);
+/* + * Make sure that the timer is neither set nor running and could + * not manipulate the work list_head any longer. + * + * The function is called under worker->lock. The lock is temporary + * released but the timer can't be set again in the meantime. + */ +static void kthread_cancel_delayed_work_timer(struct kthread_work *work, + unsigned long *flags) +{ + struct kthread_delayed_work *dwork = + container_of(work, struct kthread_delayed_work, work); + struct kthread_worker *worker = work->worker; + + /* + * del_timer_sync() must be called to make sure that the timer + * callback is not running. The lock must be temporary released + * to avoid a deadlock with the callback. In the meantime, + * any queuing is blocked by setting the canceling counter. + */ + work->canceling++; + spin_unlock_irqrestore(&worker->lock, *flags); + del_timer_sync(&dwork->timer); + spin_lock_irqsave(&worker->lock, *flags); + work->canceling--; +} + /* * This function removes the work from the worker queue. Also it makes sure * that it won't get queued later via the delayed work's timer. @@ -1034,23 +1061,8 @@ static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork, unsigned long *flags) { /* Try to cancel the timer if exists. */ - if (is_dwork) { - struct kthread_delayed_work *dwork = - container_of(work, struct kthread_delayed_work, work); - struct kthread_worker *worker = work->worker; - - /* - * del_timer_sync() must be called to make sure that the timer - * callback is not running. The lock must be temporary released - * to avoid a deadlock with the callback. In the meantime, - * any queuing is blocked by setting the canceling counter. - */ - work->canceling++; - spin_unlock_irqrestore(&worker->lock, *flags); - del_timer_sync(&dwork->timer); - spin_lock_irqsave(&worker->lock, *flags); - work->canceling--; - } + if (is_dwork) + kthread_cancel_delayed_work_timer(work, flags);
/* * Try to remove the work from a worker list. It might either
From: Petr Mladek pmladek@suse.com
stable inclusion from linux-4.19.197 commit 6e2a98bc902ce347747fabb133485d7ce2bdd7e4
--------------------------------
commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.
The system might hang with the following backtrace:
schedule+0x80/0x100
schedule_timeout+0x48/0x138
wait_for_common+0xa4/0x134
wait_for_completion+0x1c/0x2c
kthread_flush_work+0x114/0x1cc
kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
kthread_cancel_delayed_work_sync+0x18/0x2c
xxxx_pm_notify+0xb0/0xd8
blocking_notifier_call_chain_robust+0x80/0x194
pm_notifier_call_chain_robust+0x28/0x4c
suspend_prepare+0x40/0x260
enter_state+0x80/0x3f4
pm_suspend+0x60/0xdc
state_store+0x108/0x144
kobj_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
It is caused by the following race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync():
CPU0                                    CPU1

Context: Thread A                       Context: Thread B

kthread_mod_delayed_work()
  spin_lock()
  __kthread_cancel_work()
    spin_unlock()
    del_timer_sync()
                                        kthread_cancel_delayed_work_sync()
                                          spin_lock()
                                          __kthread_cancel_work()
                                            spin_unlock()
                                            del_timer_sync()
                                            spin_lock()

                                          work->canceling++
                                          spin_unlock
    spin_lock()
  queue_delayed_work()
    // dwork is put into the worker->delayed_work_list
  spin_unlock()
                                        kthread_flush_work()
                                          // flush_work is put at the tail of the dwork

                                        wait_for_completion()

Context: IRQ

  kthread_delayed_work_timer_fn()
    spin_lock()
    list_del_init(&work->node);
    spin_unlock()
BANG: flush_work is no longer linked and will never be processed.
The problem is that kthread_mod_delayed_work() checks the work->canceling flag before canceling the timer.
A simple solution is to (re)check work->canceling after __kthread_cancel_work(). But then it is not clear what should be returned when __kthread_cancel_work() removed the work from the queue (list) and it can't queue it again with the new @delay.
The return value might be used for reference counting. The caller has to know whether a new work has been queued or an existing one was replaced.
The proper solution is that kthread_mod_delayed_work() will remove the work from the queue (list) _only_ when work->canceling is not set. The flag must be checked after the timer is stopped and the remaining operations can be done under worker->lock.
Note that kthread_mod_delayed_work() could remove the timer and then bail out. It is fine. The other canceling caller needs to cancel the timer as well. The important thing is that the queue (list) manipulation is done atomically under worker->lock.
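Condensed from the diff below, the ordering the fix establishes in kthread_mod_delayed_work() (a paraphrase, not the verbatim function):

    spin_lock_irqsave(&worker->lock, flags);
    /* 1. Stop the timer first; worker->lock is dropped and retaken inside. */
    kthread_cancel_delayed_work_timer(work, &flags);
    /* 2. Only after the lock is retaken, re-check for a concurrent cancel. */
    if (work->canceling)
            goto out;
    /* 3. Dequeue and requeue atomically under worker->lock. */
    ret = __kthread_cancel_work(work);
    fast_queue:
    __kthread_queue_delayed_work(worker, dwork, delay);
    out:
    spin_unlock_irqrestore(&worker->lock, flags);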
Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com Fixes: 9a6b06c8d9a220860468a ("kthread: allow to modify delayed kthread work") Signed-off-by: Petr Mladek pmladek@suse.com Reported-by: Martin Liu liumartin@google.com Cc: jenhaochen@google.com Cc: Minchan Kim minchan@google.com Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Oleg Nesterov oleg@redhat.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- kernel/kthread.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c index 1b7fdfdb096cc..fee5cc055791b 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -1048,8 +1048,11 @@ static void kthread_cancel_delayed_work_timer(struct kthread_work *work, }
/* - * This function removes the work from the worker queue. Also it makes sure - * that it won't get queued later via the delayed work's timer. + * This function removes the work from the worker queue. + * + * It is called under worker->lock. The caller must make sure that + * the timer used by delayed work is not running, e.g. by calling + * kthread_cancel_delayed_work_timer(). * * The work might still be in use when this function finishes. See the * current_work proceed by the worker. @@ -1057,13 +1060,8 @@ static void kthread_cancel_delayed_work_timer(struct kthread_work *work, * Return: %true if @work was pending and successfully canceled, * %false if @work was not pending */ -static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork, - unsigned long *flags) +static bool __kthread_cancel_work(struct kthread_work *work) { - /* Try to cancel the timer if exists. */ - if (is_dwork) - kthread_cancel_delayed_work_timer(work, flags); - /* * Try to remove the work from a worker list. It might either * be from worker->work_list or from worker->delayed_work_list. @@ -1116,11 +1114,23 @@ bool kthread_mod_delayed_work(struct kthread_worker *worker, /* Work must not be used with >1 worker, see kthread_queue_work() */ WARN_ON_ONCE(work->worker != worker);
- /* Do not fight with another command that is canceling this work. */ + /* + * Temporary cancel the work but do not fight with another command + * that is canceling the work as well. + * + * It is a bit tricky because of possible races with another + * mod_delayed_work() and cancel_delayed_work() callers. + * + * The timer must be canceled first because worker->lock is released + * when doing so. But the work can be removed from the queue (list) + * only when it can be queued again so that the return value can + * be used for reference counting. + */ + kthread_cancel_delayed_work_timer(work, &flags); if (work->canceling) goto out; + ret = __kthread_cancel_work(work);
- ret = __kthread_cancel_work(work, true, &flags); fast_queue: __kthread_queue_delayed_work(worker, dwork, delay); out: @@ -1142,7 +1152,10 @@ static bool __kthread_cancel_work_sync(struct kthread_work *work, bool is_dwork) /* Work must not be used with >1 worker, see kthread_queue_work(). */ WARN_ON_ONCE(work->worker != worker);
- ret = __kthread_cancel_work(work, is_dwork, &flags); + if (is_dwork) + kthread_cancel_delayed_work_timer(work, &flags); + + ret = __kthread_cancel_work(work);
if (worker->current_work != work) goto out_fast;
From: Juergen Gross jgross@suse.com
stable inclusion from linux-4.19.197 commit cda326e5033fc1e7912ca31887b05fa7cd8060f3
--------------------------------
commit 3de218ff39b9e3f0d453fe3154f12a174de44b25 upstream.
In order to avoid a race condition for user events when changing cpu affinity, reset the active flag only when EOI-ing the event.
This is working fine as all user events are lateeoi events. Note that lateeoi_ack_mask_dynirq() is not modified as there is no explicit call to xen_irq_lateeoi() expected later.
Cc: stable@vger.kernel.org Reported-by: Julien Grall julien@xen.org Fixes: b6622798bc50b62 ("xen/events: avoid handling the same event on two cpus at the same time") Tested-by: Julien Grall julien@xen.org Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Boris Ostrovsky boris.ostrvsky@oracle.com Link: https://lore.kernel.org/r/20210623130913.9405-1-jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/xen/events/events_base.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index 409c761919fe5..914a07d4238e2 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -511,6 +511,9 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious) }
info->eoi_time = 0; + + /* is_active hasn't been reset yet, do it now. */ + smp_store_release(&info->is_active, 0); do_unmask(info, EVT_MASK_REASON_EOI_PENDING); }
@@ -1767,10 +1770,22 @@ static void lateeoi_ack_dynirq(struct irq_data *data) struct irq_info *info = info_for_irq(data->irq); evtchn_port_t evtchn = info ? info->evtchn : 0;
- if (VALID_EVTCHN(evtchn)) { - do_mask(info, EVT_MASK_REASON_EOI_PENDING); - ack_dynirq(data); - } + if (!VALID_EVTCHN(evtchn)) + return; + + do_mask(info, EVT_MASK_REASON_EOI_PENDING); + + if (unlikely(irqd_is_setaffinity_pending(data)) && + likely(!irqd_irq_disabled(data))) { + do_mask(info, EVT_MASK_REASON_TEMPORARY); + + clear_evtchn(evtchn); + + irq_move_masked_irq(data); + + do_unmask(info, EVT_MASK_REASON_TEMPORARY); + } else + clear_evtchn(evtchn); }
static void lateeoi_mask_ack_dynirq(struct irq_data *data)
From: Alper Gun alpergun@google.com
stable inclusion from linux-4.19.197 commit 4686e3e4aa78e39721f49d6a2719c61b444c7525
--------------------------------
commit 934002cd660b035b926438244b4294e647507e13 upstream.
Send the SEV_CMD_DECOMMISSION command to PSP firmware if ASID binding fails. If a failure happens after a successful LAUNCH_START command, a decommission command should be executed; otherwise the guest context is leaked inside the AMD SP, and once the firmware no longer has memory to allocate more SEV guest contexts, the LAUNCH_START command will begin to fail with the SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid, but that path is not reached if a failure happens before guest activation succeeds: if sev_bind_asid fails, decommission is never called. PSP firmware has a limit on the number of guests, so if sev_bind_asid fails many times, the firmware will not have resources to create another guest context.
Cc: stable@vger.kernel.org Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Reported-by: Peter Gonda pgonda@google.com Signed-off-by: Alper Gun alpergun@google.com Reviewed-by: Marc Orr marcorr@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Message-Id: 20210610174604.2554090-1-alpergun@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/x86/kvm/svm.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 111edec24243b..b3aa7cbb10eee 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1791,9 +1791,25 @@ static void sev_asid_free(struct kvm *kvm) __sev_asid_free(sev->asid); }
-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle) +static void sev_decommission(unsigned int handle) { struct sev_data_decommission *decommission; + + if (!handle) + return; + + decommission = kzalloc(sizeof(*decommission), GFP_KERNEL); + if (!decommission) + return; + + decommission->handle = handle; + sev_guest_decommission(decommission, NULL); + + kfree(decommission); +} + +static void sev_unbind_asid(struct kvm *kvm, unsigned int handle) +{ struct sev_data_deactivate *data;
if (!handle) @@ -1811,15 +1827,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle) sev_guest_df_flush(NULL); kfree(data);
- decommission = kzalloc(sizeof(*decommission), GFP_KERNEL); - if (!decommission) - return; - - /* decommission handle */ - decommission->handle = handle; - sev_guest_decommission(decommission, NULL); - - kfree(decommission); + sev_decommission(handle); }
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr, @@ -6469,8 +6477,10 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
/* Bind ASID to this guest */ ret = sev_bind_asid(kvm, start->handle, error); - if (ret) + if (ret) { + sev_decommission(start->handle); goto e_free_session; + }
/* return handle to userspace */ params.handle = start->handle;
From: afzal mohammed afzal.mohd.ma@gmail.com
stable inclusion from linux-4.19.197 commit c74081df081c306d9d256d9026181b67f595fdf6
--------------------------------
commit b75ca5217743e4d7076cf65e044e88389e44318d upstream.
request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready.
Per tglx[1], setup_irq() existed in olden days when allocators were not ready by the time early interrupts were initialized.
Hence replace setup_irq() by request_irq().
[1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos
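For illustration, the conversion pattern in generic form (foo_irq, foo_interrupt and FOO_IRQ are hypothetical names, not from the patch; the real call sites follow in the diff). Before, a static irqaction registered at init time:

    static struct irqaction foo_irq = {
            .name    = "foo",
            .flags   = IRQF_TIMER,
            .handler = foo_interrupt,
    };

    setup_irq(FOO_IRQ, &foo_irq);

After, the same registration via request_irq(); since the registration can now fail, the return value is checked:

    if (request_irq(FOO_IRQ, foo_interrupt, IRQF_TIMER, "foo", NULL))
            pr_err("Failed to request irq %d (foo)\n", FOO_IRQ);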
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org Signed-off-by: afzal mohammed afzal.mohd.ma@gmail.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/mach-omap1/pm.c | 13 ++++++------- arch/arm/mach-omap1/time.c | 10 +++------- arch/arm/mach-omap1/timer32k.c | 10 +++------- arch/arm/mach-omap2/timer.c | 11 +++-------- 4 files changed, 15 insertions(+), 29 deletions(-)
diff --git a/arch/arm/mach-omap1/pm.c b/arch/arm/mach-omap1/pm.c index 3e1de14805e48..65b91a8379c90 100644 --- a/arch/arm/mach-omap1/pm.c +++ b/arch/arm/mach-omap1/pm.c @@ -610,11 +610,6 @@ static irqreturn_t omap_wakeup_interrupt(int irq, void *dev) return IRQ_HANDLED; }
-static struct irqaction omap_wakeup_irq = { - .name = "peripheral wakeup", - .handler = omap_wakeup_interrupt -}; -
static const struct platform_suspend_ops omap_pm_ops = { @@ -627,6 +622,7 @@ static const struct platform_suspend_ops omap_pm_ops = { static int __init omap_pm_init(void) { int error = 0; + int irq;
if (!cpu_class_is_omap1()) return -ENODEV; @@ -670,9 +666,12 @@ static int __init omap_pm_init(void) arm_pm_idle = omap1_pm_idle;
if (cpu_is_omap7xx()) - setup_irq(INT_7XX_WAKE_UP_REQ, &omap_wakeup_irq); + irq = INT_7XX_WAKE_UP_REQ; else if (cpu_is_omap16xx()) - setup_irq(INT_1610_WAKE_UP_REQ, &omap_wakeup_irq); + irq = INT_1610_WAKE_UP_REQ; + if (request_irq(irq, omap_wakeup_interrupt, 0, "peripheral wakeup", + NULL)) + pr_err("Failed to request irq %d (peripheral wakeup)\n", irq);
/* Program new power ramp-up time * (0 for most boards since we don't lower voltage when in deep sleep) diff --git a/arch/arm/mach-omap1/time.c b/arch/arm/mach-omap1/time.c index 524977a31a49c..de590a85a42b3 100644 --- a/arch/arm/mach-omap1/time.c +++ b/arch/arm/mach-omap1/time.c @@ -155,15 +155,11 @@ static irqreturn_t omap_mpu_timer1_interrupt(int irq, void *dev_id) return IRQ_HANDLED; }
-static struct irqaction omap_mpu_timer1_irq = { - .name = "mpu_timer1", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap_mpu_timer1_interrupt, -}; - static __init void omap_init_mpu_timer(unsigned long rate) { - setup_irq(INT_TIMER1, &omap_mpu_timer1_irq); + if (request_irq(INT_TIMER1, omap_mpu_timer1_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "mpu_timer1", NULL)) + pr_err("Failed to request irq %d (mpu_timer1)\n", INT_TIMER1); omap_mpu_timer_start(0, (rate / HZ) - 1, 1);
clockevent_mpu_timer1.cpumask = cpumask_of(0); diff --git a/arch/arm/mach-omap1/timer32k.c b/arch/arm/mach-omap1/timer32k.c index 0ae6c52a7d70b..780fdf03c3cee 100644 --- a/arch/arm/mach-omap1/timer32k.c +++ b/arch/arm/mach-omap1/timer32k.c @@ -148,15 +148,11 @@ static irqreturn_t omap_32k_timer_interrupt(int irq, void *dev_id) return IRQ_HANDLED; }
-static struct irqaction omap_32k_timer_irq = { - .name = "32KHz timer", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap_32k_timer_interrupt, -}; - static __init void omap_init_32k_timer(void) { - setup_irq(INT_OS_TIMER, &omap_32k_timer_irq); + if (request_irq(INT_OS_TIMER, omap_32k_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "32KHz timer", NULL)) + pr_err("Failed to request irq %d(32KHz timer)\n", INT_OS_TIMER);
clockevent_32k_timer.cpumask = cpumask_of(0); clockevents_config_and_register(&clockevent_32k_timer, diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c index 98ed5ac073bc1..4fff1079f7c28 100644 --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -92,12 +92,6 @@ static irqreturn_t omap2_gp_timer_interrupt(int irq, void *dev_id) return IRQ_HANDLED; }
-static struct irqaction omap2_gp_timer_irq = { - .name = "gp_timer", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap2_gp_timer_interrupt, -}; - static int omap2_gp_timer_set_next_event(unsigned long cycles, struct clock_event_device *evt) { @@ -383,8 +377,9 @@ static void __init omap2_gp_clockevent_init(int gptimer_id, &clockevent_gpt.name, OMAP_TIMER_POSTED); BUG_ON(res);
- omap2_gp_timer_irq.dev_id = &clkev; - setup_irq(clkev.irq, &omap2_gp_timer_irq); + if (request_irq(clkev.irq, omap2_gp_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", &clkev)) + pr_err("Failed to request irq %d (gp_timer)\n", clkev.irq);
__omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW);
From: Tony Lindgren tony@atomide.com
stable inclusion from linux-4.19.197 commit 78130e2e4e405d51f74cfebec47dd43ec6164277
--------------------------------
commit 52762fbd1c4778ac9b173624ca0faacd22ef4724 upstream.
We can move the TI dmtimer clockevent and clocksource to live under drivers/clocksource if we rely only on the clock framework, and handle the module configuration directly in the clocksource driver based on the device tree data.
This removes the early dependency with system timers to the interconnect related code, and we can probe pretty much everything else later on at the module_init level.
Let's first add a new driver for timer-ti-dm-systimer based on existing arch/arm/mach-omap2/timer.c. Then let's start moving SoCs to probe with device tree data while still keeping the old timer.c. And eventually we can just drop the old timer.c.
Let's take the opportunity to switch to using readl/writel, as pointed out by Daniel Lezcano daniel.lezcano@linaro.org. This allows further clean-up of the timer-ti-dm code, as a lot of the shared helpers can just become static to the non-systimer related code.
Note the boards can optionally configure different timer source clocks if needed with assigned-clocks and assigned-clock-parents.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org [tony@atomide.com: backported to 4.19.y] Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/mach-omap2/timer.c | 105 +++++++++++++++++++++--------------- 1 file changed, 63 insertions(+), 42 deletions(-)
diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c index 4fff1079f7c28..0c0ee29751a84 100644 --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -64,15 +64,28 @@
/* Clockevent code */
-static struct omap_dm_timer clkev; -static struct clock_event_device clockevent_gpt; - /* Clockevent hwmod for am335x and am437x suspend */ static struct omap_hwmod *clockevent_gpt_hwmod;
/* Clockesource hwmod for am437x suspend */ static struct omap_hwmod *clocksource_gpt_hwmod;
+struct dmtimer_clockevent { + struct clock_event_device dev; + struct omap_dm_timer timer; +}; + +static struct dmtimer_clockevent clockevent; + +static struct omap_dm_timer *to_dmtimer(struct clock_event_device *clockevent) +{ + struct dmtimer_clockevent *clkevt = + container_of(clockevent, struct dmtimer_clockevent, dev); + struct omap_dm_timer *timer = &clkevt->timer; + + return timer; +} + #ifdef CONFIG_SOC_HAS_REALTIME_COUNTER static unsigned long arch_timer_freq;
@@ -84,10 +97,11 @@ void set_cntfreq(void)
static irqreturn_t omap2_gp_timer_interrupt(int irq, void *dev_id) { - struct clock_event_device *evt = &clockevent_gpt; - - __omap_dm_timer_write_status(&clkev, OMAP_TIMER_INT_OVERFLOW); + struct dmtimer_clockevent *clkevt = dev_id; + struct clock_event_device *evt = &clkevt->dev; + struct omap_dm_timer *timer = &clkevt->timer;
+ __omap_dm_timer_write_status(timer, OMAP_TIMER_INT_OVERFLOW); evt->event_handler(evt); return IRQ_HANDLED; } @@ -95,7 +109,9 @@ static irqreturn_t omap2_gp_timer_interrupt(int irq, void *dev_id) static int omap2_gp_timer_set_next_event(unsigned long cycles, struct clock_event_device *evt) { - __omap_dm_timer_load_start(&clkev, OMAP_TIMER_CTRL_ST, + struct omap_dm_timer *timer = to_dmtimer(evt); + + __omap_dm_timer_load_start(timer, OMAP_TIMER_CTRL_ST, 0xffffffff - cycles, OMAP_TIMER_POSTED);
return 0; @@ -103,22 +119,26 @@ static int omap2_gp_timer_set_next_event(unsigned long cycles,
static int omap2_gp_timer_shutdown(struct clock_event_device *evt) { - __omap_dm_timer_stop(&clkev, OMAP_TIMER_POSTED, clkev.rate); + struct omap_dm_timer *timer = to_dmtimer(evt); + + __omap_dm_timer_stop(timer, OMAP_TIMER_POSTED, timer->rate); + return 0; }
static int omap2_gp_timer_set_periodic(struct clock_event_device *evt) { + struct omap_dm_timer *timer = to_dmtimer(evt); u32 period;
- __omap_dm_timer_stop(&clkev, OMAP_TIMER_POSTED, clkev.rate); + __omap_dm_timer_stop(timer, OMAP_TIMER_POSTED, timer->rate);
-	period = clkev.rate / HZ;
+	period = timer->rate / HZ;
 	period -= 1;
 	/* Looks like we need to first set the load value separately */
-	__omap_dm_timer_write(&clkev, OMAP_TIMER_LOAD_REG, 0xffffffff - period,
+	__omap_dm_timer_write(timer, OMAP_TIMER_LOAD_REG, 0xffffffff - period,
 			      OMAP_TIMER_POSTED);
-	__omap_dm_timer_load_start(&clkev,
+	__omap_dm_timer_load_start(timer,
 				   OMAP_TIMER_CTRL_AR | OMAP_TIMER_CTRL_ST,
 				   0xffffffff - period, OMAP_TIMER_POSTED);

 	return 0;
@@ -132,26 +152,17 @@ static void omap_clkevt_idle(struct clock_event_device *unused)
 	omap_hwmod_idle(clockevent_gpt_hwmod);
 }

-static void omap_clkevt_unidle(struct clock_event_device *unused)
+static void omap_clkevt_unidle(struct clock_event_device *evt)
 {
+	struct omap_dm_timer *timer = to_dmtimer(evt);
+
 	if (!clockevent_gpt_hwmod)
 		return;

 	omap_hwmod_enable(clockevent_gpt_hwmod);
-	__omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW);
+	__omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW);
 }

-static struct clock_event_device clockevent_gpt = {
-	.features		= CLOCK_EVT_FEAT_PERIODIC |
-				  CLOCK_EVT_FEAT_ONESHOT,
-	.rating			= 300,
-	.set_next_event		= omap2_gp_timer_set_next_event,
-	.set_state_shutdown	= omap2_gp_timer_shutdown,
-	.set_state_periodic	= omap2_gp_timer_set_periodic,
-	.set_state_oneshot	= omap2_gp_timer_shutdown,
-	.tick_resume		= omap2_gp_timer_shutdown,
-};
-
 static const struct of_device_id omap_timer_match[] __initconst = {
 	{ .compatible = "ti,omap2420-timer", },
 	{ .compatible = "ti,omap3430-timer", },
@@ -361,44 +372,54 @@ static void __init omap2_gp_clockevent_init(int gptimer_id,
 						const char *fck_source,
 						const char *property)
 {
+	struct dmtimer_clockevent *clkevt = &clockevent;
+	struct omap_dm_timer *timer = &clkevt->timer;
 	int res;

-	clkev.id = gptimer_id;
-	clkev.errata = omap_dm_timer_get_errata();
+	timer->id = gptimer_id;
+	timer->errata = omap_dm_timer_get_errata();
+	clkevt->dev.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT;
+	clkevt->dev.rating = 300;
+	clkevt->dev.set_next_event = omap2_gp_timer_set_next_event;
+	clkevt->dev.set_state_shutdown = omap2_gp_timer_shutdown;
+	clkevt->dev.set_state_periodic = omap2_gp_timer_set_periodic;
+	clkevt->dev.set_state_oneshot = omap2_gp_timer_shutdown;
+	clkevt->dev.tick_resume = omap2_gp_timer_shutdown;

 	/*
 	 * For clock-event timers we never read the timer counter and
 	 * so we are not impacted by errata i103 and i767. Therefore,
 	 * we can safely ignore this errata for clock-event timers.
 	 */
-	__omap_dm_timer_override_errata(&clkev, OMAP_TIMER_ERRATA_I103_I767);
+	__omap_dm_timer_override_errata(timer, OMAP_TIMER_ERRATA_I103_I767);

-	res = omap_dm_timer_init_one(&clkev, fck_source, property,
-				     &clockevent_gpt.name, OMAP_TIMER_POSTED);
+	res = omap_dm_timer_init_one(timer, fck_source, property,
+				     &clkevt->dev.name, OMAP_TIMER_POSTED);
 	BUG_ON(res);

-	if (request_irq(clkev.irq, omap2_gp_timer_interrupt,
-			IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", &clkev))
-		pr_err("Failed to request irq %d (gp_timer)\n", clkev.irq);
+	clkevt->dev.cpumask = cpu_possible_mask;
+	clkevt->dev.irq = omap_dm_timer_get_irq(timer);
+
+	if (request_irq(timer->irq, omap2_gp_timer_interrupt,
+			IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", clkevt))
+		pr_err("Failed to request irq %d (gp_timer)\n", timer->irq);

-	__omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW);
+	__omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW);

-	clockevent_gpt.cpumask = cpu_possible_mask;
-	clockevent_gpt.irq = omap_dm_timer_get_irq(&clkev);
-	clockevents_config_and_register(&clockevent_gpt, clkev.rate,
+	clockevents_config_and_register(&clkevt->dev, timer->rate,
 					3, /* Timer internal resynch latency */
 					0xffffffff);

 	if (soc_is_am33xx() || soc_is_am43xx()) {
-		clockevent_gpt.suspend = omap_clkevt_idle;
-		clockevent_gpt.resume = omap_clkevt_unidle;
+		clkevt->dev.suspend = omap_clkevt_idle;
+		clkevt->dev.resume = omap_clkevt_unidle;

 		clockevent_gpt_hwmod =
-			omap_hwmod_lookup(clockevent_gpt.name);
+			omap_hwmod_lookup(clkevt->dev.name);
 	}

-	pr_info("OMAP clockevent source: %s at %lu Hz\n", clockevent_gpt.name,
-		clkev.rate);
+	pr_info("OMAP clockevent source: %s at %lu Hz\n", clkevt->dev.name,
+		timer->rate);
 }

 /* Clocksource code */
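The to_dmtimer() helper used by omap_clkevt_unidle() above is defined earlier in this patch, outside the quoted hunks. Judging from how the fields are used, it is presumably the usual container_of() pattern pairing a clock_event_device with its omap_dm_timer; a sketch reconstructed on that assumption, not copied from the patch:

	struct dmtimer_clockevent {
		struct clock_event_device dev;
		struct omap_dm_timer timer;
	};

	/* Recover the wrapper from the embedded clock_event_device. */
	static struct dmtimer_clockevent *to_dmtimer(struct clock_event_device *clockevent)
	{
		return container_of(clockevent, struct dmtimer_clockevent, dev);
	}

This pairing is what lets the callbacks drop the old clkev global: given only the clock_event_device that the clockevents core hands them, they can reach the matching omap_dm_timer.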
From: Tony Lindgren tony@atomide.com
stable inclusion from linux-4.19.197 commit 123235223169d1a73e55fcd5d857be676207021f
--------------------------------
commit 3efe7a878a11c13b5297057bfc1e5639ce1241ce upstream.
There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days.
To work around the issue, we need to use timer-ti-dm timers instead.
Let's prepare for adding support for percpu timers by adding a common dmtimer_clkevt_init_common() and calling it from __omap_sync32k_timer_init(). This patch makes no intentional functional changes.
Cc: Daniel Lezcano daniel.lezcano@linaro.org
Cc: Keerthy j-keerthy@ti.com
Cc: Tero Kristo kristo@kernel.org
[tony@atomide.com: backported to 4.19.y]
Signed-off-by: Tony Lindgren tony@atomide.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm/mach-omap2/timer.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c
index 0c0ee29751a84..f24858d5ac693 100644
--- a/arch/arm/mach-omap2/timer.c
+++ b/arch/arm/mach-omap2/timer.c
@@ -368,18 +368,21 @@ void tick_broadcast(const struct cpumask *mask)
 }
 #endif

-static void __init omap2_gp_clockevent_init(int gptimer_id,
-						const char *fck_source,
-						const char *property)
+static void __init dmtimer_clkevt_init_common(struct dmtimer_clockevent *clkevt,
+					      int gptimer_id,
+					      const char *fck_source,
+					      unsigned int features,
+					      const struct cpumask *cpumask,
+					      const char *property,
+					      int rating, const char *name)
 {
-	struct dmtimer_clockevent *clkevt = &clockevent;
 	struct omap_dm_timer *timer = &clkevt->timer;
 	int res;

 	timer->id = gptimer_id;
 	timer->errata = omap_dm_timer_get_errata();
-	clkevt->dev.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT;
-	clkevt->dev.rating = 300;
+	clkevt->dev.features = features;
+	clkevt->dev.rating = rating;
 	clkevt->dev.set_next_event = omap2_gp_timer_set_next_event;
 	clkevt->dev.set_state_shutdown = omap2_gp_timer_shutdown;
 	clkevt->dev.set_state_periodic = omap2_gp_timer_set_periodic;
@@ -397,19 +400,15 @@ static void __init omap2_gp_clockevent_init(int gptimer_id,
 				     &clkevt->dev.name, OMAP_TIMER_POSTED);
 	BUG_ON(res);

-	clkevt->dev.cpumask = cpu_possible_mask;
+	clkevt->dev.cpumask = cpumask;
 	clkevt->dev.irq = omap_dm_timer_get_irq(timer);

-	if (request_irq(timer->irq, omap2_gp_timer_interrupt,
-			IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", clkevt))
-		pr_err("Failed to request irq %d (gp_timer)\n", timer->irq);
+	if (request_irq(clkevt->dev.irq, omap2_gp_timer_interrupt,
+			IRQF_TIMER | IRQF_IRQPOLL, name, clkevt))
+		pr_err("Failed to request irq %d (gp_timer)\n", clkevt->dev.irq);

 	__omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW);

-	clockevents_config_and_register(&clkevt->dev, timer->rate,
-					3, /* Timer internal resynch latency */
-					0xffffffff);
-
 	if (soc_is_am33xx() || soc_is_am43xx()) {
 		clkevt->dev.suspend = omap_clkevt_idle;
 		clkevt->dev.resume = omap_clkevt_unidle;
@@ -559,7 +558,12 @@ static void __init __omap_sync32k_timer_init(int clkev_nr, const char *clkev_src
 {
 	omap_clk_init();
 	omap_dmtimer_init();
-	omap2_gp_clockevent_init(clkev_nr, clkev_src, clkev_prop);
+	dmtimer_clkevt_init_common(&clockevent, clkev_nr, clkev_src,
+				   CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
+				   cpu_possible_mask, clkev_prop, 300, "clockevent");
+	clockevents_config_and_register(&clockevent.dev, clockevent.timer.rate,
+					3, /* Timer internal resynch latency */
+					0xffffffff);

 	/* Enable the use of clocksource="gp_timer" kernel parameter */
 	if (use_gptimer_clksrc || gptimer)
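The point of moving clockevents_config_and_register() out of the common helper is that each caller can now pick its own features, rating, cpumask and registration limits. A hypothetical second user of dmtimer_clkevt_init_common(), sketched against the signature introduced above (the example_* names are illustrative only; the next patch adds the real percpu user for dra7):

	/* Sketch only: a one-shot clockevent bound to a single CPU. */
	static struct dmtimer_clockevent example_clkevt;

	static void __init example_clkevt_init(unsigned int cpu)
	{
		dmtimer_clkevt_init_common(&example_clkevt, 0, "timer_sys_ck",
					   CLOCK_EVT_FEAT_ONESHOT,
					   cpumask_of(cpu), NULL,
					   500, "example timer");

		/* Registration is now the caller's job, with caller-chosen limits. */
		clockevents_config_and_register(&example_clkevt.dev,
						example_clkevt.timer.rate,
						3, 0xffffffff);
	}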
From: Tony Lindgren tony@atomide.com
stable inclusion from linux-4.19.197 commit 330584716d4b5b9653608edc0adf7251bcd7e506
--------------------------------
commit 25de4ce5ed02994aea8bc111d133308f6fd62566 upstream.
There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days.
To work around the issue, we need to use timer-ti-dm percpu timers instead.
Let's configure dmtimer3 and 4 as percpu timers by default, and warn about the issue if the dtb is not configured properly.
For more information, please see the errata for "AM572x Sitara Processors Silicon Revisions 1.1, 2.0":
https://www.ti.com/lit/er/sprz429m/sprz429m.pdf
The concept is based on earlier reference patches done by Tero Kristo and Keerthy.
Cc: Daniel Lezcano daniel.lezcano@linaro.org
Cc: Keerthy j-keerthy@ti.com
Cc: Tero Kristo kristo@kernel.org
[tony@atomide.com: backported to 4.19.y]
Signed-off-by: Tony Lindgren tony@atomide.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm/boot/dts/dra7.dtsi         | 11 ++++++
 arch/arm/mach-omap2/board-generic.c |  4 +--
 arch/arm/mach-omap2/timer.c         | 53 ++++++++++++++++++++++++++++-
 drivers/clk/ti/clk-7xx.c            |  1 +
 include/linux/cpuhotplug.h          |  1 +
 5 files changed, 67 insertions(+), 3 deletions(-)
diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index 0c0781a37c5a7..7f1fe4a724472 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -48,6 +48,7 @@

 	timer {
 		compatible = "arm,armv7-timer";
+		status = "disabled";	/* See ARM architected timer wrap erratum i940 */
 		interrupts = <GIC_PPI 13 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>,
 			     <GIC_PPI 14 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>,
 			     <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>,
@@ -910,6 +911,8 @@
 			reg = <0x48032000 0x80>;
 			interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>;
 			ti,hwmods = "timer2";
+			clock-names = "fck";
+			clocks = <&l4per_clkctrl DRA7_TIMER2_CLKCTRL 24>;
 		};

 		timer3: timer@48034000 {
@@ -917,6 +920,10 @@
 			reg = <0x48034000 0x80>;
 			interrupts = <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>;
 			ti,hwmods = "timer3";
+			clock-names = "fck";
+			clocks = <&l4per_clkctrl DRA7_TIMER3_CLKCTRL 24>;
+			assigned-clocks = <&l4per_clkctrl DRA7_TIMER3_CLKCTRL 24>;
+			assigned-clock-parents = <&timer_sys_clk_div>;
 		};

 		timer4: timer@48036000 {
@@ -924,6 +931,10 @@
 			reg = <0x48036000 0x80>;
 			interrupts = <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>;
 			ti,hwmods = "timer4";
+			clock-names = "fck";
+			clocks = <&l4per_clkctrl DRA7_TIMER4_CLKCTRL 24>;
+			assigned-clocks = <&l4per_clkctrl DRA7_TIMER4_CLKCTRL 24>;
+			assigned-clock-parents = <&timer_sys_clk_div>;
 		};

 		timer5: timer@48820000 {
diff --git a/arch/arm/mach-omap2/board-generic.c b/arch/arm/mach-omap2/board-generic.c
index 6b4f4975cf7a6..6e59c11131c48 100644
--- a/arch/arm/mach-omap2/board-generic.c
+++ b/arch/arm/mach-omap2/board-generic.c
@@ -330,7 +330,7 @@ DT_MACHINE_START(DRA74X_DT, "Generic DRA74X (Flattened Device Tree)")
 	.init_late	= dra7xx_init_late,
 	.init_irq	= omap_gic_of_init,
 	.init_machine	= omap_generic_init,
-	.init_time	= omap5_realtime_timer_init,
+	.init_time	= omap3_gptimer_timer_init,
 	.dt_compat	= dra74x_boards_compat,
 	.restart	= omap44xx_restart,
 MACHINE_END
@@ -353,7 +353,7 @@ DT_MACHINE_START(DRA72X_DT, "Generic DRA72X (Flattened Device Tree)")
 	.init_late	= dra7xx_init_late,
 	.init_irq	= omap_gic_of_init,
 	.init_machine	= omap_generic_init,
-	.init_time	= omap5_realtime_timer_init,
+	.init_time	= omap3_gptimer_timer_init,
 	.dt_compat	= dra72x_boards_compat,
 	.restart	= omap44xx_restart,
 MACHINE_END
diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c
index f24858d5ac693..c4ba848e8af62 100644
--- a/arch/arm/mach-omap2/timer.c
+++ b/arch/arm/mach-omap2/timer.c
@@ -42,6 +42,7 @@
 #include <linux/platform_device.h>
 #include <linux/platform_data/dmtimer-omap.h>
 #include <linux/sched_clock.h>
+#include <linux/cpu.h>

 #include <asm/mach/time.h>
 #include <asm/smp_twd.h>
@@ -421,6 +422,53 @@ static void __init dmtimer_clkevt_init_common(struct dmtimer_clockevent *clkevt,
 		timer->rate);
 }

+static DEFINE_PER_CPU(struct dmtimer_clockevent, dmtimer_percpu_timer);
+
+static int omap_gptimer_starting_cpu(unsigned int cpu)
+{
+	struct dmtimer_clockevent *clkevt = per_cpu_ptr(&dmtimer_percpu_timer, cpu);
+	struct clock_event_device *dev = &clkevt->dev;
+	struct omap_dm_timer *timer = &clkevt->timer;
+
+	clockevents_config_and_register(dev, timer->rate, 3, ULONG_MAX);
+	irq_force_affinity(dev->irq, cpumask_of(cpu));
+
+	return 0;
+}
+
+static int __init dmtimer_percpu_quirk_init(void)
+{
+	struct dmtimer_clockevent *clkevt;
+	struct clock_event_device *dev;
+	struct device_node *arm_timer;
+	struct omap_dm_timer *timer;
+	int cpu = 0;
+
+	arm_timer = of_find_compatible_node(NULL, NULL, "arm,armv7-timer");
+	if (of_device_is_available(arm_timer)) {
+		pr_warn_once("ARM architected timer wrap issue i940 detected\n");
+		return 0;
+	}
+
+	for_each_possible_cpu(cpu) {
+		clkevt = per_cpu_ptr(&dmtimer_percpu_timer, cpu);
+		dev = &clkevt->dev;
+		timer = &clkevt->timer;
+
+		dmtimer_clkevt_init_common(clkevt, 0, "timer_sys_ck",
+					   CLOCK_EVT_FEAT_ONESHOT,
+					   cpumask_of(cpu),
+					   "assigned-clock-parents",
+					   500, "percpu timer");
+	}
+
+	cpuhp_setup_state(CPUHP_AP_OMAP_DM_TIMER_STARTING,
+			  "clockevents/omap/gptimer:starting",
+			  omap_gptimer_starting_cpu, NULL);
+
+	return 0;
+}
+
 /* Clocksource code */
 static struct omap_dm_timer clksrc;
 static bool use_gptimer_clksrc __initdata;
@@ -565,6 +613,9 @@ static void __init __omap_sync32k_timer_init(int clkev_nr, const char *clkev_src
 					3, /* Timer internal resynch latency */
 					0xffffffff);

+	if (soc_is_dra7xx())
+		dmtimer_percpu_quirk_init();
+
 	/* Enable the use of clocksource="gp_timer" kernel parameter */
 	if (use_gptimer_clksrc || gptimer)
 		omap2_gptimer_clocksource_init(clksrc_nr, clksrc_src,
@@ -592,7 +643,7 @@ void __init omap3_secure_sync32k_timer_init(void)
 #endif /* CONFIG_ARCH_OMAP3 */

 #if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM33XX) || \
-	defined(CONFIG_SOC_AM43XX)
+	defined(CONFIG_SOC_AM43XX) || defined(CONFIG_SOC_DRA7XX)
 void __init omap3_gptimer_timer_init(void)
 {
 	__omap_sync32k_timer_init(2, "timer_sys_ck", NULL,
diff --git a/drivers/clk/ti/clk-7xx.c b/drivers/clk/ti/clk-7xx.c
index 71a122b2dc67e..b6d1ec49fa01a 100644
--- a/drivers/clk/ti/clk-7xx.c
+++ b/drivers/clk/ti/clk-7xx.c
@@ -733,6 +733,7 @@ const struct omap_clkctrl_data dra7_clkctrl_data[] __initconst = {
 static struct ti_dt_clk dra7xx_clks[] = {
 	DT_CLK(NULL, "timer_32k_ck", "sys_32k_ck"),
 	DT_CLK(NULL, "sys_clkin_ck", "timer_sys_clk_div"),
+	DT_CLK(NULL, "timer_sys_ck", "timer_sys_clk_div"),
 	DT_CLK(NULL, "sys_clkin", "sys_clkin1"),
 	DT_CLK(NULL, "atl_dpll_clk_mux", "atl_cm:0000:24"),
 	DT_CLK(NULL, "atl_gfclk_mux", "atl_cm:0000:26"),
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f7767e58d1c59..18e20054bcff4 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -118,6 +118,7 @@ enum cpuhp_state {
 	CPUHP_AP_ARM_L2X0_STARTING,
 	CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
 	CPUHP_AP_ARM_ARCH_TIMER_STARTING,
+	CPUHP_AP_OMAP_DM_TIMER_STARTING,
 	CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
 	CPUHP_AP_JCORE_TIMER_STARTING,
 	CPUHP_AP_ARM_TWD_STARTING,
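One non-obvious detail in dmtimer_percpu_quirk_init() above: there is no explicit registration call for CPUs that are already online. cpuhp_setup_state() installs the callback for future hotplug and also invokes it once on every CPU currently online, and STARTING-section states run on the affected CPU itself with interrupts disabled, which is what lets omap_gptimer_starting_cpu() register the clockevent and force its IRQ affinity from the right CPU. A minimal sketch of that pattern (the example_* names are hypothetical):

	#include <linux/cpuhotplug.h>

	/* Runs on each CPU as it comes online, and once per already-online
	 * CPU when the state is installed. */
	static int example_starting_cpu(unsigned int cpu)
	{
		/* Per-CPU setup goes here, as in omap_gptimer_starting_cpu(). */
		return 0;
	}

	static int __init example_init(void)
	{
		/* No teardown callback: the percpu timers are never torn down. */
		return cpuhp_setup_state(CPUHP_AP_OMAP_DM_TIMER_STARTING,
					 "clockevents/omap/gptimer:starting",
					 example_starting_cpu, NULL);
	}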
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
stable inclusion from linux-4.19.197 commit fcfbdfe9626edd5bf00c732e093eed249ecdbfa1
--------------------------------
Merge 33 patches from the 4.19.197 stable branch (35 total), besides 1 patch already merged: cadf5bbcefbd9 KVM: SVM: Periodically schedule when unregistering regions on destroy
1 skipped: 6a6e04ce3bafb ext4: eliminate bogus error in ext4_data_block_valid_rcu()
Link: https://lore.kernel.org/r/20210709131644.969303901@linuxfoundation.org
Tested-by: Jon Hunter jonathanh@nvidia.com
Tested-by: Shuah Khan skhan@linuxfoundation.org
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
Tested-by: Guenter Roeck linux@roeck-us.net
Tested-by: Pavel Machek (CIP) pavel@denx.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index b14e925663471..19e495f5b2e35 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 VERSION = 4
 PATCHLEVEL = 19
-SUBLEVEL = 196
+SUBLEVEL = 197
 EXTRAVERSION =
 NAME = "People's Front"