From: Li Xinhai <lixinhai.lxh@gmail.com>
mainline inclusion
from mainline-v5.5-rc1
commit a18b3ac25bb7be4781cb9e6d31f3e57b3ba01b06
category: bugfix
bugzilla: 97909
CVE: NA
-------------------------------------------------
Patch series "mm: Fix checking unmapped holes for mbind", v4.
This patchset fixes the checking of unmapped holes for mbind().

The first patch makes sure the vma is correctly tracked in .test_walk(), so that each time .test_walk() is called, the neighborhood of the two vmas is correct.

The current problem is that the !vma_migratable() check can cause an immediate return without updating the tracking of the previous vma.

The second patch fixes the inconsistent reporting of EFAULT when mbind() is called for the MPOL_DEFAULT and non-MPOL_DEFAULT cases, so applications do not need workaround code to handle this special behavior. Currently there are two problems. One is that .test_walk() cannot know there is a hole at the tail side of the range, because .test_walk() is only called for vmas, not for holes. The other is that mbind_range() checks for a hole at the head side of the range but does not consider the MPOL_MF_DISCONTIG_OK flag, as .test_walk() does.
This patch (of 2):
Checking the unmapped hole and updating the previous vma must be handled first; otherwise, the unmapped hole could be calculated from a wrong previous vma.
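For illustration, here is a condensed, hand-simplified sketch of the broken control flow (not the exact tree contents; the start/end clamping and page queueing details are omitted):

static int queue_pages_test_walk(unsigned long start, unsigned long end,
                                 struct mm_walk *walk)
{
        struct queue_pages *qp = walk->private;
        struct vm_area_struct *vma = walk->vma;
        unsigned long flags = qp->flags;

        /* A skipped vma returns here, before qp->prev is updated... */
        if (!vma_migratable(vma) && !(flags & MPOL_MF_STRICT))
                return 1;

        /*
         * ...so when .test_walk() runs for the next vma, qp->prev still
         * points at the vma before the skipped one, and the space covered
         * by the skipped vma is misread as an unmapped hole.
         */
        if (!(flags & MPOL_MF_DISCONTIG_OK)) {
                if (!vma->vm_next && vma->vm_end < end)
                        return -EFAULT;
                if (qp->prev && qp->prev->vm_end < vma->vm_start)
                        return -EFAULT;
        }

        qp->prev = vma;
        return 0;
}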
Several commits were relevant to this error:
- commit 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
  This commit was correct: the VM_PFNMAP check came after updating the previous vma.

- commit 48684a65b4e3 ("mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)")
  This commit added a VM_PFNMAP check before updating the previous vma, leaving two VM_PFNMAP checks that did the same thing.

- commit acda0c334028 ("mm/mempolicy.c: get rid of duplicated check for vma(VM_PFNMAP) in queue_pages_range()")
  This commit tried to remove the duplicated VM_PFNMAP check, but it wrongly removed the one that came after updating the previous vma.
Link: http://lkml.kernel.org/r/1573218104-11021-2-git-send-email-lixinhai.lxh@gmai...
Fixes: acda0c334028 ("mm/mempolicy.c: get rid of duplicated check for vma(VM_PFNMAP) in queue_pages_range()")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-man <linux-man@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/mempolicy.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3a835f96c8fea..7b4ba2f355911 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -657,6 +657,16 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	unsigned long endvma = vma->vm_end;
 	unsigned long flags = qp->flags;
 
+	/* range check first */
+	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
+		if (!vma->vm_next && vma->vm_end < end)
+			return -EFAULT;
+		if (qp->prev && qp->prev->vm_end < vma->vm_start)
+			return -EFAULT;
+	}
+
+	qp->prev = vma;
+
 	/*
 	 * Need check MPOL_MF_STRICT to return -EIO if possible
 	 * regardless of vma_migratable
@@ -670,15 +680,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	if (vma->vm_start > start)
 		start = vma->vm_start;
 
-	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
-		if (!vma->vm_next && vma->vm_end < end)
-			return -EFAULT;
-		if (qp->prev && qp->prev->vm_end < vma->vm_start)
-			return -EFAULT;
-	}
-
-	qp->prev = vma;
-
 	if (flags & MPOL_MF_LAZY) {
 		/* Similar to task_numa_work, skip inaccessible VMAs */
 		if (!is_vm_hugetlb_page(vma) &&
From: Li Xinhai <lixinhai.lxh@gmail.com>
mainline inclusion
from mainline-v5.5-rc1
commit f18da660c095e3fff1690ea3d752f7b7188b35fb
category: bugfix
bugzilla: 97910
CVE: NA
-------------------------------------------------
mbind() is required to report EFAULT if the range specified by addr and len contains unmapped holes. In the current implementation, the following rules are applied for this check:

1: Unmapped holes at any part of the specified range should be reported as EFAULT if mbind() is called for non-MPOL_DEFAULT cases;

2: Unmapped holes at any part of the specified range should be ignored (EFAULT is not reported) if mbind() is called for the MPOL_DEFAULT case;

3: A whole range lying within an unmapped hole should be reported as EFAULT;

Note that rule 2 does not fulfill the mbind() API definition, but since that behavior has existed for a long time (the internal flag MPOL_MF_DISCONTIG_OK exists for this purpose), this patch does not plan to change it.

In the current code, applications observe inconsistent behavior for rule 1 and rule 2 respectively. That inconsistency is fixed as detailed below.
Cases of rule 1:

- Hole at head side of range. Current code reports EFAULT, no change by this patch.

        [  vma  ][ hole ][  vma  ]
                     [   range   ]

- Hole at middle of range. Current code reports EFAULT, no change by this patch.

        [  vma  ][ hole ][  vma  ]
            [       range       ]

- Hole at tail side of range. Current code does not report EFAULT; this patch fixes it.

        [  vma  ][ hole ][  vma  ]
          [  range  ]

Cases of rule 2:

- Hole at head side of range. Current code reports EFAULT; this patch fixes it.

        [  vma  ][ hole ][  vma  ]
                     [   range   ]

- Hole at middle of range. Current code does not report EFAULT, no change by this patch.

        [  vma  ][ hole ][  vma  ]
            [       range       ]

- Hole at tail side of range. Current code does not report EFAULT, no change by this patch.

        [  vma  ][ hole ][  vma  ]
          [  range  ]
This patch makes no change to rule 3.
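To make these rules concrete, here is a minimal userspace sketch, hand-written for this description rather than taken from the series (it assumes node 0 is online; build with -lnuma). It maps three pages, punches a hole in the middle, and exercises each rule:

#include <errno.h>
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        unsigned long nodemask = 1;     /* assumes node 0 exists */

        /* Map three pages, then unmap the middle one to create a hole. */
        char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return 1;
        munmap(p + page, page);

        /* Rule 1, hole at middle: EFAULT both before and after this patch. */
        if (mbind(p, 3 * page, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0))
                printf("rule 1, middle hole: %s\n", strerror(errno));

        /* Rule 1, hole at tail: EFAULT only after this patch. */
        if (mbind(p, 2 * page, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0))
                printf("rule 1, tail hole: %s\n", strerror(errno));

        /* Rule 2: MPOL_DEFAULT ignores holes inside the range. */
        if (!mbind(p, 3 * page, MPOL_DEFAULT, NULL, 0, 0))
                printf("rule 2: hole ignored\n");

        /* Rule 3: range entirely inside the hole: EFAULT. */
        if (mbind(p + page, page, MPOL_DEFAULT, NULL, 0, 0))
                printf("rule 3: %s\n", strerror(errno));

        return 0;
}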
The unmapped hole checking could also be handled by using .pte_hole() instead of .test_walk(). But .pte_hole() is called for holes both inside and outside a vma, which would cost more, so this patch keeps the original design with .test_walk().
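For reference, such an alternative might look roughly like the hypothetical callback below, sketched against the pre-5.2 walker API used by this tree (it is not code from the series). Because the walker invokes .pte_hole() for every absent page-table range, including ranges inside a vma, it would run far more often than the per-vma .test_walk():

static int queue_pages_pte_hole(unsigned long addr, unsigned long next,
                                struct mm_walk *walk)
{
        struct queue_pages *qp = walk->private;

        /* A hole inside a vma (walk->vma is set) is not an unmapped hole. */
        if (walk->vma)
                return 0;
        /* [addr, next) lies outside any vma, i.e. in an unmapped hole. */
        if (!(qp->flags & MPOL_MF_DISCONTIG_OK))
                return -EFAULT;
        return 0;
}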
Link: http://lkml.kernel.org/r/1573218104-11021-3-git-send-email-lixinhai.lxh@gmai...
Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-man <linux-man@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/mempolicy.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7b4ba2f355911..d096ee1bcbf40 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -449,7 +449,9 @@ struct queue_pages {
 	struct list_head *pagelist;
 	unsigned long flags;
 	nodemask_t *nmask;
-	struct vm_area_struct *prev;
+	unsigned long start;
+	unsigned long end;
+	struct vm_area_struct *first;
 };
 
 /*
@@ -658,14 +660,20 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	unsigned long flags = qp->flags;
 
 	/* range check first */
-	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
-		if (!vma->vm_next && vma->vm_end < end)
-			return -EFAULT;
-		if (qp->prev && qp->prev->vm_end < vma->vm_start)
+	VM_BUG_ON((vma->vm_start > start) || (vma->vm_end < end));
+
+	if (!qp->first) {
+		qp->first = vma;
+		if (!(flags & MPOL_MF_DISCONTIG_OK) &&
+			(qp->start < vma->vm_start))
+			/* hole at head side of range */
 			return -EFAULT;
 	}
-
-	qp->prev = vma;
+	if (!(flags & MPOL_MF_DISCONTIG_OK) &&
+		((vma->vm_end < qp->end) &&
+		(!vma->vm_next || vma->vm_end < vma->vm_next->vm_start)))
+		/* hole at middle or tail of range */
+		return -EFAULT;
 
 	/*
 	 * Need check MPOL_MF_STRICT to return -EIO if possible
@@ -677,8 +685,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	if (endvma > end)
 		endvma = end;
-	if (vma->vm_start > start)
-		start = vma->vm_start;
 
 	if (flags & MPOL_MF_LAZY) {
 		/* Similar to task_numa_work, skip inaccessible VMAs */
@@ -715,11 +721,14 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 		nodemask_t *nodes, unsigned long flags,
 		struct list_head *pagelist)
 {
+	int err;
 	struct queue_pages qp = {
 		.pagelist = pagelist,
 		.flags = flags,
 		.nmask = nodes,
-		.prev = NULL,
+		.start = start,
+		.end = end,
+		.first = NULL,
 	};
 	struct mm_walk queue_pages_walk = {
 		.hugetlb_entry = queue_pages_hugetlb,
@@ -729,7 +738,13 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 		.private = &qp,
 	};
 
-	return walk_page_range(start, end, &queue_pages_walk);
+	err = walk_page_range(start, end, &queue_pages_walk);
+
+	if (!qp.first)
+		/* whole range in hole */
+		err = -EFAULT;
+
+	return err;
 }
 
 /*
@@ -781,8 +796,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 	unsigned long vmend;
 
 	vma = find_vma(mm, start);
-	if (!vma || vma->vm_start > start)
-		return -EFAULT;
+	VM_BUG_ON(!vma);
 
 	prev = vma->vm_prev;
 	if (start > vma->vm_start)