
Commit b7e2823

apopple-nvidia authored and akpm00 committed
mm/mm_init: move p2pdma page refcount initialisation to p2pdma
Currently ZONE_DEVICE page reference counts are initialised by core memory management code in __init_zone_device_page() as part of the memremap() call which driver modules make to obtain ZONE_DEVICE pages. This initialises page refcounts to 1 before returning them to the driver.

This was presumably done because drivers had a reference of sorts on the page. It also ensured the page could always be mapped with vm_insert_page(), for example, and would never get freed (i.e. have a zero refcount), which freed drivers from having to manipulate page reference counts.

However, this complicates figuring out whether or not a page is free from the mm perspective, because it is no longer possible to just look at the refcount. Instead, the page type must be known, and if GUP is used a secondary pgmap reference is also sometimes needed.

To simplify this it is desirable to remove the page reference held on behalf of the driver, so core mm can just use the refcount without having to account for page type or do other kinds of tracking. This is possible because drivers can always assume the page is valid, as the core kernel will never offline or remove the struct page.

This means it is now up to drivers to initialise the page refcount as required. P2PDMA uses vm_insert_page() to map the page, which requires a non-zero reference count, so set the refcount when the page is first mapped.
Link: https://lkml.kernel.org/r/6aedb0ac2886dcc4503cb705273db5b3863a0b66.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent a58c6fb commit b7e2823


3 files changed (+42, -10 lines)


drivers/pci/p2pdma.c

Lines changed: 11 additions & 2 deletions
```diff
@@ -140,13 +140,22 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
 	rcu_read_unlock();
 
 	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
-		ret = vm_insert_page(vma, vaddr, virt_to_page(kaddr));
+		struct page *page = virt_to_page(kaddr);
+
+		/*
+		 * Initialise the refcount for the freshly allocated page. As
+		 * we have just allocated the page no one else should be
+		 * using it.
+		 */
+		VM_WARN_ON_ONCE_PAGE(!page_ref_count(page), page);
+		set_page_count(page, 1);
+		ret = vm_insert_page(vma, vaddr, page);
 		if (ret) {
 			gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
 			return ret;
 		}
 		percpu_ref_get(ref);
-		put_page(virt_to_page(kaddr));
+		put_page(page);
 		kaddr += PAGE_SIZE;
 		len -= PAGE_SIZE;
 	}
```

mm/memremap.c

Lines changed: 13 additions & 4 deletions
```diff
@@ -488,15 +488,24 @@ void free_zone_device_folio(struct folio *folio)
 	folio->mapping = NULL;
 	folio->page.pgmap->ops->page_free(folio_page(folio, 0));
 
-	if (folio->page.pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    folio->page.pgmap->type != MEMORY_DEVICE_COHERENT)
+	switch (folio->page.pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
+		put_dev_pagemap(folio->page.pgmap);
+		break;
+
+	case MEMORY_DEVICE_FS_DAX:
+	case MEMORY_DEVICE_GENERIC:
 		/*
 		 * Reset the refcount to 1 to prepare for handing out the page
 		 * again.
 		 */
 		folio_set_count(folio, 1);
-	else
-		put_dev_pagemap(folio->page.pgmap);
+		break;
+
+	case MEMORY_DEVICE_PCI_P2PDMA:
+		break;
+	}
 }
 
 void zone_device_page_init(struct page *page)
```

mm/mm_init.c

Lines changed: 18 additions & 4 deletions
```diff
@@ -1026,12 +1026,26 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 
 	/*
-	 * ZONE_DEVICE pages are released directly to the driver page allocator
-	 * which will set the page count to 1 when allocating the page.
+	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC and
+	 * MEMORY_TYPE_FS_DAX pages are released directly to the driver page
+	 * allocator which will set the page count to 1 when allocating the
+	 * page.
+	 *
+	 * MEMORY_TYPE_GENERIC and MEMORY_TYPE_FS_DAX pages automatically have
+	 * their refcount reset to one whenever they are freed (ie. after
+	 * their refcount drops to 0).
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
+	case MEMORY_DEVICE_PCI_P2PDMA:
 		set_page_count(page, 0);
+		break;
+
+	case MEMORY_DEVICE_FS_DAX:
+	case MEMORY_DEVICE_GENERIC:
+		break;
+	}
 }
 
 /*
```
