Commit 52396ac

Merge: pmem: update to v6.9
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4823

## Summary of Changes
Sync to upstream version 6.9. This includes various bug fixes as well as support for memory_hotplug.memmap_on_memory.

## Approved Development Ticket
JIRA: https://issues.redhat.com/browse/RHEL-23824

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Ewan D. Milne <emilne@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
2 parents a9de54b + df80b36 commit 52396ac

50 files changed: +952 -395 lines changed

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
+What:		/sys/bus/dax/devices/daxX.Y/align
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) Provides a way to specify an alignment for a dax device.
+		Values allowed are constrained by the physical address ranges
+		that back the dax device, and also by arch requirements.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(WO) Provides a way to allocate a mapping range under a dax
+		device. Specified in the format <start>-<end>.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) A dax device may have multiple constituent discontiguous
+		address ranges. These are represented by the different
+		'mappingX' subdirectories. The 'start' attribute indicates the
+		start physical address for the given range. The 'end' attribute
+		indicates the end physical address for the given range. The
+		'page_offset' attribute indicates the offset of the current
+		range in the dax device.
+
+What:		/sys/bus/dax/devices/daxX.Y/resource
+Date:		June, 2019
+KernelVersion:	v5.3
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The resource attribute indicates the starting physical
+		address of a dax device. In case of a device with multiple
+		constituent ranges, it indicates the starting address of the
+		first range.
+
+What:		/sys/bus/dax/devices/daxX.Y/size
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) The size attribute indicates the total size of a dax
+		device. For creating subdivided dax devices, or for resizing
+		an existing device, the new size can be written to this as
+		part of the reconfiguration process.
+
+What:		/sys/bus/dax/devices/daxX.Y/numa_node
+Date:		November, 2019
+KernelVersion:	v5.5
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) If NUMA is enabled and the platform has affinitized the
+		backing device for this dax device, emit the CPU node
+		affinity for this device.
+
+What:		/sys/bus/dax/devices/daxX.Y/target_node
+Date:		February, 2019
+KernelVersion:	v5.1
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The target-node attribute is the Linux numa-node that a
+		device-dax instance may create when it is online. Prior to
+		being online the device's 'numa_node' property reflects the
+		closest online cpu node which is the typical expectation of a
+		device 'numa_node'. Once it is online it becomes its own
+		distinct numa node.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The available_size attribute tracks available dax region
+		capacity. This only applies to volatile hmem devices, not pmem
+		devices, since pmem devices are defined by nvdimm namespace
+		boundaries.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/size
+Date:		July, 2017
+KernelVersion:	v5.1
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The size attribute indicates the size of a given dax region
+		in bytes.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/align
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The align attribute indicates alignment of the dax region.
+		Changes on align may not always be valid, when say certain
+		mappings were created with 2M and then we switch to 1G. This
+		validates all ranges against the new value being attempted, post
+		resizing.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/seed
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The seed device is a concept for dynamic dax regions to be
+		able to split the region amongst multiple sub-instances. The
+		seed device, similar to libnvdimm seed devices, is a device
+		that starts with zero capacity allocated and unbound to a
+		driver.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/create
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) The create interface to the dax region provides a way to
+		create a new unconfigured dax device under the given region, which
+		can then be configured (with a size etc.) and then probed.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/delete
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(WO) The delete interface for a dax region provides for deletion
+		of any 0-sized and idle dax devices.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/id
+Date:		July, 2017
+KernelVersion:	v5.1
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The id attribute indicates the region id of a dax region.
+
+What:		/sys/bus/dax/devices/daxX.Y/memmap_on_memory
+Date:		January, 2024
+KernelVersion:	v6.8
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) Control the memmap_on_memory setting if the dax device
+		were to be hotplugged as system memory. This determines whether
+		the 'altmap' for the hotplugged memory will be placed on the
+		device being hotplugged (memmap_on_memory=1) or if it will be
+		placed on regular memory (memmap_on_memory=0). This attribute
+		must be set before the device is handed over to the 'kmem'
+		driver (i.e. hotplugged into system-ram). Additionally, this
+		depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
+		memmap_on_memory parameter for memory_hotplug. This is
+		typically set on the kernel command line -
+		memory_hotplug.memmap_on_memory set to 'true' or 'force'."
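
Reviewer note: the `mapping` entry above describes its input only as `<start>-<end>`. The following userspace sketch shows one way a tool might validate such a string before writing it to the attribute. The `struct dax_range` type and `parse_dax_range()` helper are hypothetical names introduced for illustration, and treating both fields as hexadecimal is an assumption of this sketch, not something the ABI text states.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: one inclusive physical address range. */
struct dax_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Parse "<start>-<end>" into *out. Both fields are read as hexadecimal
 * (with or without a 0x prefix); that choice is an assumption of this
 * sketch. Returns true on success, false on malformed input or when
 * end < start. A single trailing newline is tolerated.
 */
static bool parse_dax_range(const char *text, struct dax_range *out)
{
	unsigned long long start, end;
	const char *second;
	char *stop;

	assert(text != NULL && out != NULL);

	/* Reject signs and leading whitespace that strtoull would accept. */
	if (!isxdigit((unsigned char)text[0]))
		return false;

	errno = 0;
	start = strtoull(text, &stop, 16);
	if (errno != 0 || stop == text || *stop != '-')
		return false;

	second = stop + 1;
	if (!isxdigit((unsigned char)second[0]))
		return false;

	end = strtoull(second, &stop, 16);
	if (errno != 0 || stop == second)
		return false;

	if (*stop == '\n')
		stop++;
	if (*stop != '\0' || end < start)
		return false;

	out->start = start;
	out->end = end;
	return true;
}
```

A caller would pass the string it intends to write, for example `parse_dax_range("0x100000000-0x13fffffff", &r)`, and only proceed with the sysfs write if the helper returns true.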

Documentation/admin-guide/mm/memory-hotplug.rst

Lines changed: 12 additions & 0 deletions
@@ -443,6 +443,18 @@ The following module parameters are currently defined:
 				 memory in a way that huge pages in bigger
 				 granularity cannot be formed on hotplugged
 				 memory.
+
+				 With value "force" it could result in memory
+				 wastage due to memmap size limitations. For
+				 example, if the memmap for a memory block
+				 requires 1 MiB, but the pageblock size is 2
+				 MiB, 1 MiB of hotplugged memory will be wasted.
+				 Note that there are still cases where the
+				 feature cannot be enforced: for example, if the
+				 memmap is smaller than a single page, or if the
+				 architecture does not support the forced mode
+				 in all configurations.
+
 ``online_policy``		 read-write: Set the basic policy used for
 				 automatic zone selection when onlining memory
 				 blocks without specifying a target zone.
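
Reviewer note: the "force" example in the hunk above (1 MiB memmap, 2 MiB pageblock, 1 MiB wasted) can be reproduced with a small userspace calculation. This is a sketch only: it assumes the waste is the padding needed to round the memmap up to a whole number of pageblocks, and `round_up_to()` and `memmap_padding_bytes()` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Round value up to the next multiple of align (align must be nonzero). */
static uint64_t round_up_to(uint64_t value, uint64_t align)
{
	assert(align != 0);
	return ((value + align - 1) / align) * align;
}

/*
 * Bytes of hotplugged memory lost when the memmap reservation is padded
 * up to a whole number of pageblocks (assumed model of the "force" mode
 * wastage described in the documentation text).
 */
static uint64_t memmap_padding_bytes(uint64_t memmap_bytes,
				     uint64_t pageblock_bytes)
{
	return round_up_to(memmap_bytes, pageblock_bytes) - memmap_bytes;
}
```

Under this model, `memmap_padding_bytes(1 * MIB, 2 * MIB)` evaluates to 1 MiB, matching the documented example, while a memmap that is already a multiple of the pageblock size loses nothing.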

arch/arm64/Kconfig

Lines changed: 1 addition & 3 deletions
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
 	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_GNU_PROPERTY
 	select ARCH_USE_MEMTEST
@@ -323,9 +324,6 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 config SMP
 	def_bool y
 

arch/powerpc/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -137,6 +137,7 @@ config PPC
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX

arch/powerpc/include/asm/pgtable.h

Lines changed: 22 additions & 0 deletions
@@ -146,6 +146,28 @@ static inline bool is_ioremap_addr(const void *x)
 
 	return addr >= IOREMAP_BASE && addr < IOREMAP_END;
 }
+
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
+ * some of the restrictions. We don't check for PMD_SIZE because our
+ * vmemmap allocation code can fallback correctly. The pageblock
+ * alignment requirement is met using altmap->reserve blocks.
+ */
+#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
+{
+	if (!radix_enabled())
+		return false;
+	/*
+	 * With 4K page size and 2M PMD_SIZE, we can align
+	 * things better with memory block size value
+	 * starting from 128MB. Hence align things with PMD_SIZE.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+		return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+	return true;
+}
+
 #endif /* CONFIG_PPC64 */
 
 #endif /* __ASSEMBLY__ */
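
Reviewer note: the "starting from 128MB" remark in the comment above can be checked with a userspace model of the 4K-page case. This sketch is not kernel code. The `MODEL_*` constants and both helpers are hypothetical, the 64-byte `struct page` size is an assumption (the hunk does not state it), and the `radix_enabled()` check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE        4096ULL                  /* 4K base pages */
#define MODEL_PMD_SIZE         (2ULL * 1024ULL * 1024ULL) /* 2M PMD, per the comment */
#define MODEL_STRUCT_PAGE_SIZE 64ULL                    /* assumption, not from the hunk */

/* Size of the memmap (one struct page per base page) for a memory block. */
static uint64_t model_vmemmap_size(uint64_t memory_block_bytes)
{
	assert(memory_block_bytes % MODEL_PAGE_SIZE == 0);
	return (memory_block_bytes / MODEL_PAGE_SIZE) * MODEL_STRUCT_PAGE_SIZE;
}

/* Userspace stand-in for IS_ALIGNED(vmemmap_size, PMD_SIZE). */
static bool model_pmd_aligned(uint64_t vmemmap_size)
{
	return (vmemmap_size % MODEL_PMD_SIZE) == 0;
}
```

With these assumptions a 128 MiB memory block needs 32768 page structures, or exactly 2 MiB of memmap, which passes the alignment check. A 64 MiB block needs only 1 MiB and fails it, which is consistent with the comment's 128MB threshold.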

arch/powerpc/platforms/pseries/hotplug-memory.c

Lines changed: 1 addition & 1 deletion
@@ -652,7 +652,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		nid = first_online_node;
 
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_MEMMAP_ON_MEMORY);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;

arch/x86/Kconfig

Lines changed: 1 addition & 3 deletions
@@ -103,6 +103,7 @@ config X86
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_HAS_ZONE_DMA_SET if EXPERT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
@@ -2670,9 +2671,6 @@ config ARCH_HAS_ADD_PAGES
 	def_bool y
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
-	def_bool y
-
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
