Commit be17c0d
riscv: module: Optimize PLT/GOT entry counting
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.
Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340 ("arm64/module: Optimize module load time by
optimizing PLT counting").
Unlike the arm64 code, here the filtering and sorting is done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
This allows accumulating PLT/GOT relocations across sections so sorting
and counting is only done once per module.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>1 parent 881dadf commit be17c0d
1 file changed
+65
-16
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
| 12 | + | |
12 | 13 | | |
13 | 14 | | |
14 | 15 | | |
| |||
55 | 56 | | |
56 | 57 | | |
57 | 58 | | |
58 | | - | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
59 | 62 | | |
60 | | - | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
61 | 71 | | |
62 | 72 | | |
63 | 73 | | |
64 | 74 | | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
71 | 80 | | |
72 | 81 | | |
73 | | - | |
| 82 | + | |
74 | 83 | | |
75 | 84 | | |
76 | | - | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
77 | 89 | | |
78 | 90 | | |
79 | 91 | | |
80 | | - | |
81 | | - | |
| 92 | + | |
82 | 93 | | |
83 | 94 | | |
84 | | - | |
85 | | - | |
| 95 | + | |
86 | 96 | | |
| 97 | + | |
| 98 | + | |
87 | 99 | | |
88 | 100 | | |
89 | 101 | | |
90 | 102 | | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
91 | 115 | | |
92 | 116 | | |
93 | 117 | | |
| 118 | + | |
94 | 119 | | |
95 | 120 | | |
| 121 | + | |
| 122 | + | |
96 | 123 | | |
97 | 124 | | |
98 | 125 | | |
| |||
122 | 149 | | |
123 | 150 | | |
124 | 151 | | |
| 152 | + | |
125 | 153 | | |
126 | | - | |
127 | 154 | | |
| 155 | + | |
128 | 156 | | |
129 | 157 | | |
130 | 158 | | |
| |||
133 | 161 | | |
134 | 162 | | |
135 | 163 | | |
136 | | - | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
137 | 186 | | |
138 | 187 | | |
139 | 188 | | |
| |||
0 commit comments