Commit 0119f9d
committed
opal/free_list: fix race condition
There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase was assuming that a NULL return from this function
was due to an out-of-memory condition. In some cases this can lead to
a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for
example). Before this commit opal_free_list_get_mt looked like this:
```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
opal_free_list_item_t *item =
(opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);
if (OPAL_UNLIKELY(NULL == item)) {
opal_mutex_lock (&flist->fl_lock);
opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
opal_mutex_unlock (&flist->fl_lock);
item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
}
return item;
}
```
The problem is in a multithreaded environment is *is* possible for the
free list to be grown successfully but the thread calling
opal_free_list_get_mt to be left without an item. The happens if
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the call to opal_lifo_pop_atomic other threads pop all the items
added to the free list.
This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.
Fixes #2921
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 5c770a7)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>1 parent e886f25 commit 0119f9d
2 files changed
+31
-17
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
13 | 13 | | |
14 | 14 | | |
15 | 15 | | |
16 | | - | |
| 16 | + | |
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| |||
155 | 155 | | |
156 | 156 | | |
157 | 157 | | |
158 | | - | |
| 158 | + | |
159 | 159 | | |
160 | 160 | | |
161 | 161 | | |
162 | 162 | | |
163 | 163 | | |
164 | | - | |
| 164 | + | |
165 | 165 | | |
166 | 166 | | |
167 | 167 | | |
| |||
263 | 263 | | |
264 | 264 | | |
265 | 265 | | |
266 | | - | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
267 | 274 | | |
268 | 275 | | |
269 | | - | |
270 | 276 | | |
271 | 277 | | |
272 | 278 | | |
| |||
298 | 304 | | |
299 | 305 | | |
300 | 306 | | |
301 | | - | |
| 307 | + | |
302 | 308 | | |
303 | 309 | | |
304 | 310 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | | - | |
| 15 | + | |
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
| |||
146 | 146 | | |
147 | 147 | | |
148 | 148 | | |
| 149 | + | |
149 | 150 | | |
150 | 151 | | |
151 | 152 | | |
| |||
155 | 156 | | |
156 | 157 | | |
157 | 158 | | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
158 | 165 | | |
159 | | - | |
| 166 | + | |
160 | 167 | | |
161 | 168 | | |
162 | 169 | | |
| |||
195 | 202 | | |
196 | 203 | | |
197 | 204 | | |
198 | | - | |
| 205 | + | |
199 | 206 | | |
200 | | - | |
201 | 207 | | |
202 | 208 | | |
203 | 209 | | |
| |||
209 | 215 | | |
210 | 216 | | |
211 | 217 | | |
212 | | - | |
213 | | - | |
| 218 | + | |
214 | 219 | | |
215 | 220 | | |
216 | 221 | | |
| |||
253 | 258 | | |
254 | 259 | | |
255 | 260 | | |
256 | | - | |
| 261 | + | |
257 | 262 | | |
258 | 263 | | |
259 | 264 | | |
| |||
274 | 279 | | |
275 | 280 | | |
276 | 281 | | |
277 | | - | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
278 | 285 | | |
279 | 286 | | |
280 | 287 | | |
| |||
287 | 294 | | |
288 | 295 | | |
289 | 296 | | |
290 | | - | |
| 297 | + | |
291 | 298 | | |
292 | 299 | | |
293 | 300 | | |
294 | | - | |
295 | | - | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
296 | 304 | | |
297 | 305 | | |
298 | 306 | | |
| |||
0 commit comments