Commit 4d428dc
netfs: fix reference leak
Commit 20d72b0 ("netfs: Fix the request's work item to not
require a ref") modified netfs_alloc_request() to initialize the
reference counter to 2 instead of 1. The rationale was that the
requet's "work" would release the second reference after completion
(via netfs_{read,write}_collection_worker()). That works most of the
time if all goes well.
However, it leaks this additional reference if the request is released
before the I/O operation has been submitted: the error code path only
decrements the reference counter once and the work item will never be
queued because there will never be a completion.
This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation(). The leaked
netfs_io_request was never completed, leaving `netfs_inode.io_count`
with a positive value forever.
All of this is super-fragile code. Finding out which code paths will
lead to an eventual completion and which do not is hard to see:
- Some functions like netfs_create_write_req() allocate a request, but
will never submit any I/O.
- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()
and then netfs_put_request(); however, netfs_unbuffered_read() can
also fail early before submitting the I/O request, therefore another
netfs_put_request() call must be added there.
A rule of thumb is that functions that return a `netfs_io_request` do
not submit I/O, and all of their callers must be checked.
For my taste, the whole netfs code needs an overhaul to make reference
counting easier to understand and less fragile & obscure. But to fix
this bug here and now and produce a patch that is adequate for a
stable backport, I tried a minimal approach that quickly frees the
request object upon early failure.
I decided against adding a second netfs_put_request() each time
because that would cause code duplication which obscures the code
further. Instead, I added the function netfs_put_failed_request()
which frees such a failed request synchronously under the assumption
that the reference count is exactly 2 (as initially set by
netfs_alloc_request() and never touched), verified by a
WARN_ON_ONCE(). It then deinitializes the request object (without
going through the "cleanup_work" indirection) and frees the allocation
(with RCU protection to protect against concurrent access by
netfs_requests_seq_start()).
All code paths that fail early have been changed to call
netfs_put_failed_request() instead of netfs_put_request().
Additionally, I have added a netfs_put_request() call to
netfs_unbuffered_read() as explained above because the
netfs_put_failed_request() approach does not work there.
Fixes: 20d72b0 ("netfs: Fix the request's work item to not require a ref")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>1 parent 9158c6b commit 4d428dc
File tree
8 files changed
+47
-14
lines changed- fs/netfs
8 files changed
+47
-14
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
369 | 369 | | |
370 | 370 | | |
371 | 371 | | |
372 | | - | |
| 372 | + | |
373 | 373 | | |
374 | 374 | | |
375 | 375 | | |
| |||
472 | 472 | | |
473 | 473 | | |
474 | 474 | | |
475 | | - | |
| 475 | + | |
476 | 476 | | |
477 | 477 | | |
478 | 478 | | |
| |||
532 | 532 | | |
533 | 533 | | |
534 | 534 | | |
535 | | - | |
| 535 | + | |
536 | 536 | | |
537 | 537 | | |
538 | 538 | | |
| |||
699 | 699 | | |
700 | 700 | | |
701 | 701 | | |
702 | | - | |
| 702 | + | |
703 | 703 | | |
704 | 704 | | |
705 | 705 | | |
| |||
754 | 754 | | |
755 | 755 | | |
756 | 756 | | |
757 | | - | |
| 757 | + | |
758 | 758 | | |
759 | 759 | | |
760 | 760 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
131 | 131 | | |
132 | 132 | | |
133 | 133 | | |
| 134 | + | |
134 | 135 | | |
135 | 136 | | |
136 | 137 | | |
| |||
205 | 206 | | |
206 | 207 | | |
207 | 208 | | |
208 | | - | |
| 209 | + | |
209 | 210 | | |
210 | 211 | | |
211 | 212 | | |
| |||
238 | 239 | | |
239 | 240 | | |
240 | 241 | | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
241 | 246 | | |
242 | 247 | | |
243 | 248 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
57 | 57 | | |
58 | 58 | | |
59 | 59 | | |
60 | | - | |
| 60 | + | |
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
| |||
101 | 101 | | |
102 | 102 | | |
103 | 103 | | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
104 | 108 | | |
105 | 109 | | |
106 | 110 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
87 | 87 | | |
88 | 88 | | |
89 | 89 | | |
| 90 | + | |
90 | 91 | | |
91 | 92 | | |
92 | 93 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
116 | 116 | | |
117 | 117 | | |
118 | 118 | | |
119 | | - | |
| 119 | + | |
120 | 120 | | |
121 | | - | |
122 | | - | |
123 | 121 | | |
124 | 122 | | |
125 | 123 | | |
| |||
149 | 147 | | |
150 | 148 | | |
151 | 149 | | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
152 | 158 | | |
153 | 159 | | |
154 | 160 | | |
| |||
167 | 173 | | |
168 | 174 | | |
169 | 175 | | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
170 | 194 | | |
171 | 195 | | |
172 | 196 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
118 | 118 | | |
119 | 119 | | |
120 | 120 | | |
121 | | - | |
| 121 | + | |
122 | 122 | | |
123 | 123 | | |
124 | 124 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
189 | 189 | | |
190 | 190 | | |
191 | 191 | | |
192 | | - | |
| 192 | + | |
193 | 193 | | |
194 | 194 | | |
195 | 195 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
133 | 133 | | |
134 | 134 | | |
135 | 135 | | |
136 | | - | |
137 | | - | |
| 136 | + | |
138 | 137 | | |
139 | 138 | | |
140 | 139 | | |
| |||
0 commit comments