Commit c7275ce
nvme: improve handling of long keep alives
Upon keep alive completion, nvme_keep_alive_work is scheduled with the
same delay every time. If keep alive commands are completing slowly,
this may cause a keep alive timeout. The following trace illustrates the
issue, taking KATO = 8 and TBKAS off for simplicity:
1. t = 0: run nvme_keep_alive_work, send keep alive
2. t = ε: keep alive reaches controller, controller restarts its keep
alive timer
3. t = 4: host receives keep alive completion, schedules
nvme_keep_alive_work with delay 4
4. t = 8: run nvme_keep_alive_work, send keep alive
Here, a keep alive having RTT of 4 causes a delay of at least 8 - ε
between the controller receiving successive keep alives. With ε small,
the controller is likely to detect a keep alive timeout.
Fix this by calculating the RTT of the keep alive command, and adjusting
the scheduling delay of the next keep alive work accordingly.
Reported-by: Costa Sapuntzakis <costa@purestorage.com>
Reported-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>1 parent 774a963 commit c7275ce
1 file changed
+15
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1199 | 1199 | | |
1200 | 1200 | | |
1201 | 1201 | | |
| 1202 | + | |
| 1203 | + | |
| 1204 | + | |
| 1205 | + | |
| 1206 | + | |
| 1207 | + | |
| 1208 | + | |
| 1209 | + | |
| 1210 | + | |
| 1211 | + | |
| 1212 | + | |
| 1213 | + | |
| 1214 | + | |
| 1215 | + | |
1202 | 1216 | | |
1203 | 1217 | | |
1204 | 1218 | | |
| |||
1217 | 1231 | | |
1218 | 1232 | | |
1219 | 1233 | | |
1220 | | - | |
| 1234 | + | |
1221 | 1235 | | |
1222 | 1236 | | |
1223 | 1237 | | |
| |||
0 commit comments