Commit 0b0f2b7
md-cluster: fix hanging issue while a new disk adding
JIRA: https://issues.redhat.com/browse/RHEL-46615
commit fff42f2
Author: Heming Zhao <heming.zhao@suse.com>
Date: Tue Jul 9 18:41:19 2024 +0800
md-cluster: fix hanging issue while a new disk adding
The commit 1bbe254 ("md-cluster: check for timeout while a
new disk adding") is syntactically correct but does not
suit the real clustered code logic.
When a timeout occurs while adding a new disk, if recv_daemon()
bypasses the unlock for ack_lockres:CR, another node will be waiting
to grab EX lock. This will cause the cluster to hang indefinitely.
How to fix:
1. In dlm_lock_sync(), change the wait behaviour from forever to a
timeout. This avoids the hanging issue when another node
fails to handle a cluster msg. Another result of this change is
that if another node receives an unknown msg (e.g. a new msg_type),
the old code will hang, whereas the new code will time out and fail.
This could help cluster_md handle new msg_type from different
nodes with different kernel/module versions (e.g. the user only
updates one leg's kernel and monitors the stability of the new
kernel).
2. The old code for __sendmsg() always returns 0 (success) under the
design (must successfully unlock ->message_lockres). This commit
makes this function return an error number when an error occurs.
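The two changes above can be illustrated with a small userspace sketch. This is not the kernel patch: `struct lock_res`, `struct cluster`, `tick()`, `lock_sync_timeout()` and `send_msg()` are hypothetical names, the "timeout" is a simulated tick budget rather than a real wait queue, and `-EBUSY` is only an assumed choice of error code. It shows a bounded wait replacing an unbounded one, and the send path releasing its message lock while still reporting the failure to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock resource; not a kernel structure. */
struct lock_res {
    bool granted;        /* set once the peer node grants the lock */
    int  ticks_to_grant; /* simulated delay; negative means the peer never answers */
};

/* Hypothetical cluster state: a message lock plus the ack lock. */
struct cluster {
    bool message_locked;
    struct lock_res ack;
};

/* Advance the simulation by one tick. */
static void tick(struct lock_res *res)
{
    if (res->ticks_to_grant < 0)
        return;                      /* unresponsive peer: never granted */
    if (res->ticks_to_grant == 0)
        res->granted = true;
    else
        res->ticks_to_grant--;
}

/* Wait for the grant, but give up after max_ticks instead of waiting forever. */
int lock_sync_timeout(struct lock_res *res, int max_ticks)
{
    assert(res != NULL);
    for (int i = 0; i < max_ticks; i++) {
        tick(res);
        if (res->granted)
            return 0;
    }
    return -EBUSY;                   /* assumed error code for a timed-out wait */
}

/* Send path: always drop the message lock, and propagate the error. */
int send_msg(struct cluster *c, int max_ticks)
{
    int error;

    assert(c != NULL);
    c->message_locked = true;
    error = lock_sync_timeout(&c->ack, max_ticks);
    c->message_locked = false;       /* released on success and on failure */
    return error;                    /* no longer unconditionally 0 */
}
```

With a peer that answers within the budget, `send_msg()` returns 0; with a peer that never answers, it returns the error instead of blocking, and the message lock is released in both cases so other senders are not stuck behind it.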
Fixes: 1bbe254 ("md-cluster: check for timeout while a new disk adding")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Acked-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240709104120.22243-1-heming.zhao@suse.com
(cherry picked from commit fff42f2)
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
1 parent 5178b1a commit 0b0f2b7
1 file changed: 12 additions, 10 deletions
0 commit comments