Skip to content

Commit c831e8c

Browse files
committed
Merge: s390/ism: intermittent corrupted control commands
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7268 JIRA: https://issues.redhat.com/browse/RHEL-110206 commit 897e860 Author: Halil Pasic <pasic@linux.ibm.com> Date: Tue Jul 22 18:18:17 2025 +0200 s390/ism: fix concurrency management in ism_cmd() The s390x ISM device data sheet clearly states that only one request-response sequence is allowable per ISM function at any point in time. Unfortunately as of today the s390/ism driver in Linux does not honor that requirement. This patch aims to rectify that. This problem was discovered based on Aliaksei's bug report which states that for certain workloads the ISM functions end up entering error state (with PEC 2 as seen from the logs) after a while and as a consequence connections handled by the respective function break, and for future connection requests the ISM device is not considered -- given it is in a dysfunctional state. During further debugging PEC 3A was observed as well. A kernel message like [ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a is a reliable indicator of the stated function entering error state with PEC 2. Let me also point out that a kernel message like [ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery is a reliable indicator that the ISM function won't be auto-recovered because the ISM driver currently lacks support for it. On a technical level, without this synchronization, commands (inputs to the FW) may be partially or fully overwritten (corrupted) by another CPU trying to issue commands on the same function. There is hard evidence that this can lead to DMB token values being used as DMB IOVAs, leading to PEC 2 PCI events indicating invalid DMA. But this is only one of the failure modes imaginable. In theory even completely losing one command and executing another one twice and then trying to interpret the outputs as if the command we intended to execute was actually executed and not the other one is also possible. Frankly, I don't feel confident about providing an exhaustive list of possible consequences. Fixes: 684b89b ("s390/ism: add device driver for internal shared memory") Reported-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com> Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Tested-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250722161817.1298473-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mete Durlu <mdurlu@redhat.com> Approved-by: Steve Best <sbest@redhat.com> Approved-by: Tony Camuso <tcamuso@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Patrick Talbert <ptalbert@redhat.com>
2 parents df62722 + 017b3ad commit c831e8c

File tree

2 files changed

+4
-0
lines changed

2 files changed

+4
-0
lines changed

drivers/s390/net/ism_drv.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -131,6 +131,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
131131
struct ism_req_hdr *req = cmd;
132132
struct ism_resp_hdr *resp = cmd;
133133

134+
spin_lock(&ism->cmd_lock);
134135
__ism_write_cmd(ism, req + 1, sizeof(*req), req->len - sizeof(*req));
135136
__ism_write_cmd(ism, req, 0, sizeof(*req));
136137

@@ -144,6 +145,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
144145
}
145146
__ism_read_cmd(ism, resp + 1, sizeof(*resp), resp->len - sizeof(*resp));
146147
out:
148+
spin_unlock(&ism->cmd_lock);
147149
return resp->ret;
148150
}
149151

@@ -607,6 +609,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
607609
return -ENOMEM;
608610

609611
spin_lock_init(&ism->lock);
612+
spin_lock_init(&ism->cmd_lock);
610613
dev_set_drvdata(&pdev->dev, ism);
611614
ism->pdev = pdev;
612615
ism->dev.parent = &pdev->dev;

include/linux/ism.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ struct ism_dmb {
2828

2929
struct ism_dev {
3030
spinlock_t lock; /* protects the ism device */
31+
spinlock_t cmd_lock; /* serializes cmds */
3132
struct list_head list;
3233
struct pci_dev *pdev;
3334

0 commit comments

Comments
 (0)