Commit 103082f

Lijo Lazar authored and Sasha Levin committed
drm/amdgpu: Report individual reset error
[ Upstream commit 2e97663 ]

If reinitialization of one of the GPUs fails after reset, the driver logs
a failure on all subsequent GPUs even though they have resumed
successfully.

A sample log where only the device at 0000:95:00.0 had a failure:

    amdgpu 0000:15:00.0: amdgpu: GPU reset(19) succeeded!
    amdgpu 0000:65:00.0: amdgpu: GPU reset(19) succeeded!
    amdgpu 0000:75:00.0: amdgpu: GPU reset(19) succeeded!
    amdgpu 0000:85:00.0: amdgpu: GPU reset(19) succeeded!
    amdgpu 0000:95:00.0: amdgpu: GPU reset(19) failed
    amdgpu 0000:e5:00.0: amdgpu: GPU reset(19) failed
    amdgpu 0000:f5:00.0: amdgpu: GPU reset(19) failed
    amdgpu 0000:05:00.0: amdgpu: GPU reset(19) failed
    amdgpu 0000:15:00.0: amdgpu: GPU reset end with ret = -5

To avoid confusion, report the error for each device separately and
return the first error as the overall result.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent e7c9bc7 commit 103082f

1 file changed: +15 -10 lines changed

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Lines changed: 15 additions & 10 deletions
@@ -6337,23 +6337,28 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
 		if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
 			drm_helper_resume_force_mode(adev_to_drm(tmp_adev));
 
-		if (tmp_adev->asic_reset_res)
-			r = tmp_adev->asic_reset_res;
-
-		tmp_adev->asic_reset_res = 0;
-
-		if (r) {
+		if (tmp_adev->asic_reset_res) {
 			/* bad news, how to tell it to userspace ?
 			 * for ras error, we should report GPU bad status instead of
 			 * reset failure
 			 */
 			if (reset_context->src != AMDGPU_RESET_SRC_RAS ||
 			    !amdgpu_ras_eeprom_check_err_threshold(tmp_adev))
-				dev_info(tmp_adev->dev, "GPU reset(%d) failed\n",
-					 atomic_read(&tmp_adev->gpu_reset_counter));
-			amdgpu_vf_error_put(tmp_adev, AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0, r);
+				dev_info(
+					tmp_adev->dev,
+					"GPU reset(%d) failed with error %d \n",
+					atomic_read(
+						&tmp_adev->gpu_reset_counter),
+					tmp_adev->asic_reset_res);
+			amdgpu_vf_error_put(tmp_adev,
+					    AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0,
+					    tmp_adev->asic_reset_res);
+			if (!r)
+				r = tmp_adev->asic_reset_res;
+			tmp_adev->asic_reset_res = 0;
 		} else {
-			dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n", atomic_read(&tmp_adev->gpu_reset_counter));
+			dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n",
+				 atomic_read(&tmp_adev->gpu_reset_counter));
 			if (amdgpu_acpi_smart_shift_update(tmp_adev,
 							   AMDGPU_SS_DEV_D0))
 				dev_warn(tmp_adev->dev,
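
The per-device reporting pattern this change introduces can be illustrated outside the kernel. The following is a minimal user-space sketch, not driver code: `struct fake_dev` and `report_reset_results()` are hypothetical names invented for illustration, and `printf` stands in for `dev_info`. It shows the two properties the commit message describes: each device is judged by its own stored result, and only the first non-zero result becomes the overall return value.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device: only what this sketch needs. */
struct fake_dev {
	const char *name;  /* PCI address string, used for logging */
	int reset_res;     /* 0 on success, negative errno on failure */
};

/*
 * Walk all devices, log each one's own result, and return the first
 * non-zero result. A device's stored result is cleared once reported.
 */
static int report_reset_results(struct fake_dev *devs, size_t n, int reset_count)
{
	int r = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct fake_dev *d = &devs[i];

		if (d->reset_res) {
			printf("%s: GPU reset(%d) failed with error %d\n",
			       d->name, reset_count, d->reset_res);
			if (!r)
				r = d->reset_res; /* keep only the first error */
			d->reset_res = 0;
		} else {
			/* Decided per device, so an earlier failure does not leak in. */
			printf("%s: GPU reset(%d) succeeded!\n",
			       d->name, reset_count);
		}
	}

	return r;
}
```

In the old code the branch tested the accumulated `r`, so once one device failed, every later device took the failure path. Testing the per-device result instead, and assigning `r` only while it is still zero, keeps the log accurate without changing the function's overall return value when a single device fails.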
