Conversation

@ekouts
Contributor

@ekouts ekouts commented Oct 16, 2025

Add MAINTENANCE to the available states for nodes that belong to a reservation with the MAINT flag, when the user explicitly requests this reservation.

Fixes #3572.

@ekouts ekouts requested review from teojgo and vkarak October 16, 2025 11:36
@ekouts ekouts self-assigned this Oct 16, 2025
@ekouts ekouts changed the title from "Improve slurm handling of maintenance reservations" to "Add MAINTENANCE to the available states for nodes that belong in a Slurm reservation with the MAINT flag." Oct 16, 2025
@ekouts ekouts changed the title from "Add MAINTENANCE to the available states for nodes that belong in a Slurm reservation with the MAINT flag." to "Improve slurm handling of maintenance reservations" Oct 16, 2025
@codecov

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.22%. Comparing base (a71c68b) to head (c0d9fb3).
⚠️ Report is 20 commits behind head on develop.

Files with missing lines             Patch %    Lines
reframe/core/schedulers/slurm.py     50.00%     7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3573      +/-   ##
===========================================
- Coverage    91.26%   91.22%   -0.05%     
===========================================
  Files           62       62              
  Lines        13534    13540       +6     
===========================================
  Hits         12352    12352              
- Misses        1182     1188       +6     

@ekouts ekouts changed the title from "Improve slurm handling of maintenance reservations" to "Improve handling of maintenance reservations in Slurm" Oct 16, 2025
@ekouts ekouts marked this pull request as ready for review October 22, 2025 15:05
Comment on lines +444 to +446
# else:
#     raise JobSchedulerError(f"could not extract the reservation "
#                             f"flags for reservation '{reservation}'")
Contributor Author

I wasn't sure if we want to fail here or silently ignore it?

Contributor

I'd rather simply log it with self.log().
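
For reference, a minimal sketch of that alternative, assuming the surrounding `completed` (the `scontrol` result) and `reservation` variables from the original context, and that `self.log()` accepts a plain message:

```python
# Sketch only: log the problem and fall back to "no flags" instead of raising.
flags = []
flags_match = re.search(r'Flags=(\S+)', completed.stdout)
if flags_match:
    flags = flags_match.group(1).split(',')
else:
    self.log(f"could not extract the reservation flags "
             f"for reservation '{reservation}'")
```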

@vkarak vkarak changed the title from "Improve handling of maintenance reservations in Slurm" to "[feat] Improve handling of maintenance reservations in Slurm" Nov 7, 2025
@vkarak vkarak added this to the ReFrame 4.9 milestone Nov 7, 2025
Contributor

@vkarak vkarak left a comment

I agree with the overall fix; I have just some suggestions on the implementation.


flags_match = re.search(r'Flags=(\S+)', completed.stdout)
if flags_match:
    if 'MAINT' in flags_match[1].split(','):
Contributor

Suggested change:
-    if 'MAINT' in flags_match[1].split(','):
+    if 'MAINT' in flags_match.group(1).split(','):
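
For what it's worth, the two accessors are equivalent on `re.Match` objects (item access was added in Python 3.6); `.group(1)` simply reads more explicitly. A standalone illustration on a made-up reservation line:

```python
import re

# Hypothetical scontrol reservation line, for illustration only.
line = 'ReservationName=maint_res Flags=MAINT,SPEC_NODES'
m = re.search(r'Flags=(\S+)', line)
assert m[1] == m.group(1) == 'MAINT,SPEC_NODES'
assert 'MAINT' in m.group(1).split(',')
```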

    '''Class representing a Slurm node.'''

-    def __init__(self, node_descr):
+    def __init__(self, node_descr, addl_avail_states=None):
Contributor

I would rather pass the flags and let the __init__() decide what to do with them, in which case you don't need the addl_avail_states. Simply add the MAINTENANCE state to the available_states if the flag is set.

Contributor Author

Okay, just to make sure I understand: _create_nodes will get a flags argument (instead of addl_avail_states), and then pass it to the __init__() of each _SlurmNode, and MAINTENANCE will be added to the allowed states of the node?
My thinking was that eventually we may want to allow the user to pass even more additional "available states", through the CLI, for some reason. But I am okay with what you are suggesting too.

Contributor

> Okay, just to make sure I understand: _create_nodes will get a flags argument (instead of addl_avail_states), and then pass it to the __init__() of each _SlurmNode, and MAINTENANCE will be added to the allowed states of the node?

Yes.

> My thinking was that eventually we may want to allow the user to pass even more additional "available states", through the CLI, for some reason. But I am okay with what you are suggesting too.

Let's keep it clean at the moment. If we want to define custom available states, maybe we should do it with a configuration parameter.
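
Summarising the agreed direction as a rough sketch (the attribute names beyond `available_states` and the exact call site are assumptions, not the final implementation):

```python
class _SlurmNode:
    '''Class representing a Slurm node.'''

    def __init__(self, node_descr, flags=None):
        # ... existing parsing of node_descr stays unchanged ...
        if flags and 'MAINT' in flags:
            # Nodes in a MAINT reservation remain usable by jobs that
            # explicitly request that reservation, so also count
            # MAINTENANCE as an available state for them.
            self.available_states.add('MAINTENANCE')


# _create_nodes() would then forward the reservation flags, roughly:
#     nodes = {_SlurmNode(descr, flags=reservation_flags)
#              for descr in node_descriptions}
```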

@github-project-automation github-project-automation bot moved this from Todo to In Progress in ReFrame Backlog Nov 7, 2025
@vkarak
Contributor

vkarak commented Nov 7, 2025

We would also need a unit test.
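
A rough sketch of what such a test could look like, assuming the `flags=` constructor shape discussed above and a minimal `scontrol show nodes`-style description (the test name, field values and the `available_states` attribute access are illustrative, not the final test):

```python
from reframe.core.schedulers.slurm import _SlurmNode


def test_maint_reservation_node_availability():
    # Hypothetical minimal node description; real scontrol output has
    # many more fields.
    descr = ('NodeName=nid001 ActiveFeatures=f1 '
             'Partitions=p1 State=IDLE+MAINTENANCE')
    node = _SlurmNode(descr, flags=['MAINT'])

    # With the MAINT reservation flag set, MAINTENANCE should count as
    # an available state for this node.
    assert 'MAINTENANCE' in node.available_states
```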

@vkarak vkarak modified the milestones: ReFrame 4.9, ReFrame 4.10 Nov 12, 2025

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Improve slurm handling of maintenance reservations

2 participants