[feat] Improve handling of maintenance reservations in Slurm #3573
base: develop
Conversation
Add MAINTENANCE to the available states for nodes that belong in a Slurm reservation with the MAINT flag.
… feat/maint_nodes
Codecov Report ❌ Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #3573     +/-   ##
===========================================
- Coverage    91.26%   91.22%    -0.05%
===========================================
  Files           62       62
  Lines        13534    13540        +6
===========================================
  Hits         12352    12352
- Misses        1182     1188        +6
… feat/maint_nodes
# else:
#     raise JobSchedulerError(f"could not extract the reservation "
#                             f"flags for reservation '{reservation}'")
I wasn't sure if we want to fail here or silently ignore it?
I'd rather simply log it with self.log().
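A minimal sketch of what that could look like, assuming the flag parsing lives in a small helper method (the helper name `_reservation_flags`, the stand-in scheduler class and the placeholder `log()` body are illustrative, not the actual ReFrame code; only the `self.log()` suggestion comes from this thread):

```python
import re


class SlurmJobScheduler:
    '''Stand-in for the real scheduler class.'''

    def _reservation_flags(self, reservation, completed):
        '''Hypothetical helper: return the flags of a Slurm reservation
        parsed from the `scontrol show reservation` output.'''
        flags_match = re.search(r'Flags=(\S+)', completed.stdout)
        if flags_match:
            return flags_match.group(1).split(',')

        # Log and carry on instead of raising JobSchedulerError,
        # as suggested above.
        self.log(f"could not extract the reservation flags "
                 f"for reservation '{reservation}'")
        return []

    def log(self, msg):
        print(msg)   # placeholder for the real logger
```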
vkarak left a comment
I agree with the overall fix; I have just some suggestions on the implementation.
flags_match = re.search(r'Flags=(\S+)', completed.stdout)
if flags_match:
    if 'MAINT' in flags_match[1].split(','):
Suggested change:
- if 'MAINT' in flags_match[1].split(','):
+ if 'MAINT' in flags_match.group(1).split(','):
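For context, `scontrol show reservation <name>` prints the reservation attributes as `Key=Value` pairs, so the regex picks out the comma-separated flag list; a quick, self-contained illustration (the output line below is made up):

```python
import re

# Made-up excerpt of `scontrol show reservation` output
output = ("ReservationName=maintenance Nodes=nid00[001-004] "
          "NodeCnt=4 Flags=MAINT,SPEC_NODES Users=root")

m = re.search(r'Flags=(\S+)', output)
if m:
    print(m.group(1).split(','))   # ['MAINT', 'SPEC_NODES']
```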
'''Class representing a Slurm node.'''

- def __init__(self, node_descr):
+ def __init__(self, node_descr, addl_avail_states=None):
I would rather pass the flags and let the __init__() decide what to do with them; in that case you don't need addl_avail_states. Simply add the MAINTENANCE state to available_states if the flag is set.
Okay, just to make sure I understand: _create_nodes will get a flags argument (instead of addl_avail_states) and pass it to the __init__() of each _SlurmNode, and MAINTENANCE will be added to the allowed states of the node?
My thinking was that eventually we may want to allow the user to pass even more additional "available states" through the CLI, for some reason. But I am okay with what you are suggesting too.
> Okay, just to make sure I understand: _create_nodes will get a flags argument (instead of addl_avail_states) and pass it to the __init__() of each _SlurmNode, and MAINTENANCE will be added to the allowed states of the node?

Yes.

> My thinking was that eventually we may want to allow the user to pass even more additional "available states" through the CLI, for some reason. But I am okay with what you are suggesting too.

Let's keep it clean for the moment. If we want to define custom available states, maybe we should do it with a configuration parameter.
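A minimal sketch of the direction agreed above, assuming the node keeps a set of states in which it counts as available (the baseline state set, the `reservation_flags` argument name and the `_avail_states` attribute are illustrative, not taken from the actual ReFrame code):

```python
class _SlurmNode:
    '''Class representing a Slurm node.'''

    # Illustrative baseline of states in which a node counts as available.
    _BASE_AVAIL_STATES = {'ALLOCATED', 'COMPLETING', 'IDLE', 'MIXED',
                          'RESERVED'}

    def __init__(self, node_descr, reservation_flags=None):
        self._descr = node_descr
        self._avail_states = set(self._BASE_AVAIL_STATES)

        # Agreed direction: pass the reservation flags to __init__() and
        # let it decide; if the reservation carries the MAINT flag, nodes
        # in the MAINTENANCE state are also considered available.
        if reservation_flags and 'MAINT' in reservation_flags:
            self._avail_states.add('MAINTENANCE')
```

With this, _create_nodes only needs to forward the reservation flags it has already parsed and no separate addl_avail_states argument is required.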
We would also need a unit test.
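A rough sketch of what such a unit test could assert, building on the `_SlurmNode` sketch above (the constructor signature, node description string and `_avail_states` attribute are assumptions from this thread, not the project's actual test fixtures):

```python
def test_maint_reservation_nodes():
    # A node of a reservation with the MAINT flag should treat
    # MAINTENANCE as an available state.
    node = _SlurmNode('NodeName=nid00001 State=MAINTENANCE+RESERVED',
                      reservation_flags=['MAINT'])
    assert 'MAINTENANCE' in node._avail_states

    # Without the MAINT flag, MAINTENANCE must not count as available.
    node = _SlurmNode('NodeName=nid00001 State=MAINTENANCE+RESERVED')
    assert 'MAINTENANCE' not in node._avail_states
```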
Add `MAINTENANCE` to the available states for nodes that belong in a reservation with the `MAINT` flag, when the user explicitly requests this reservation.

Fixes #3572.