Skip to content

Conversation

@cmeesters
Copy link
Member

@cmeesters cmeesters commented Nov 3, 2025

Addresses issue #384, possibly #284, too.

It submits Snakemake itself and ensures invoking the jobstep plugin.

There is an issue with that: Snakemake' --retries flag will be forwarded to the remote executor, the jobstep executor. The user may try new attempts with different resources, but will never get them in the new job context: The resource allocation is fixed per active SLURM job and cannot be extended. Any retry attempt has to be triggered from the main Snakemake process.

Hence, we need a regex to always convert --attempt <some integer> into --attempt 1 in the exec_job string.

@johanneskoester as we agreed not to temper the exec_job string, I assigned you as a reviewer.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed retry attempt handling in remote job submissions. Retry attempts are now properly normalized to a single attempt in the remote execution context, ensuring consistent job behavior across Slurm submissions and preventing unintended retry propagation.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 3, 2025

Walkthrough

Adds a normalization step to the Slurm SBATCH wrap command that coerces any Snakemake retry attempts (specified as --attempt <n> or --attempt=<n>) to a single attempt (--attempt 1) in the remote context. This normalization occurs after building the exec_job and before appending it to the sbatch wrap argument.

Changes

Cohort / File(s) Summary
Slurm SBATCH wrap command normalization
snakemake_executor_plugin_slurm/__init__.py
Inserts normalization logic to coerce retry attempts to --attempt 1 in the remote jobstep context, processing the exec_job before wrapping in the sbatch command

Sequence Diagram

sequenceDiagram
    participant Builder as Exec Job Builder
    participant Normalizer as Normalizer
    participant Wrapper as SBATCH Wrapper
    
    Builder->>Normalizer: exec_job with potential<br/>--attempt <n>
    rect rgb(240, 240, 240)
        note over Normalizer: Normalization Phase
        Normalizer->>Normalizer: Scan for --attempt flags<br/>(any format/value)
        Normalizer->>Normalizer: Coerce to --attempt 1
    end
    Normalizer->>Wrapper: normalized exec_job
    Wrapper->>Wrapper: Append to --wrap<br/>argument
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus on the normalization logic implementation to verify it correctly identifies and replaces all --attempt flag variations
  • Confirm the coercion to --attempt 1 handles edge cases (missing values, different spacing, etc.)

Suggested reviewers

  • johanneskoester

Poem

🐰 A retry dance, once wild and free,
Now tamed to one, as it shall be,
The remote job hops with grace,
No more retries in that space,
One attempt per sbatch embrace! ✨

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'fix: #348 no attempts to jobstep executor' clearly and specifically summarizes the main change in the changeset. It indicates this is a bug fix addressing issue #348 and describes the core intent: preventing retry attempts from being forwarded to the jobstep executor. The title is concise, descriptive, and accurately reflects the primary objective of normalizing the --attempt flag to 1 in the remote jobstep context.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/retry_forwarding

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cmeesters
Copy link
Member Author

If you agree, @johanneskoester , I will create a test file, accordingly.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

346-349: Good solution to prevent retry forwarding.

The regex normalization correctly addresses the issue of forwarding retry attempts to the jobstep executor where resource changes cannot occur. The pattern handles both --attempt=<n> and --attempt <n> formats appropriately, and forcing --attempt 1 ensures retries are managed by the main Snakemake process.

Consider adding a word boundary for additional safety:

-        exec_job = re.sub(r"--attempt(?:[=\s]+)\d+", "--attempt 1", exec_job)
+        exec_job = re.sub(r"\b--attempt(?:[=\s]+)\d+", "--attempt 1", exec_job)

This prevents the unlikely scenario where --attempt appears as a substring in an unexpected context (e.g., within a larger string). The current implementation is functional, but the word boundary \b adds defensive robustness.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c980d1f and d746214.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 333
File: .github/workflows/announce-release.yml:0-0
Timestamp: 2025-07-22T17:31:41.629Z
Learning: In the snakemake-executor-plugin-slurm repository, cmeesters prefers using `main` references for GitHub Actions to always get the latest version, with security provided through protected branches and mandatory CI checks rather than commit SHA pinning.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-09-09T15:55:55.669Z
Learning: In the Snakemake SLURM executor plugin, Snakemake submits itself as a job using the --wrap parameter in the sbatch command, rather than creating or submitting a separate job.sh script file. The actual implementation uses `call += f' --wrap="{exec_job}"'` where exec_job is the formatted snakemake execution command.
📚 Learning: 2025-09-09T15:55:55.669Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-09-09T15:55:55.669Z
Learning: In the Snakemake SLURM executor plugin, Snakemake submits itself as a job using the --wrap parameter in the sbatch command, rather than creating or submitting a separate job.sh script file. The actual implementation uses `call += f' --wrap="{exec_job}"'` where exec_job is the formatted snakemake execution command.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-20T09:13:26.443Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants