docker/flux-validator/README.md: 80 additions, 0 deletions

@@ -41,3 +41,83 @@ Validation failed at directives:
--noodles=2: 2
Sep 09 06:48:51.615419 UTC 2025 broker.err[0]: rc2.0: python3 /code/docker/flux-validator/validate.py validate /data/docker/flux-validator/batch-invalid.sh Exited (rc=1) 0.1s
```

#### Canonical jobspecs in YAML or JSON format

##### Valid
Review comment (Member):

To follow the structure above, let's put this directly as another example under Valid. A comment noting that it is a canonical jobspec in JSON/YAML will suffice to categorize it.

```bash
$ docker run -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator /data/docker/flux-validator/implicit-slot.yaml
$ echo $?
```

Review comment (Member):

Please add implicit-slot.yaml to the repository here as an example (and remove from the README below).

##### Invalid
```bash
$ docker run -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator /data/docker/flux-validator/implicit-slot-invalid.yaml
Traceback (most recent call last):
  File "/code/docker/flux-validator/validate.py", line 113, in <module>
    run_command()
  File "/code/docker/flux-validator/validate.py", line 75, in run_command
    return validate(args.path)
  File "/code/docker/flux-validator/validate.py", line 99, in validate
    jobspec = validate_jobspec(json_content)
  File "/usr/lib/python3.10/site-packages/flux/job/Jobspec.py", line 131, in validate_jobspec
    jobspec = Jobspec(**jobspec_obj)
  File "/usr/lib/python3.10/site-packages/flux/job/Jobspec.py", line 198, in __init__
    self._validate_resource(res)
  File "/usr/lib/python3.10/site-packages/flux/job/Jobspec.py", line 306, in _validate_resource
    raise ValueError("slots must have labels")
ValueError: slots must have labels
```

Review comment (Member):

Let's add implicit-slot-invalid.yaml to the repository too. Feel free to create additional structure for these data files if you think it will help organize them.

Review comment (Member):

If this is the output going to an agent, a few thoughts to consider:

- Are we going to be able to control stdout vs. stderr to only provide one to the agent?
- If not, do we want to hide the bulk of the traceback and only show `ValueError: slots must have labels`?
- Can we give the agent any more context? (e.g., imagine if there is more than one slot: it will need to deduce which one was missing a label).

I am also getting the broker's exit message in the output:

`Nov 03 07:44:12.177820 UTC 2025 broker.err[0]: rc2.0: python3 /code/docker/flux-validator/validate.py validate /data/docker/flux-validator/implicit-slot.yaml Exited (rc=1) 0.1s`

Review comment (Member):

Note that it still validates when I have a label but change the name (e.g., `default` is defined, but then in the resources I called it something else). I don't know if Flux checks for that.
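Following up on the label question above, a minimal sketch of a check that every task's `slot` names a label that actually exists on a `slot` resource. The helpers `slot_labels` and `unmatched_task_slots` are hypothetical (not part of Flux or fractale) and assume the jobspec has already been loaded into a plain Python dict, for example with `yaml.safe_load`. Returning the offending names also gives an agent more context than a bare error.

```python
def slot_labels(resources):
    """Recursively collect the label of every `slot` resource in a resource list."""
    labels = set()
    for res in resources:
        if res.get("type") == "slot" and "label" in res:
            labels.add(res["label"])
        # Descend into nested resources (the jobspec "with" key)
        labels |= slot_labels(res.get("with", []))
    return labels


def unmatched_task_slots(jobspec):
    """Return the slot names referenced by tasks that match no slot label."""
    labels = slot_labels(jobspec.get("resources", []))
    return [
        task.get("slot")
        for task in jobspec.get("tasks", [])
        if task.get("slot") not in labels
    ]


# Hypothetical example: the slot is labeled "default", but the task asks for "main".
jobspec = {
    "resources": [
        {"type": "node", "count": 1, "with": [
            {"type": "slot", "count": 2, "label": "default",
             "with": [{"type": "core", "count": 4}]},
        ]},
    ],
    "tasks": [{"command": ["app"], "slot": "main", "count": {"per_slot": 1}}],
}

missing = unmatched_task_slots(jobspec)
if missing:
    print(f"tasks reference unknown slot labels: {missing}")
```

Whether Flux's `validate_jobspec` already enforces this is not verified here; if it does, this check would be redundant.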

##### Validate counts
Note: you need to override the entrypoint.

```bash
$ docker run --entrypoint flux -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator start python3 /code/docker/flux-validator/validate.py count /data/docker/flux-validator/implicit-slot.yaml
Type: node, count: 1
Type: memory, count: 256
Type: socket, count: 2
Type: gpu, count: 8
Type: slot, count: 4
Type: L3cache, count: 4
Type: core, count: 16
Type: pu, count: 16
```

Where `implicit-slot.yaml` has the following content:
```yaml
version: 9999
resources:
  - type: node
    count: 1
    with:
      - type: memory
        count: 256
      - type: socket
        count: 2
        with:
          - type: gpu
            count: 4
          - type: slot
            count: 2
            label: default
            with:
              - type: L3cache
                count: 1
                with:
                  - type: core
                    count: 4
                    with:
                      - type: pu
                        count: 1

# a comment
attributes:
  system:
    duration: 3600
tasks:
  - command: [ "app" ]
    slot: default
    count:
      per_slot: 1
```

Review comment (Member), on the `count` command:

This one is cool!

One, two, three, core... ah ah ah.

I am the count, I love to count! 🦇
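The counts printed by the `count` command are cumulative rather than per-parent: 2 sockets with 4 GPUs each give 8 GPUs, and 4 slots × 1 L3cache × 4 cores give 16 cores. A minimal sketch of that multiplication over the `implicit-slot.yaml` resources, written as a plain Python dict so it runs without Flux. `walk_counts` is a hypothetical helper that only mimics the numbers printed via `resource_walk()`; it is not Flux's implementation.

```python
def walk_counts(resources, multiplier=1):
    """
    Depth-first, pre-order walk over a jobspec resource list.
    Yields (type, total_count), where total_count is the resource's own
    count multiplied by the counts of all of its ancestors.
    """
    for res in resources:
        total = multiplier * res["count"]
        yield res["type"], total
        yield from walk_counts(res.get("with", []), total)


# The resources section of implicit-slot.yaml, as a Python structure.
resources = [
    {"type": "node", "count": 1, "with": [
        {"type": "memory", "count": 256},
        {"type": "socket", "count": 2, "with": [
            {"type": "gpu", "count": 4},
            {"type": "slot", "count": 2, "label": "default", "with": [
                {"type": "L3cache", "count": 1, "with": [
                    {"type": "core", "count": 4, "with": [
                        {"type": "pu", "count": 1},
                    ]},
                ]},
            ]},
        ]},
    ]},
]

for rtype, total in walk_counts(resources):
    print(f"Type: {rtype}, count: {total}")
```

This prints the same eight lines as the `count` output above, in the same order.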
docker/flux-validator/validate.py: 38 additions, 8 deletions

@@ -2,12 +2,16 @@
 
 import argparse
 import sys
+import yaml
+import json
 
 from rich import box
 from rich.console import Console
 from rich.padding import Padding
 from rich.panel import Panel
 
+from flux.job.Jobspec import validate_jobspec
+
 import fractale.utils as utils
 
 # This will pretty print all exceptions in rich
@@ -48,10 +52,17 @@ def get_parser():
         description="validate flux batch script",
     )
     validate.add_argument("path", help="path to batch.sh to validate")
+
+    count = subparsers.add_parser(
+        "count",
+        formatter_class=argparse.RawTextHelpFormatter,
+        description="count resources in flux batch script",
+    )
+    count.add_argument("path", help="path to batch.yaml to count resources")
     return parser
 
 
-def run_validate():
+def run_command():
     parser = get_parser()
     if len(sys.argv) == 1:
         help()
@@ -62,22 +73,41 @@ def run_validate():
     # Here we can assume instantiated to get args
     if args.command == "validate":
         return validate(args.path)
+    elif args.command == "count":
+        return count_resources(args.path)
     raise ValueError(f"The command {args.command} is not known")
 
 
 def validate(path):
     """
     Validate the path to a batch.sh or similar.
     """
-    validator = Validator("batch")
+    jobspec = None
     content = utils.read_file(path)
     try:
-        # Setting fail fast to False means we will get ALL errors at once
-        validator.validate(path, fail_fast=False)
-    except Exception as e:
-        display_error(content, str(e))
-        sys.exit(1)
+        yaml_content = yaml.safe_load(content)
+        json_content = json.dumps(yaml_content)
+    except Exception:
+        validator = Validator("batch")
+        try:
+            # Setting fail fast to False means we will get ALL errors at once
+            validator.validate(path, fail_fast=False)
+        except Exception as e:
+            display_error(content, str(e))
+            sys.exit(1)
+    else:
+        jobspec = validate_jobspec(json_content)
+    return jobspec
+
+
+def count_resources(path):
+    """
+    Count the resources in the path to a batch.yaml or similar.
+    """
+    jobspec = validate(path)
+    for res in jobspec[1].resource_walk():
+        print(f"Type: {res[1]['type']}, count: {res[2]}")
 
 
 if __name__ == "__main__":
-    run_validate()
+    run_command()

Review comment (Member), on `yaml_content = yaml.safe_load(content)`:

nit: run `pre-commit run --all-files` to fix isort, etc. I know, it should be in CI, and it's not. :)
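On the review question of what an agent sees when validation fails: a minimal sketch, assuming the goal is to surface only the final exception line instead of the full traceback. `run_for_agent` is a hypothetical wrapper, not part of this PR; it could, for instance, wrap the `validate_jobspec(json_content)` call inside `validate`. Writing the message to stderr keeps stdout free for results such as the `count` output, so a caller could hand the agent just one stream.

```python
import sys


def run_for_agent(func, *args, **kwargs):
    """
    Hypothetical wrapper: call func(*args, **kwargs) and return its result.
    If it raises, print only "ExceptionType: message" to stderr (no
    traceback) and exit with status 1, matching the rc=1 seen in the logs.
    """
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        print(f"{type(exc).__name__}: {exc}", file=sys.stderr)
        sys.exit(1)
```

For example, `jobspec = run_for_agent(validate_jobspec, json_content)` would print `ValueError: slots must have labels` for the invalid example in the README and exit with status 1. Adding context such as which slot lacks a label would still need a separate check.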