Add database constraint for jobs (through partial index)

We currently have an ednpoint-level check like
```python
    # Check that no other job with the same dataset_id is SUBMITTED
    stm = (
        select(JobV2)
        .where(JobV2.dataset_id == dataset_id)
        .where(JobV2.status == JobStatusType.SUBMITTED)
    )
    res = await db.execute(stm)
    if res.scalars().all():
        raise HTTPException(
            status_code=status.HTTP_422_UNPROCESSABLE_CONTENT,
            detail=(
                f"Dataset {dataset_id} is already in use in submitted job(s)."
            ),
        )
```
but we observed at least once a situation where two "submitted" jobs co-existed.

---

It'd be best to make this more robust, by also having a postgres unique index on "jobs with dataset_id = X and status = submitted (where X is not null)".
I think a partial index should be appropriate:
* https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#partial-indexes
* "Example 11.3. Setting up a Partial Unique Index" section of https://www.postgresql.org/docs/current/indexes-partial.html


If we are able to implement this at db level, we could:
* leave the current check in-place, to provide a user-friendly error message
* rely on the db check to avoid inconsistent state in some unexpected scenarios (this would result in a 500 response and `IntegrityError` internal error - which is anyway much better than having two submitted jobs)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add database constraint for jobs (through partial index) #3040

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add database constraint for jobs (through partial index) #3040

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions