
Conversation

@vytautas9
Contributor

Currently, we have wrapper functions for the majority of the Job Scheduler API endpoints. However, at least one is missing: cancelling running Job Instances. The Fabric API that does exactly this is Job Scheduler - Cancel Item Job Instance.

The PR adds a new cancel_item_job_instance function to the src/sempy_labs/_job_scheduler.py module.

P.S. I had to make sure the LRO (long-running operation) handling would not be triggered for this endpoint: it does not return an x-ms-operation-id header, but instead returns x-ms-job-id plus a Location header, and the handling logic differs a bit from the x-ms-operation-id case.
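For illustration, a minimal sketch of how the two response shapes can be told apart (the helper name is hypothetical and not part of sempy-labs; it assumes plain dict-like headers):

```python
def is_job_instance_response(headers: dict) -> bool:
    """Return True when the response follows the job-instance pattern
    (x-ms-job-id + Location) rather than the LRO pattern (x-ms-operation-id)."""
    return "x-ms-operation-id" not in headers and "x-ms-job-id" in headers

# The cancel endpoint's headers follow the job-instance pattern:
cancel_headers = {"x-ms-job-id": "abc-123", "Location": "https://api.fabric.microsoft.com/v1/..."}
# A typical LRO response carries an operation id instead:
lro_headers = {"x-ms-operation-id": "op-456", "Retry-After": "20"}
```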

@vytautas9
Contributor Author

Solves #950

Collaborator

@m-kovalsky m-kovalsky left a comment


Just a few minor changes. Be sure to test it as well.

"""
Cancel an item's job instance.
This is a wrapper function for the following API: `Job Scheduler - Cancel Item Job Instance <https://learn.microsoft.com/en-us/rest/api/fabric/core/job-scheduler/cancel-item-job-instance>`_.
Collaborator


Remove the /en-US part please

Contributor Author


Removed. Also added a note on SP authentication support.

@m-kovalsky
Collaborator

One more thing in the review: the function should wait for the cancellation to complete. For this, you need to check the status and wait for it to finish. See how it was done here, although this case may be slightly different.

status_url = response.headers.get("Location").split("fabric.microsoft.com")[1]

@vytautas9
Contributor Author

One more thing in the review: the function should wait for the cancellation to complete. For this, you need to check the status and wait for it to finish. See how it was done here, although this case may be slightly different.

status_url = response.headers.get("Location").split("fabric.microsoft.com")[1]

Added the following:

  • Error handling so that cancelling an already-cancelled job does not throw an exception
  • A waiting loop that waits for the job to finish
  • Returning a dataframe with the latest status
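The waiting loop can be sketched roughly like this (a sketch under assumptions: get_status is a hypothetical callable that fetches the current job-instance status; the real function would read it from the dataframe returned by _get_item_job_instance):

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Cancelled"}

def wait_for_job_to_finish(get_status, poll_seconds=5, max_polls=60):
    """Poll the job instance until it reaches a terminal status or we give up."""
    status = get_status()
    for _ in range(max_polls):
        if status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)  # the endpoint gives no Retry-After, so a fixed interval
        status = get_status()
    return status
```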

@vytautas9
Contributor Author

Btw, I've noticed a possible issue with the _get_item_job_instance internal function. It is built to handle multiple responses, but if I am not mistaken the endpoint always returns a single response, so the function now always returns an empty dataframe.

As this function is also used in other modules, I didn't want to try to fix it in the current PR; I might test it more and create another PR.

@m-kovalsky m-kovalsky mentioned this pull request Nov 10, 2025
@m-kovalsky
Collaborator

Btw, I've noticed a possible issue with the _get_item_job_instance internal function. It is built to handle multiple responses, but if I am not mistaken the endpoint always returns a single response, so the function now always returns an empty dataframe.

As this function is also used in other modules, I didn't want to try to fix it in the current PR; I might test it more and create another PR.

#958 should solve that. Thanks!

item=item, type=type, workspace=workspace
)

try:
Collaborator


The try/except here won't do much because the exception is already handled within the _base_api function.

Contributor Author


The try/except will catch the exception thrown by the _base_api function (which handles it internally) and will silence it if it is a 400 with JobAlreadyCompleted; any other exception is re-raised via raise in the else branch.

This is how it looks when _base_api returns a 400 with JobAlreadyCompleted and we handle it:
[screenshot]
So the only added benefit of this try/except is that the code does not fail when we cancel an already finished/cancelled Job Instance.

This is how it looks when _base_api raises an exception that we don't handle:
[screenshot]
In this case the exception is not one we handle, so we re-raise the exception from _base_api and print additional information to the console.

Let me know if you'd like this to be changed or the try/except removed completely. The reasoning behind it was to keep the code/notebook from failing when trying to cancel an already finished/cancelled Job Instance (end users could also handle this exception themselves in their own code/notebooks).
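The pattern being discussed, sketched with a hypothetical call_cancel callable standing in for the _base_api call (matching on the error message text is an assumption about how the error surfaces, not the library's documented behavior):

```python
def cancel_silencing_already_completed(call_cancel):
    """Run the cancel call; swallow only the 'job already completed' error."""
    try:
        call_cancel()
    except Exception as e:
        if "JobAlreadyCompleted" in str(e):
            # The job already finished; treat this as a no-op, not a failure.
            print("Job instance already completed; nothing to cancel.")
        else:
            # Any other error is unexpected, so re-raise it unchanged.
            raise
```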

Collaborator

@m-kovalsky m-kovalsky Nov 10, 2025


I'd rather not use try/except, as it can hide problems that should be seen. What if instead we checked the status of the job: if it is 'Completed', just print an output saying the job is completed and return. If the job is not completed, simply run the cancel API step. But the status check should not loop; it should be a one-time check.

Contributor Author


Aye, can do that. The flow would then be to check the status of the Job Instance before calling the cancel API.

Does this flow make sense?

  1. Check the status.
  2. If Completed/Failed/Cancelled: stop, output that the job is finished, and return the status.
  3. Else (not started / in progress): call the Cancel API.
  4. Wait a few seconds, then check the status and return.
  5. If the job is finished: output that the job is cancelled and return the status.
  6. If the job has not yet finished: output that the cancel is in progress and return the status.

Since we don't loop, if the cancel operation takes a bit longer (and we don't catch it), it is left to the end user to implement a wait if needed.

P.S. This complication exists only because we can't get information about the cancel operation itself; we don't get a Retry-After header, for example.
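The single-check flow proposed above could be sketched as follows (get_status and run_cancel are hypothetical callables; the real implementation would call the Fabric APIs discussed in this thread):

```python
import time

FINISHED_STATUSES = {"Completed", "Failed", "Cancelled"}

def cancel_with_single_recheck(get_status, run_cancel, wait_seconds=5):
    """Check status once, cancel if still running, then re-check exactly once."""
    status = get_status()
    if status in FINISHED_STATUSES:
        print(f"Job already finished with status '{status}'; nothing to cancel.")
        return status
    run_cancel()
    time.sleep(wait_seconds)  # give the service a moment to process the cancel
    status = get_status()     # single re-check, deliberately no loop
    if status in FINISHED_STATUSES:
        print(f"Job cancelled; final status is '{status}'.")
    else:
        print(f"Cancel still in progress; current status is '{status}'.")
    return status
```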

print(
f"{icons.green_dot} The Job Instance '{job_instance_id}' of '{item_name}' {type.lower()} within the '{workspace_name}' workspace has been cancelled successfully."
)
else:
Collaborator


The 'else' branch would never actually be reached because execution would still be inside the while loop.

df = _get_item_job_instance(url=status_url)

# Check what is the final status of the Job Instance.
if status in ["Completed", "Failed", "Cancelled"]:
Collaborator


All 3 of these outcomes mean cancelled successfully? I don't think this is correct.

Contributor Author


As far as I know (I might be wrong), we don't have a separate API endpoint to check the status of the cancel operation for Job Instances. Unlike with Long Running Operations (where we can get the status of the operation), cancelling a Job Instance does not return an operation id in its headers; it returns only the location of the job instance.

So what I am checking is the status of the Job Instance itself, from https://learn.microsoft.com/en-us/rest/api/fabric/core/job-scheduler/get-item-job-instance?tabs=HTTP (also returned in the Location header). We wait until its status is no longer 'not started' / 'in progress'.

For example, if we cancel an active notebook Spark session, the job instance is marked as "Completed"; if we cancel a running pipeline, the job instance is marked as "Cancelled".

Do you perhaps know of another approach?

Collaborator


If the status shows as 'Cancelled' then the cancel job ran successfully. If the status shows as 'Failed' then the job failed. If the status shows as 'Completed' then the cancel job did not succeed and the job succeeded. Correct?

Contributor Author


Not really (as far as I know). The status reflects the Job Instance itself, not the cancel operation. I have not found a way to retrieve the status of the cancel operation (not sure one exists; unlike with Long Running Operations, at least I didn't find it in the response headers or the documented APIs).

For example, cancelling a running Spark Job Instance (stopping it) marks it as "Completed" instead of "Cancelled".
