-
Notifications
You must be signed in to change notification settings - Fork 155
Add function to cancel item job instance #952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add function to cancel item job instance #952
Conversation
|
Solves #950 |
m-kovalsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor changes. Be sure to test it as well.
src/sempy_labs/_job_scheduler.py
Outdated
| """ | ||
| Cancel an item's job instance. | ||
| This is a wrapper function for the following API: `Job Scheduler - Cancel Item Job Instance <https://learn.microsoft.com/en-us/rest/api/fabric/core/job-scheduler/cancel-item-job-instance>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the /en-US part please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Also added note on SP authentication support.
|
One more thing in the review - the function should wait for the cancellation to complete. For this you need to check the status and wait for it to complete. See how it was done here although this one may be slightly different.
|
Added the following:
|
|
Btw I've noticed a possible issue with As this function is also used in other modules, didn't wanted to try to fix it in the current PR and might test more and create another PR. |
#958 should solve that. Thanks! |
| item=item, type=type, workspace=workspace | ||
| ) | ||
|
|
||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the try/except here won't do much because the exception is already handled within the _base_api function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try/except will catch the exception thrown by _base_api function that is handled internally at _base_api and will "silence" the exception if it's 400 and JobAlreadyCompleted, if the exception is any other than this, it will reraise it with raise in the else statement.
This is how it looks when the _base_api returns 400 and JobAlreadyCompleted and we handle it:

So the only added benefit of this try/except is to make the code not fail in case we cancel an already finished/cancelled Job Instance.
This is how it looks when the _base_api returns an exception that we don't handle:

In this case, the exception is not the one we handle so we re-raise the exception cause by _base_api + output additional information to console.
Let me know if you'd like this to be changed or the try/except completely removed. The reasoning behind this was to make the code/notebook not fail if we try to cancel an already finished/cancelled Job Instance (this exception could also be handled by the end users themselves inside code/notebooks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not use try/except as it can lead to strange things not being seen properly. What if instead we checked the status of the job and if it is 'Completed' just print an output saying the job is completed and then return. If the job is not completed it can simply run the cancel API step. But the check of the status should not loop. It should be a 1-time check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ay, can do that. The flow then would be to check the status of the Job Instance before running the cancel API.
Do you think this flow then makes sense?:
- Check status
- (if Completed/Failed/Cancelled) Stop and output that the Job is finished and return status.
- (else (not started / in progress) Run Cancel API
- Wait few seconds and check status and return
- (if Job is finished) output the job is cancelled + return status
- (if Job has not yet finished) output the job cancel is in progress + return status
At the end, as we don't do a loop, if the cancel operation takes a bit longer (and we don't catch it) it is left to end user to implement a wait (if it's needed).
P.S. This complication is only because we can't get information about cancel operation and we don't get Retry-After for example.
| print( | ||
| f"{icons.green_dot} The Job Instance '{job_instance_id}' of '{item_name}' {type.lower()} within the '{workspace_name}' workspace has been cancelled successfully." | ||
| ) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 'else' portion would never actually be reached because it would still be in the while loop.
| df = _get_item_job_instance(url=status_url) | ||
|
|
||
| # Check what is the final status of the Job Instance. | ||
| if status in ["Completed", "Failed", "Cancelled"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all 3 of these outcomes mean cancelled successfully? i don't think this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know (might be wrong) but we don't have a separate API endpoint to check the status of Cancel operation for Job Instance's. Unlike with Long Running Operations (where we can get the status of the operation), cancelling a Job Instance does not return the operation in it's headers and instead returns only the location of the job instance.
So what I am checking is the status of the Job Instance from https://learn.microsoft.com/en-us/rest/api/fabric/core/job-scheduler/get-item-job-instance?tabs=HTTP (returned also by the headers as location). We wait till the status of it is no longer not started / in progress.
For example, if we cancel an active notebook Spark session: the job instance is marked as "Completed"; if we cancel running pipeline: the job instance is marked as "Cancelled".
Do you know another approach perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the status shows as 'Cancelled' then the cancel job ran successfully. If the status shows as 'Failed' then the job failed. If the status shows as 'Completed' then the cancel job did not succeed and the job succeeded. Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really (as far as I know). The status shows the status of the Job Instance itself and not the cancel operation. I have not found a way to retrieve the status of the cancel operation (not sure if there's one, not like in Long Running Operations. At least didn't found it in the response headers + documented APIs).
For example cancelling running Spark Job Instance (stopping) marks it as "Completed" instead of "Cancelled".
Currently, we have functions for majority of API wrappers of the Job Scheduler endpoints. However at least one is missing to cancel running Job Instances. This is the Fabric API that does the exact - Job Scheduler - Cancel Item Job Instance.
The PR adds a new
cancel_item_job_instancefunction to thesrc/sempy_labs/_job_scheduler.pymodule.P.S. I had to make sure that lro would not be triggered for this endpoint as the endpoint does not return
x-ms-operation-idand instead returnsx-ms-job-id+locationwith logic being a bit different than handling thex-ms-operation-idcase.