Skip to content

Conversation

@slekkala1
Copy link
Contributor

@slekkala1 slekkala1 commented Oct 29, 2025

What does this PR do?

Linked issue #4179

Current PR Scope (Stubs + API)

  • Job Scheduler API protocol (api.py)
  • Configuration classes (config.py)
  • Stub implementations (raise NotImplementedError)
  • Unit tests (verify API exists)
  • Documentation (README with patterns)

Future PR Scope (Implementation)

  • Provider pattern (get_provider_impl)
  • Registry entry (registry/job_scheduler.py)
  • Actual scheduler implementation
  • Integration with vector_io (add to optional_api_dependencies)

Test Plan

UTs

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 29, 2025
@slekkala1 slekkala1 marked this pull request as ready for review November 4, 2025 19:16
from typing import Protocol


class JobStatus(StrEnum):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statuses for batch apis in OpenAI are:

    status: Literal[
        "validating", "failed", "in_progress", "finalizing", "completed", "expired", "cancelling", "cancelled"
    ]

https://github.com/openai/openai-python/blob/6574bcd612771186995074846253caa0ff1ba517/src/openai/types/batch.py#L41

it could be rational for us to follow their statuses with the additional PENDING state (since I assume they lump that into in_progress).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks for the pointer!

@ashwinb
Copy link
Contributor

ashwinb commented Nov 7, 2025

@slekkala1 it would be good to file an Issue for this work if one doesn't exist describing why this is being undertaken and perhaps referencing the older discussions / issues also which I had shared with you.


from .scheduler import CelerySchedulerImpl

__all__ = ["CelerySchedulerImpl"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks almost like you're constructing an API for /jobs inside of provider/utils. Should this instead be an API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdoern that is quite intended since we do not want to expose them as public APIs to the end user. these are "stack internal" APIs. It would be good to have an "infrastructure" for these APIs (particularly if want external providers to target them maybe?) but we don't have that right now. Our KVStore / SQLStore APIs are similar in spirit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdoern I do think it is a good idea to create llama_stack_api.internal and situate these APIs there. Could be a good follow-up. Maybe I will try to get the kvstore / sqlstore ones there once your PR lands.

CANCELLED = "cancelled"


class Scheduler(Protocol):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above, this is like a shadow API inside of providers/utils. I think it'd make sense to expose this at a higher level. WDYT?

Copy link
Contributor Author

@slekkala1 slekkala1 Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdoern yes these are the internal apis same as KVstore/SQL store APIs (as Ashwin mentions above)

@slekkala1
Copy link
Contributor Author

@slekkala1 it would be good to file an Issue for this work if one doesn't exist describing why this is being undertaken and perhaps referencing the older discussions / issues also which I had shared with you.

Filed a new issue #4179 linked to the old issue inside it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this PR adds types and an implementation, should we implement some of these methods? otherwise we are introducing an API structure without validation its the right shape for any of the providers.

alternatively, lets just drop the celery stub until we actually go to implement it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.

We can also have a full implementation here and split it up into two PRs.

Copy link
Collaborator

@leseb leseb Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to split into API and Implementation, 2 PRs. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants