|
| 1 | +# Service Provider Design |
| 2 | + |
| 3 | +## Goals |
| 4 | + |
| 5 | +- Define clear terminology around `ServiceProvider` in the openMCP space |
| 6 | +- Define `ServiceProvider` scope: responsibilities and boundaries of a `ServiceProvider` |
| 7 | +- Define a `ServiceProvider` model that implements the higher-level `API`/`Run` platform concept (to allow flexible deployment models, e.g. with `ClusterProvider` kcp) |
| 8 | +- Define `ServiceProvider` contract to implement `ServiceProvider` as a loosely coupled component in the openMCP context |
| 9 | +- Define how a `ServiceProvider` can be validated |
| 10 | + |
| 11 | +## Non-Goals |
| 12 | + |
| 13 | +tbd |
| 14 | + |
| 15 | +## Object Model |
| 16 | + |
| 17 | +```mermaid |
| 18 | +graph TD |
| 19 | + %% Onboarding Cluster |
| 20 | + subgraph OnboardingCluster/API |
| 21 | + SC[ServiceConfig] |
| 22 | + end |
| 23 | + |
| 24 | + %% Platform Cluster |
| 25 | + subgraph PlatformCluster/RUN |
| 26 | + SPO[service-provider-operator] |
| 27 | + SP[ServiceProvider] |
| 28 | + SPC[ServiceProviderConfig] |
| 29 | + end |
| 30 | + |
| 31 | + %% MCP Cluster |
| 32 | + subgraph MCPCluster/RUN |
| 33 | + DS[DomainService] |
| 34 | + DSAPI[DomainServiceAPI] |
| 35 | + end |
| 36 | + |
| 37 | + %% WorkloadCluster |
| 38 | + subgraph WorkloadCluster/RUN |
| 39 | + SDS[SharedDomainService] |
| 40 | + end |
| 41 | + |
| 42 | + %% edges |
| 43 | + SP -->|installs/reconciles|SC |
| 44 | + SP -->|uses|SPC |
| 45 | + SP -->|creates/updates/deletes|DS |
| 46 | + SP -->|creates/updates/deletes|SDS |
| 47 | + DS -->|installs/reconciles|DSAPI |
| 48 | + SDS --->|reconciles/XOR|DSAPI |
| 49 | + SPO-->|installs/reconciles|SP |
| 50 | +``` |
| 51 | + |
| 52 | +Open Points: |
| 53 | + |
| 54 | +- Does the `openmcp-operator` manage `ServiceProviders`, or do we introduce a new operator for `ServiceProviders`? A new component would give a clear separation of concerns: the `openmcp-operator` already does a lot, and we don't want it to become the next `control-plane-operator`. |
| 55 | +- In the above model, the `OnboardingCluster` is a continuous `API` cluster. We might want to provision dedicated or shared tenant `API` servers (e.g. with `ClusterProvider` kcp) based on some kind of component discovery that lets the tenant pick its feature/component set. This way, the `OnboardingCluster` is only used to onboard new tenants, and we avoid CRD management bottlenecks. |
| 56 | +- If we introduce tenant `API` clusters, they could be used to create `MCPs`. This implies that, instead of having the `OnboardingCluster` create `MCPs`, we might want it to create `Tenants` as the entry point for users, i.e. start with an identity object like `Tenant` or `Account` instead of a usage artifact like `MCP`. |
| 57 | + |
| 58 | +TODO: |
| 59 | + |
| 60 | +- Illustrate different deployment models with `Run`/`API` concept |
| 61 | +- Visually distinguish between `Run` and `API` artifacts |
| 62 | + |
| 63 | +## Terminology |
| 64 | + |
| 65 | +This section defines the objects of the [object model](#object-model). |
| 66 | + |
| 67 | +- `ServiceProvider` provides a service in tenant space |
| 68 | +- `PlatformService` provides a service in platform space |
| 69 | +- `Run` clusters support scheduling workloads. A `Run` cluster may or may not also serve as an `API` cluster. |
| 70 | +- `API` clusters serve APIs but do not support scheduling workloads (note that `API`/`Run` is a higher-level platform concept). |
| 71 | +- `OnboardingCluster` is part of the platform domain and the config/setup part from a tenant perspective. It serves the `API` of a `ServiceProvider`. |
| 72 | +- `MCPCluster` is part of the tenant domain and the application/functional part from a tenant perspective. It may or may not run the `Run` of a `ServiceProvider`. |
| 73 | +- `PlatformCluster` is part of the platform domain and a black box from a tenant perspective. It may or may not run the `Run` of a `ServiceProvider`. |
| 74 | +- A `ServiceConfig` defines how a service is provisioned in terms of the `DomainService` `API` and `Run` parts. For example, Crossplane could be provisioned for a tenant by installing the `API` on the tenant MCP but the `Run` on a shared worker pool (`WorkloadCluster`) (clarify tenant IAM). A tenant can use this mechanism to decide how to consume a service. |
| 75 | +- A `ServiceProviderConfig` defines the config parts that are used during a reconcile run, e.g. to define tenant boundaries. |
| 76 | + |
| 77 | +## Boundaries |
| 78 | + |
| 79 | +- A `PlatformService` (e.g. `service-provider-operator`) watches platform `API` clusters, e.g. the `OnboardingCluster`, and acts on platform `Run` clusters, e.g. itself or shared `WorkloadClusters`. It does not act on tenant clusters, e.g. MCPs. |
| 80 | +- A `ServiceProvider` watches tenant `API` clusters, e.g. the `OnboardingCluster`, and acts on `Run` clusters, e.g. MCPs. |
| 81 | + |
| 82 | +TBC: platform space vs. tenant space |
| 83 | + |
| 84 | +## Lifecycle |
| 85 | + |
| 86 | +- A `PlatformService` is installed by a platform team and/or bootstrapping mechanism (out of scope) |
| 87 | +- A `ServiceProvider` is installed by creating `ServiceProvider` objects. The `service-provider-operator` manages the lifecycle of `ServiceProviders` (advantages and disadvantages of this approach still need to be listed). |
| 88 | + |
| 89 | +## Validation |
| 90 | + |
| 91 | +A `ServiceProvider` is considered healthy if both its `API` and `Run` parts have been successfully synced and are ready for consumption. |
| 92 | + |
| 93 | +The following validation flow validates that a `ServiceProvider` is working as expected: |
| 94 | + |
| 95 | +0. SETUP: Create the test environment by installing all `ServiceProvider` prerequisites: a) create a k8s cluster, e.g. kind, b) install the `service-provider-operator` -> wait for the operator to be available |
| 96 | +1. ASSESS: Request `ServiceProvider` -> wait for `API` and `Run` components to be `synced` and `ready` |
| 97 | +2. ASSESS: Consume `API` to provision `DomainService` -> wait for `DomainService` to be `synced` and `ready` |
| 98 | +3. ASSESS: (optional) Consume `DomainServiceAPI`; depending on the provider/domain context, this may or may not be required |
| 99 | +4. ASSESS: Delete `ServiceProvider` -> wait for `API`, `Run`, `ServiceProvider` to be successfully removed |
| 100 | +5. TEARDOWN: Delete test environment components |
| 101 | + |
| 102 | +## Runtime |
| 103 | + |
| 104 | +What is a runtime? A runtime is a collection of abstractions and contracts that provides an environment in which user-defined logic is executed. |
| 105 | + |
| 106 | +The service provider runtime is built on top of controller-runtime and provides a service provider specific reconciliation loop. |
| 107 | + |
| 108 | +It provides: |
| 109 | + |
| 110 | +- client abstractions (in Crossplane: external clients; in openMCP: e.g. reuse common juggler reconcilers like Flux?) |
| 111 | +- lifecycle management abstractions of `ServiceProviderAPI` objects (the reconcile loop) |
| 112 | +- platform-specific features (in Crossplane e.g. late initialization, external-name and pause annotations). This enables us to implement platform features for all service providers at once: a `ServiceProvider` only needs to update its runtime dependency. |
| 113 | +- handling of cross-cutting concerns like event recording, logging, metrics, rate limits |
| 114 | + |
| 115 | +The following overview illustrates the layers in a simplified way: |
| 116 | + |
| 117 | +| Layer | Description | |
| 118 | +| :--- | :--- | |
| 119 | +| Service Provider | defines `ServiceProviderAPI` and implements service-provider-runtime operations | |
| 120 | +| service-provider-runtime | defines `ServiceProvider` reconciliation semantics | |
| 121 | +| controller-runtime | defines generic reconciliation semantics | |
| 122 | +| Kubernetes API machinery | k8s essentials | |
| 123 | +| Go runtime / OS kernel | process/thread execution, memory management | |
| 124 | + |
| 125 | +### Execution Model |
| 126 | + |
| 127 | +Here we define what a run/reconcile cycle means, e.g. observe followed by an orchestration of actions like create, update, delete. |
| 128 | + |
| 129 | +This may include special domain semantics similar to `ManagementPolicies` or the `pause` state/mechanism in Crossplane. |
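
One way to picture such a cycle is a pure decision function that maps an observation to a single action. This is only a sketch under assumptions: `Request`, `Observation`, `Action` and `Decide` are hypothetical names, and the pause handling merely imitates the idea of a pause mechanism, not Crossplane's actual implementation.

```go
package main

import "fmt"

// Action is the outcome of one reconcile cycle (illustrative names).
type Action string

const (
	ActionNone   Action = "none"
	ActionCreate Action = "create"
	ActionUpdate Action = "update"
	ActionDelete Action = "delete"
)

// Observation is a hypothetical result of the observe step.
type Observation struct {
	Exists   bool // the DomainService exists on the target cluster
	UpToDate bool // it matches the desired state
}

// Request carries the state of the reconciled object that influences
// the decision (hypothetical type).
type Request struct {
	Paused          bool // reconciliation is paused for this object
	DeletionPending bool // the object is being deleted
}

// Decide picks the single action for this cycle: observe first, then
// choose between create, update, delete or doing nothing.
func Decide(req Request, obs Observation) Action {
	switch {
	case req.Paused:
		return ActionNone
	case req.DeletionPending && obs.Exists:
		return ActionDelete
	case req.DeletionPending:
		return ActionNone
	case !obs.Exists:
		return ActionCreate
	case !obs.UpToDate:
		return ActionUpdate
	default:
		return ActionNone
	}
}

func main() {
	fmt.Println(Decide(Request{}, Observation{Exists: false}))
	fmt.Println(Decide(Request{}, Observation{Exists: true, UpToDate: false}))
	fmt.Println(Decide(Request{DeletionPending: true}, Observation{Exists: true}))
	fmt.Println(Decide(Request{Paused: true}, Observation{Exists: false}))
}
```

Keeping the decision free of side effects makes the execution model easy to unit test independently of any cluster.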
| 130 | + |
| 131 | +### Abstractions and Contracts |
| 132 | + |
| 133 | +Here we define the core interfaces that a consumer (`ServiceProvider` developer) has to implement. In Crossplane, for example, an `ExternalConnector` creates an `ExternalClient`, which implements CRUD operations that return `ExternalObservation`, `ExternalCreation`, etc. The `Managed` interface defines what makes a k8s object a managed Crossplane resource, e.g. by referencing a `ProviderConfig`, specifying `ManagementPolicies`, `ConnectionSecrets`, etc. |
| 134 | + |
| 135 | +### Observability |
| 136 | + |
| 137 | +Logging, metrics, traces? |
| 138 | + |
| 139 | +## Domain |
| 140 | + |
| 141 | +This is the actual domain layer of a `ServiceProvider` (the layer on top of the [runtime](#runtime)). It is the foundation for building a `ServiceProvider` template. |
| 142 | + |
| 143 | +### RBAC |
| 144 | + |
| 145 | +What permissions does a service provider need... |
| 146 | + |
| 147 | +## Service Provider Manager |
| 148 | + |
| 149 | +The component that manages the lifecycle of `ServiceProviders` and provides service discovery to platform `API` clusters, e.g. `OnboardingCluster`. |
| 150 | + |
| 151 | +Candidates: e.g. `openmcp-operator` or `service-provider-operator` |
| 152 | + |
| 153 | +out of scope? |