Much better Pro docs #4263
base: docs/pro-vs-oss
Documentation Link Check Results: ❌ Absolute links check failed
🔍 Broken Links Report
| 1. **Code Execution**: You write code and run pipelines with your client SDK using Python | ||
| 2. **Authentication & Token Acquisition**: | ||
| - Users authenticate via your internal identity provider (LDAP/AD/OIDC) | ||
| - The ZenML Pro control plane (running in your infrastructure) handles authentication and RBAC | ||
| - The ZenML client fetches short-lived tokens from your ZenML workspace for: | ||
| - Pushing Docker images to your container registry | ||
| - Communicating with your artifact store | ||
| - Submitting workloads to your orchestrator | ||
| - *Note: Your local Python environment needs the client libraries for your stack components* | ||
| 3. **Authorization**: RBAC policies enforced by your control plane before token issuance | ||
| 4. **Image & Workload Submission**: The client pushes Docker images (and optionally code if no code repository is configured) to your container registry, then submits the workload to your orchestrator | ||
| 5. **Orchestrator Execution**: In the orchestrator environment within your infrastructure: | ||
| - The Docker image is pulled from your container registry | ||
| - Within the pipeline/step entrypoint, the necessary code is pulled in | ||
| - A connection to your ZenML workspace is established | ||
| - The relevant pipeline/step code is executed | ||
| 6. **Runtime Data Flow**: During execution (all within your infrastructure): | ||
| - Pipeline and step run metadata is logged to your ZenML workspace | ||
| - Logs are streamed to your log backend | ||
| - Artifacts are written to your artifact store | ||
| - Metadata pointing to these artifacts is persisted in your workspace | ||
| 7. **Observability**: The ZenML Pro dashboard (running in your infrastructure) connects to your workspace and uses all persisted metadata to provide you with a complete observability plane |
@stefannica fact check pls
|
|
||
| The diagram above illustrates a complete air-gapped ZenML Pro deployment with all components running within your organization's VPC. This architecture ensures zero external communication while providing full enterprise MLOps capabilities. | ||
|
|
||
| ### Architecture Components |
@stefannica fact check pls
| - **Backup sites** for disaster recovery | ||
| - **Monitoring and alerting** for all components | ||
|
|
||
| ## Pre-requisites |
@stefannica fact check pls
|
https://zenml-io.gitbook.io/alexej/zenml-pro - view here to see it in action |
htahir1
left a comment
I think it's good for a first round. Many comments apply to many pages.
| - ✅ **Vulnerability Assessment Reports** available on request | ||
| - ✅ **Software Bill of Materials (SBOM)** available on request |
@stefannica should verify this
yes, we can provide this on request
|
|
||
| All three deployment scenarios follow a similar pipeline execution pattern, with differences in where authentication happens and where data resides: | ||
|
|
||
| ### Standard Data Flow Steps |
This definitely needs a diagram
Agreed - we might even have one lying around somewhere
|
|
||
| **SaaS**: Metadata is stored in ZenML infrastructure. Your ML data and compute remain in your infrastructure. | ||
|
|
||
| **Hybrid**: Metadata and control plane are split — authentication/RBAC happens at ZenML control plane, but all run metadata, artifacts, and compute stay in your infrastructure. |
I think the authentication bit is the most important here and isn't really elaborated, but maybe it is later?
What more would you like to know about this at this stage?
|
|
||
| You control this access by configuring appropriate cloud IAM permissions. | ||
|
|
||
| ## Getting Started |
IMO super strange to have this whole section here...
The whole section? Maybe we don't need the example pipeline, but I like how it shows how quickly you're ready.
Hmm, really? It's in the dashboard already when you sign up.
well somebody in the docs here wants to know what complexity awaits them - "Is it worth my time?"
I'm not sure tbh
in my experience these are the questions we get very early on
Co-authored-by: Hamza Tahir <hamza@zenml.io>
… docs/better-pro-docs
|
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 34%, saving 132.21 KB.
383 images did not require optimisation. Update required: Update image-actions configuration to the latest version before 1/1/21. See README for instructions. |
stefannica
left a comment
first round of reviews, more to follow...
| #### ZenML Pro Client Artifacts | ||
|
|
||
| If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in an air-gapped environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information). | ||
| If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in a Self-hosted environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information). |
| If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in a Self-hosted environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information). | |
| If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in an air-gapped environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information). |
The original text was actually correct here.
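The quoted paragraph says an air-gapped deployment has to copy the client image into its own container registry and then point DockerSettings at that copy. A minimal sketch of the renaming step, under stated assumptions: `mirrored_image` is a hypothetical helper (not part of ZenML), the registry host and tag are placeholders, and the idea that the resulting reference is what you hand to DockerSettings as the parent image should be checked against the linked DockerSettings documentation.

```python
def mirrored_image(
    private_registry: str,
    tag: str,
    repository: str = "zenmldocker/zenml",
) -> str:
    """Return the image reference to use after mirroring the public
    client image into a private registry.

    `private_registry` and `tag` are supplied by the caller; nothing here
    contacts Docker Hub or the registry, it only builds the reference string.
    """
    registry = private_registry.strip().rstrip("/")
    if not registry:
        raise ValueError("private_registry must not be empty")
    if not tag:
        raise ValueError("tag must not be empty")
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    # Hypothetical registry host and tag, for illustration only.
    print(mirrored_image("registry.internal.example.com", "0.0.0"))
```

Keeping the repository path unchanged under the private registry makes it easy to see which public image a mirrored copy corresponds to; any other naming scheme works as long as the pipeline code references it consistently.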
| Choose **Self-hosted** if you need complete control with no external dependencies. | ||
|
|
||
| **What runs where:** | ||
| - All components: [Your infrastructure](https://docs.zenml.io/stacks) (completely isolated) |
It doesn't make sense to point to stacks here.
|
|
||
| 1. **Code Execution**: You write code and run pipelines with your client SDK using Python | ||
|
|
||
| 2. **Token Acquisition**: The ZenML client fetches short-lived tokens from your ZenML workspace for: |
note: this only happens if you use service connectors
|
|
||
| | Component | Location | Purpose | | ||
| |-----------|----------|---------| | ||
| | **ZenML Pro Server** | ZenML Infrastructure | Manages pipeline orchestration and metadata | |
You are using the term "zenml server" in several places, but it appears in none of the diagrams. You should probably be using "zenml workspace".
|
|
||
| | Deployment Aspect | SaaS | Hybrid SaaS | Self-hosted | | ||
| |-------------------|------|-------------|------------| | ||
| | **ZenML Server** | ZenML infrastructure | Your infrastructure | Your infrastructure | |
Should this be workspace instead of server ?
|
|
||
| ### 🚀 Production Ready | ||
|
|
||
| - **High availability**: Built-in redundancy for critical components |
Incorrect. I would say that the workspaces are the critical components, not the control plane or UI, and they are not under our control, so we cannot offer any such guarantees and shouldn't make such claims.
| - **High availability**: Built-in redundancy for critical components | ||
| - **Automatic updates**: Control plane maintained by ZenML | ||
| - **Professional support**: Direct access to ZenML experts | ||
| - **Monitoring included**: Health checks and alerting configured |
Only partially correct. Health checks and alerting are only partially configured, and only for the control plane. Workspaces are not covered.
|
|
||
| ### Artifact Store Access | ||
|
|
||
| The ZenML dashboard requires read access to your artifact store to display: |
This should say UI (not dashboard) and it requires access to more than the artifact store (log store, orchestrators; see my other comment)
| - Artifact lineage graphs | ||
| - Step logs and outputs | ||
|
|
||
| You control this access by configuring appropriate cloud IAM permissions. |
This is misleading. The truth is this: if you give your users permission to access these things, you also implicitly give the UI permission to do so. You could say: "You control who can access this information in the UI by configuring appropriate ZenML Pro RBAC permissions. Cloud IAM permissions do not apply here."
| ```mermaid | ||
| graph LR | ||
| A[User] -->|1. Login| B[Control Plane<br/>ZenML Infrastructure] | ||
| B -->|2. Auth Token| A | ||
| A -->|3. Access Workspace| C[Workspace<br/>Your Infrastructure] | ||
| C -->|4. Validate Token| B | ||
| B -->|5. Authorization| C | ||
| C -->|6. Execute| D[Your Resources] | ||
| ``` |
these do not render correctly
stefannica
left a comment
I tried to review as much of this as I could. Half of it is pretty good, while the other half is clearly vibe-written and riddled with hallucinations and over-simplifications.
I would kindly ask you to give this another careful read yourself, check that it's factually correct based on the original docs and resources, then correct the mistakes.
|
|
||
| | Data Type | Storage Location | Purpose | | ||
| |-----------|-----------------|---------| | ||
| | User credentials | Control Plane | Authentication only | |
The control plane doesn't store user credentials (unless you count Personal Access Tokens or API keys). It's the customer's SSO/identity provider that stores the credentials.
| 1. User authenticates with ZenML control plane (SSO) | ||
| 2. Control plane issues authentication token | ||
| 3. User accesses workspace with token | ||
| 4. Workspace validates token with control plane | ||
| 5. Control plane confirms authorization (RBAC) | ||
| 6. Workspace executes operations on your infrastructure |
It's a bit more complicated than this and the authentication flow varies depending on the type of authentication (web client, python client via web login flow, service account, PAT). If the point of this section is to provide a gross oversimplification of the authentication process, I think you nailed it. But you should probably mention that it's not 100% accurate (e.g. in most cases, the workspace issues its own temporary credentials, to avoid overloading the control plane by checking credentials for every API call).
I would recommend keeping details like the type of credentials being used (token) out of this oversimplified description (and the diagram too):
| 1. User authenticates with ZenML control plane (SSO) | |
| 2. Control plane issues authentication token | |
| 3. User accesses workspace with token | |
| 4. Workspace validates token with control plane | |
| 5. Control plane confirms authorization (RBAC) | |
| 6. Workspace executes operations on your infrastructure | |
| 1. User authenticates with ZenML control plane (SSO) | |
| 2. Control plane issues authentication credentials | |
| 3. User accesses workspace with credentials | |
| 4. Workspace validates credentials with control plane | |
| 5. Control plane confirms authentication and authorization (RBAC) | |
| 6. Workspace executes operations on your infrastructure |
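The review comment above notes that, in most cases, the workspace issues its own temporary credentials so the control plane is not consulted on every API call. A toy model of that idea, offered only as an illustration: the `ControlPlane` and `Workspace` classes, the permission names, and the five-minute lifetime are all hypothetical and do not reflect ZenML's actual implementation or any of its authentication flows.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy stand-in: knows which actions each user may perform."""

    permissions: dict[str, set[str]]
    validations: int = 0  # counts how often a workspace asked for validation

    def validate(self, user: str) -> set[str]:
        self.validations += 1
        if user not in self.permissions:
            raise PermissionError(f"unknown user: {user}")
        return set(self.permissions[user])


@dataclass
class Workspace:
    """Toy stand-in: validates once, then serves calls from local sessions."""

    control_plane: ControlPlane
    session_ttl: float = 300.0  # hypothetical lifetime in seconds
    _sessions: dict[str, tuple[set[str], float]] = field(default_factory=dict)

    def login(self, user: str) -> str:
        # One round trip to the control plane per login, not per API call.
        granted = self.control_plane.validate(user)
        session = secrets.token_hex(16)
        self._sessions[session] = (granted, time.monotonic() + self.session_ttl)
        return session

    def call(self, session: str, action: str) -> str:
        # Unknown sessions fall back to an already-expired entry.
        granted, expires = self._sessions.get(session, (set(), 0.0))
        if time.monotonic() >= expires:
            raise PermissionError("session expired or unknown")
        if action not in granted:
            raise PermissionError(f"not allowed: {action}")
        return f"ok: {action}"


if __name__ == "__main__":
    cp = ControlPlane(permissions={"alice": {"run_pipeline"}})
    ws = Workspace(control_plane=cp)
    session = ws.login("alice")
    for _ in range(3):
        print(ws.call(session, "run_pipeline"))
    print("control plane validations:", cp.validations)
```

The trade-off in this sketch is that a permission change at the control plane only takes effect once the workspace-local session expires, which is why such credentials are kept short-lived.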
| 1. **Clients authenticate** with ZenML Control Plane (SSO) - hosted by ZenML | ||
| 2. **Control Plane issues** RBAC-validated tokens to clients | ||
| 3. **Clients connect** to their assigned workspace(s) in your infrastructure | ||
| 4. **Workspaces validate** tokens with Control Plane (outbound-only connection) | ||
| 5. **Pipelines execute** on your infrastructure resources |
This is a re-iteration of the previous section. You should merge them.
| - `kubectl` configured to access your cluster | ||
| - `helm` CLI (3.0+) installed | ||
| - A domain name and TLS certificate for your ZenML server | ||
| - MySQL or PostgreSQL database (managed or self-hosted) |
| - MySQL or PostgreSQL database (managed or self-hosted) | |
| - MySQL database (managed or self-hosted) |
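The tooling part of the prerequisites list above (`kubectl` present, `helm` 3.0+) can be checked before starting a deployment. A minimal sketch, assuming that `helm version --short` prints a string such as `v3.14.0+g...`; the function names are hypothetical, and the check does not cover cluster access, the domain name, the TLS certificate, or the database.

```python
import re
import shutil
import subprocess


def parse_helm_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from version text such as 'v3.14.0+g1234567'."""
    match = re.search(r"v?(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised helm version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites() -> list[str]:
    """Return a list of human-readable problems; empty means the tools look fine."""
    problems = []
    for tool in ("kubectl", "helm"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if shutil.which("helm") is not None:
        result = subprocess.run(
            ["helm", "version", "--short"],
            capture_output=True,
            text=True,
            check=False,
        )
        try:
            if parse_helm_version(result.stdout) < (3, 0):
                problems.append("helm 3.0+ is required")
        except ValueError as exc:
            problems.append(str(exc))
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites() or ["kubectl and helm look fine"]:
        print(problem)
```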
| ## Step 3: Create Secrets for Credentials | ||
|
|
||
| Create a secret for your Pro OAuth2 credentials. Ask your ZenML Solutions Architect to send you this secret: | |
|
|
||
| ```bash | ||
| kubectl -n zenml-hybrid create secret generic zenml-pro-credentials \ | ||
| --from-literal=ZENML_SERVER_PRO_OAUTH2_CLIENT_SECRET=<your-client-secret> | ||
| ``` | ||
|
|
||
|
|
||
| If using a custom TLS certificate (self-signed or from a CA), create a secret: | ||
|
|
||
| ```bash | ||
| kubectl -n zenml-hybrid create secret tls zenml-tls \ | ||
| --cert=/path/to/tls.crt \ | ||
| --key=/path/to/tls.key | ||
| ``` |
Incorrect. This is handled by the helm chart. The user doesn't have to manually configure any secrets.
| 1. Navigate to `https://zenml.mycompany.com` in your browser | ||
| 2. You should be redirected to ZenML Cloud login | ||
| 3. Sign in with your organization credentials | ||
| 4. You should see your workspace listed |
This is the backwards way of doing it. You should instruct them to log in to cloud.zenml.io, access their org and then their workspace. This redirect is more of a backwards compatibility failsafe than it is an official way of accessing the workspace UI.
| kubectl -n zenml-workload-manager create serviceaccount zenml-runner | ||
| ``` | ||
|
|
||
| ### 2. Configure Workload Manager in Helm Values |
This section is repeated in at least 3 different places:
- here
- in the self-hosted docs
- in the workload managers section
Can you please just point to the workload managers section instead of duplicating this information ?
| external: | ||
| type: mysql | ||
| host: zenml-db.123456789.us-east-1.rds.amazonaws.com | ||
| port: 3306 | ||
| username: admin | ||
| password: <your-rds-password> | ||
| database: zenml_hybrid |
This is hallucinated or ill-informed. Please consult the official helm chart values.
| # Add other environment variables as needed | ||
| ``` | ||
|
|
||
| ## Database Configuration Examples |
This entire section is unnecessary. We already have a complete section on how to configure the helm chart for OSS server deployments. Duplicating oversimplified parts of that here - and incorrectly at that - isn't going to help anyone. Better to link to the correct and fully detailed OSS helm documentation here instead.
| external: | ||
| type: mysql | ||
| host: 34.123.45.67 | ||
| port: 3306 | ||
| username: root | ||
| password: <your-cloud-sql-password> | ||
| database: zenml_hybrid |
Hallucinated. I won't repeat this comment....
Describe changes
I added a section per deployment scenario - https://zenml-io.gitbook.io/alexej/zenml-pro
Pre-requisites
Please ensure you have done the following:
develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
Types of changes