Much better Pro docs #4263

AlexejPenner · 2025-11-28T10:30:00Z

Describe changes

I added a section per deployment scenario - https://zenml-io.gitbook.io/alexej/zenml-pro


github-actions · 2025-11-28T10:45:49Z

Documentation Link Check Results

❌ Absolute links check failed
There are broken absolute links in the documentation. See workflow logs for details
❌ Relative links check failed
There are broken relative links in the documentation. See workflow logs for details
Last checked: 2025-12-05 10:01:47 UTC

github-actions · 2025-11-28T10:49:01Z

🔍 Broken Links Report

Summary

📁 Files with broken links: 3
🔗 Total broken links: 3
📄 Broken markdown links: 2
🖼️ Broken image links: 1
⚠️ Broken reference placeholders: 0

Details

| File | Link Type | Link Text | Broken Path |
|------|-----------|-----------|-------------|
| `zenml-pro/hybrid-deployment-helm.md` | 📄 | "Set up users and teams" | `../organization.md` |
| `zenml-pro/self-hosted-deployment.md` | 🖼️ | "Self-hosted deployment architecture" | `../../.gitbook/assets/air-gapped-architecture.png` |
| `zenml-pro/hybrid-deployment-ecs.md` | 📄 | "Set up users and teams" | `../organization.md` |

📂 Full file paths

/home/runner/work/zenml/zenml/scripts/../docs/book/getting-started/zenml-pro/hybrid-deployment-helm.md
/home/runner/work/zenml/zenml/scripts/../docs/book/getting-started/zenml-pro/self-hosted-deployment.md
/home/runner/work/zenml/zenml/scripts/../docs/book/getting-started/zenml-pro/hybrid-deployment-ecs.md

AlexejPenner · 2025-11-28T11:22:57Z

docs/book/getting-started/zenml-pro/on-prem-deployment.md

+1. **Code Execution**: You write code and run pipelines with your client SDK using Python
+2. **Authentication & Token Acquisition**:
+   - Users authenticate via your internal identity provider (LDAP/AD/OIDC)
+   - The ZenML Pro control plane (running in your infrastructure) handles authentication and RBAC
+   - The ZenML client fetches short-lived tokens from your ZenML workspace for:
+     - Pushing Docker images to your container registry
+     - Communicating with your artifact store
+     - Submitting workloads to your orchestrator
+   - *Note: Your local Python environment needs the client libraries for your stack components*
+3. **Authorization**: RBAC policies enforced by your control plane before token issuance
+4. **Image & Workload Submission**: The client pushes Docker images (and optionally code if no code repository is configured) to your container registry, then submits the workload to your orchestrator
+5. **Orchestrator Execution**: In the orchestrator environment within your infrastructure:
+   - The Docker image is pulled from your container registry
+   - Within the pipeline/step entrypoint, the necessary code is pulled in
+   - A connection to your ZenML workspace is established
+   - The relevant pipeline/step code is executed
+6. **Runtime Data Flow**: During execution (all within your infrastructure):
+   - Pipeline and step run metadata is logged to your ZenML workspace
+   - Logs are streamed to your log backend
+   - Artifacts are written to your artifact store
+   - Metadata pointing to these artifacts is persisted in your workspace
+7. **Observability**: The ZenML Pro dashboard (running in your infrastructure) connects to your workspace and uses all persisted metadata to provide you with a complete observability plane


@stefannica fact check pls

AlexejPenner · 2025-11-28T11:23:33Z

docs/book/getting-started/zenml-pro/self-hosted-deployment.md

+
+The diagram above illustrates a complete air-gapped ZenML Pro deployment with all components running within your organization's VPC. This architecture ensures zero external communication while providing full enterprise MLOps capabilities.
+
+### Architecture Components


@stefannica fact check pls

AlexejPenner · 2025-11-28T11:23:59Z

docs/book/getting-started/zenml-pro/self-hosted-deployment.md

+- **Backup sites** for disaster recovery
+- **Monitoring and alerting** for all components
+
+## Pre-requisites


@stefannica fact check pls

AlexejPenner · 2025-12-03T09:21:50Z

https://zenml-io.gitbook.io/alexej/zenml-pro - view here to see it in action

htahir1

I think it's good for a first round. Many comments apply to many pages.

docs/book/getting-started/zenml-pro/saas-deployment.md

docs/book/getting-started/zenml-pro/README.md

htahir1 · 2025-12-03T09:34:05Z

docs/book/getting-started/zenml-pro/README.md

 - ✅ **Vulnerability Assessment Reports** available on request
 - ✅ **Software Bill of Materials (SBOM)** available on request


@stefannica should verify this

yes, we can provide this on request

htahir1 · 2025-12-03T09:44:02Z

docs/book/getting-started/zenml-pro/deployments-overview.md

+
+All three deployment scenarios follow a similar pipeline execution pattern, with differences in where authentication happens and where data resides:
+
+### Standard Data Flow Steps


This definitely needs a diagram

Agreed - we might even have one lying around somewhere

htahir1 · 2025-12-03T09:44:41Z

docs/book/getting-started/zenml-pro/deployments-overview.md

+
+**SaaS**: Metadata is stored in ZenML infrastructure. Your ML data and compute remain in your infrastructure.
+
+**Hybrid**: Metadata and control plane are split — authentication/RBAC happens at ZenML control plane, but all run metadata, artifacts, and compute stay in your infrastructure.


I think the authentication bit is the most important here and isn't really elaborated, but maybe it is later?

What more would you like to know about this at this stage?

htahir1 · 2025-12-03T09:46:20Z

docs/book/getting-started/zenml-pro/saas-deployment.md

+
+You control this access by configuring appropriate cloud IAM permissions.
+
+## Getting Started


IMO super strange to have this whole section here...

The whole section? Maybe we don't need the example pipeline, but I like how it shows how quickly you're ready.

Hmm, really? It's in the dashboard already when you sign up.

Well, somebody reading the docs here wants to know what complexity awaits them: "Is it worth my time?"

I'm not sure tbh

In my experience these are the questions we get very early on.

docs/book/getting-started/zenml-pro/saas-deployment.md

Co-authored-by: Hamza Tahir <hamza@zenml.io>

… docs/better-pro-docs

github-actions · 2025-12-03T14:33:12Z

Images automagically compressed by Calibre's image-actions ✨

Compression reduced images by 34%, saving 132.21 KB.

| Filename | Before | After | Improvement | Visual comparison |
|----------|--------|-------|-------------|-------------------|
| `docs/book/getting-started/zenml-pro/.gitbook/assets/pro-workload-managers.png` | 388.76 KB | 256.55 KB | -34.0% | View diff |

383 images did not require optimisation.


stefannica

first round of reviews, more to follow...

stefannica · 2025-12-05T22:02:20Z

docs/book/getting-started/zenml-pro/self-hosted.md

 #### ZenML Pro Client Artifacts

-If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in an air-gapped environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information).
+If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in a Self-hosted environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information).


Suggested change

If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in a Self-hosted environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information).

If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located [in Docker Hub at `zenmldocker/zenml`](https://hub.docker.com/r/zenmldocker/zenml). This isn't a problem unless you're deploying ZenML Pro in an air-gapped environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the [DockerSettings documentation](https://docs.zenml.io/how-to/customize-docker-builds) for more information).

The original text was actually correct here.
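
To make the air-gapped case above concrete, here is a minimal sketch of how the mirrored client image reference could be derived. `mirrored_image`, the registry hostname, and the tag placeholder are hypothetical names used only for illustration; they are not part of ZenML. The commented-out usage assumes the `parent_image` option of `DockerSettings` described in the linked DockerSettings documentation.

```python
# Public ZenML client image named in the docs paragraph above.
PUBLIC_IMAGE = "zenmldocker/zenml"


def mirrored_image(private_registry: str, tag: str) -> str:
    """Return the reference of the client image inside a private mirror.

    Hypothetical helper: assumes the image was copied to the private
    registry under the same repository name and tag.
    """
    registry = private_registry.rstrip("/")
    return f"{registry}/{PUBLIC_IMAGE}:{tag}"


if __name__ == "__main__":
    # Placeholder registry and tag; substitute your own values.
    parent_image = mirrored_image("registry.internal.example.com", "<zenml-version>")
    print(parent_image)
    # The result would then be handed to ZenML, roughly like this
    # (assumption, see the DockerSettings documentation):
    #   from zenml.config import DockerSettings
    #   docker_settings = DockerSettings(parent_image=parent_image)
    #   @pipeline(settings={"docker": docker_settings})
```

Keeping the mapping in one function means the registry hostname is configured in a single place rather than repeated per pipeline.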

stefannica · 2025-12-06T21:18:50Z

docs/book/getting-started/zenml-pro/deployments-overview.md

+Choose **Self-hosted** if you need complete control with no external dependencies.
+
+**What runs where:**
+- All components: [Your infrastructure](https://docs.zenml.io/stacks) (completely isolated)


It doesn't make sense to point to stacks here.

stefannica · 2025-12-06T21:20:39Z

docs/book/getting-started/zenml-pro/deployments-overview.md

+
+1. **Code Execution**: You write code and run pipelines with your client SDK using Python
+
+2. **Token Acquisition**: The ZenML client fetches short-lived tokens from your ZenML workspace for:


note: this only happens if you use service connectors
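
The note above can be illustrated with a toy model. This is only a sketch under assumptions: `StackComponent`, its `connector` field, and `credential_source` are hypothetical names, and the fallback to locally available credentials is an assumption about what happens when no service connector is linked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackComponent:
    """Hypothetical stand-in for a stack component."""
    name: str
    connector: Optional[str] = None  # name of a linked service connector, if any


def credential_source(component: StackComponent) -> str:
    """Describe where the client gets credentials for a component."""
    if component.connector is not None:
        # Only this branch involves fetching short-lived tokens.
        return f"short-lived token via service connector {component.connector!r}"
    # Assumption: without a connector, no token is fetched from the workspace.
    return "credentials already present in the local environment"


if __name__ == "__main__":
    print(credential_source(StackComponent("artifact_store")))
    print(credential_source(StackComponent("container_registry", connector="aws-connector")))
```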

stefannica · 2025-12-06T21:29:25Z

docs/book/getting-started/zenml-pro/saas-deployment.md

+
+| Component | Location | Purpose |
+|-----------|----------|---------|
+| **ZenML Pro Server** | ZenML Infrastructure | Manages pipeline orchestration and metadata |


You are using the term "zenml server" in several places, but it appears in none of the diagrams. You should probably be using "zenml workspace".

stefannica · 2025-12-06T21:34:02Z

docs/book/getting-started/zenml-pro/deployments-overview.md

+
+| Deployment Aspect | SaaS | Hybrid SaaS | Self-hosted |
+|-------------------|------|-------------|------------|
+| **ZenML Server** | ZenML infrastructure | Your infrastructure | Your infrastructure |


Should this be workspace instead of server?

stefannica · 2025-12-06T22:05:42Z

docs/book/getting-started/zenml-pro/hybrid-deployment.md

+
+### 🚀 Production Ready
+
+- **High availability**: Built-in redundancy for critical components


Incorrect. I would say that the workspaces are the critical components, not the control plane or UI, and they are not under our control, so we cannot offer any such guarantees and shouldn't make such claims.

stefannica · 2025-12-06T22:06:39Z

docs/book/getting-started/zenml-pro/hybrid-deployment.md

+- **High availability**: Built-in redundancy for critical components
+- **Automatic updates**: Control plane maintained by ZenML
+- **Professional support**: Direct access to ZenML experts
+- **Monitoring included**: Health checks and alerting configured


Only partially correct. Health checks and alerting are only partially configured, and only for the control plane. Workspaces are not covered.

stefannica · 2025-12-06T22:10:01Z

docs/book/getting-started/zenml-pro/saas-deployment.md

+
+### Artifact Store Access
+
+The ZenML dashboard requires read access to your artifact store to display:


This should say UI (not dashboard) and it requires access to more than the artifact store (log store, orchestrators; see my other comment)

stefannica · 2025-12-06T22:12:32Z

docs/book/getting-started/zenml-pro/saas-deployment.md

+- Artifact lineage graphs
+- Step logs and outputs
+
+You control this access by configuring appropriate cloud IAM permissions.


This is misleading. The truth is this: if you give your users permission to access these things, you also implicitly give the UI permission to do so. You could say "you control who can access this information in the UI by configuring appropriate ZenML Pro RBAC permissions." Cloud IAM permissions do not apply here.

stefannica · 2025-12-06T22:15:01Z

docs/book/getting-started/zenml-pro/hybrid-deployment.md

+```mermaid
+graph LR
+    A[User] -->|1. Login| B[Control Plane<br/>ZenML Infrastructure]
+    B -->|2. Auth Token| A
+    A -->|3. Access Workspace| C[Workspace<br/>Your Infrastructure]
+    C -->|4. Validate Token| B
+    B -->|5. Authorization| C
+    C -->|6. Execute| D[Your Resources]
+```


these do not render correctly

stefannica

I tried to review as much of this as I could. Half of it is pretty good, while the other half is clearly vibe-written and riddled with hallucinations and over-simplifications.

I would kindly ask you to give this another careful read yourself, check that it's factually correct based on the original docs and resources, then correct the mistakes.

stefannica · 2025-12-07T18:03:56Z

docs/book/getting-started/zenml-pro/hybrid-deployment.md

+
+| Data Type | Storage Location | Purpose |
+|-----------|-----------------|---------|
+| User credentials | Control Plane | Authentication only |


The control plane doesn't store user credentials (unless you count Personal Access Tokens or API keys). It's the customer's SSO/identity provider that stores the credentials.

stefannica · 2025-12-07T18:18:13Z

docs/book/getting-started/zenml-pro/hybrid-deployment.md

+1. User authenticates with ZenML control plane (SSO)
+2. Control plane issues authentication token
+3. User accesses workspace with token
+4. Workspace validates token with control plane
+5. Control plane confirms authorization (RBAC)
+6. Workspace executes operations on your infrastructure


It's a bit more complicated than this and the authentication flow varies depending on the type of authentication (web client, python client via web login flow, service account, PAT). If the point of this section is to provide a gross oversimplification of the authentication process, I think you nailed it. But you should probably mention that it's not 100% accurate (e.g. in most cases, the workspace issues its own temporary credentials, to avoid overloading the control plane by checking credentials for every API call).

I would recommend keeping details like the type of credentials being used (token) out of this oversimplified description (and the diagram too):

Suggested change

1. User authenticates with ZenML control plane (SSO)

2. Control plane issues authentication token

3. User accesses workspace with token

4. Workspace validates token with control plane

5. Control plane confirms authorization (RBAC)

6. Workspace executes operations on your infrastructure

1. User authenticates with ZenML control plane (SSO)

2. Control plane issues authentication credentials

3. User accesses workspace with credentials

4. Workspace validates credentials with control plane

5. Control plane confirms authentication and authorization (RBAC)

6. Workspace executes operations on your infrastructure
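
The six steps, together with the caveat that the workspace usually issues its own temporary credentials, can be sketched as a toy model. Everything here is an illustrative assumption rather than the actual ZenML Pro implementation: the `ControlPlane` and `Workspace` classes, the session TTL, and the in-memory permission table are all hypothetical, and the real flow varies by authentication type.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy control plane: tracks issued credentials and RBAC permissions."""
    permissions: dict[str, set[str]]                       # user -> allowed actions
    issued: dict[str, str] = field(default_factory=dict)   # credential -> user
    validations: int = 0                                   # calls made by workspaces

    def login(self, user: str) -> str:
        # Steps 1-2: the user authenticates (SSO itself is out of scope here)
        # and receives credentials.
        credential = secrets.token_hex(8)
        self.issued[credential] = user
        return credential

    def validate(self, credential: str) -> set[str] | None:
        # Steps 4-5: confirm authentication and return the user's permissions.
        self.validations += 1
        user = self.issued.get(credential)
        return None if user is None else self.permissions.get(user, set())


@dataclass
class Workspace:
    control_plane: ControlPlane
    session_ttl: float = 300.0  # hypothetical lifetime in seconds
    sessions: dict[str, tuple[set[str], float]] = field(default_factory=dict)

    def open_session(self, credential: str) -> str:
        # Step 3: validate once with the control plane, then issue a temporary
        # workspace-local session so later calls do not hit the control plane.
        allowed = self.control_plane.validate(credential)
        if allowed is None:
            raise PermissionError("unknown credential")
        session = secrets.token_hex(8)
        self.sessions[session] = (allowed, time.monotonic() + self.session_ttl)
        return session

    def execute(self, session: str, action: str) -> str:
        # Step 6: authorize against the cached permissions only.
        allowed, expires_at = self.sessions.get(session, (set(), 0.0))
        if time.monotonic() >= expires_at or action not in allowed:
            raise PermissionError(f"{action!r} not permitted")
        return f"executed {action}"
```

In this model, repeated calls to `execute` with the same session leave `ControlPlane.validations` unchanged, which is the point of the caveat: the control plane is consulted when the session is opened, not on every API call.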

stefannica · 2025-12-07T18:49:33Z

docs/book/getting-started/zenml-pro/hybrid-deployment.md

+1. **Clients authenticate** with ZenML Control Plane (SSO) - hosted by ZenML
+2. **Control Plane issues** RBAC-validated tokens to clients
+3. **Clients connect** to their assigned workspace(s) in your infrastructure
+4. **Workspaces validate** tokens with Control Plane (outbound-only connection)
+5. **Pipelines execute** on your infrastructure resources


This is a re-iteration of the previous section. You should merge them.

stefannica · 2025-12-07T19:23:46Z