Add blog post about different strategies for pulling images #500
manuelbuil wants to merge 4 commits into k3s-io:main from manuelbuil:blogImages
545c38e Add blog post about pre-pulling images (manuelbuil)
d53c6d3 Address Derek's comments (manuelbuil)
b5da48f Address Brad comments: stargz & online prepulling & one-line example (manuelbuil)
b8a0bbc Slight modifications to make it better readable (manuelbuil)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,115 @@ | ||
| --- | ||
| title: K3s strategies for image consumption | ||
| description: Master online and offline image loading techniques in K3s for ultra-fast application startup, even with multi-gigabyte containers. | ||
| authors: manuelbuil | ||
| hide_table_of_contents: true | ||
| --- | ||
|
|
||
| Slow image pulls are annoying and can push Kubernetes startup times past acceptable thresholds, particularly in resource-constrained or air-gapped environments. The situation is exacerbated by new AI-driven apps, which often rely on very large images, frequently tens or even hundreds of gigabytes. This post dives into the mechanisms that K3s makes available to improve the user's experience when handling large images. | ||
|
|
||
| <!-- truncate --> | ||
|
|
||
|
|
||
| ## Online & Offline Strategies: The Power of Local Import 📦 ## | ||
|
|
||
| K3s provides mechanisms to make large images available quickly. They address two common scenarios: | ||
| - Online Clusters: To avoid slow image pulls from an external registry when a pod starts, K3s can `pre-pull` images from a manifest file. | ||
| - Offline (Air-Gapped) Clusters: Where no external registry is available, K3s can `import` images directly from local tarball archives. | ||
|
|
||
| 1. Pre-Pulling Images via a Manifest File (Online) | ||
| In scenarios with internet connectivity, the goal is to initiate image pulls as early and efficiently as possible. K3s can be instructed to sequentially pull a set of images into the embedded containerd store, either during startup or while K3s is running. This is ideal for ensuring base images are ready the moment the cluster starts or the moment the application is deployed. However, if the image list is in place before K3s starts, K3s will not finish starting until all images have been pulled, and it may fail to start if the pulls take more than 15 minutes. If you suspect this is happening, do the pre-pulling while K3s is running instead. | ||
|
|
||
| Users can trigger a pull of images into the containerd image store by placing a simple text file containing the image names, one per line, in the `/var/lib/rancher/k3s/agent/images` directory. As explained above, this can be done before K3s starts or while K3s is running. For example, you can execute the following on one of the nodes: | ||
|
|
||
| ```bash | ||
| mkdir -p /var/lib/rancher/k3s/agent/images && echo docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime > /var/lib/rancher/k3s/agent/images/pytorch.txt | ||
| ``` | ||
| The previous command creates the images directory on the node and drops a file named `pytorch.txt` that contains the image `docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime`. | ||
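
When several images are needed, the same file can simply list them one per line. The following is a minimal sketch, not a tool shipped with K3s: `write_image_list` is a hypothetical helper name, and `docker.io/library/busybox:1.36` is only an illustrative second image.

```bash
#!/bin/sh
# Hypothetical helper: write an image list file for K3s to pre-pull from.
# Usage: write_image_list <images-dir> <file-name> <image>...
write_image_list() {
  images_dir="$1"
  file_name="$2"
  shift 2
  # Create the images directory if it does not exist yet.
  mkdir -p "$images_dir" || return 1
  # printf repeats the format for every argument: one image per line.
  printf '%s\n' "$@" > "$images_dir/$file_name"
}

# Example call (run as root on a node):
# write_image_list /var/lib/rancher/k3s/agent/images ai-images.txt \
#   docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime \
#   docker.io/library/busybox:1.36
```

Because K3s pulls the listed images sequentially, it may help to put the images you need first at the top of the file.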
|
|
||
| The K3s process will then pull these images via the CRI API. You should see log messages like the following: | ||
| ```log | ||
| # When the k3s controller detects the file | ||
| level=info msg="Pulling images from /var/lib/rancher/k3s/agent/images/pytorch.txt" | ||
| level=info msg="Pulling image docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime" | ||
|
|
||
| # When the import has finished. The last line reports how long it took: | ||
| level=info msg="Imported docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime" | ||
| level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/pytorch.txt in 6m1.178972902s" | ||
| ``` | ||
|
|
||
| 2. Importing Images from Tarballs (Offline & Ultra-Fast) | ||
|
|
||
| For the fastest possible startup, or when running in an air-gapped environment, the images should be available locally as tarballs. K3s will load these images directly into the containerd image store, bypassing network traffic entirely. | ||
|
|
||
| Place the image tarballs (created using `docker save` or `ctr images export`) in the same `/var/lib/rancher/k3s/agent/images` directory. K3s will decompress the tarball, extract the image layers, and load them. | ||
|
|
||
| For example, I have created an image tarball with all the images required to deploy the popular [microservices-demo](https://github.com/GoogleCloudPlatform/microservices-demo) with the name `microservices-demo.tar.gz`. | ||
|
|
||
| ```bash | ||
| # Example: create the images directory and copy the tarball into it | ||
| mkdir -p /var/lib/rancher/k3s/agent/images | ||
| cp microservices-demo.tar.gz /var/lib/rancher/k3s/agent/images/ | ||
| ``` | ||
|
|
||
| The K3s process will load those images and you should see the following two logs: | ||
| ```log | ||
| level=info msg="Importing images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz" | ||
| level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz in 1m39.8610592s" | ||
| ``` | ||
|
|
||
| You can verify the successfully imported images at any time using the bundled client: `k3s ctr images list` | ||
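
To check a whole list at once, the output of `k3s ctr images list -q` (which prints only the image references) can be compared against an image list file. This is a rough sketch: `missing_images` is a hypothetical helper, and it assumes the references in the file are fully qualified, exactly as containerd reports them.

```bash
#!/bin/sh
# Hypothetical helper: print every image from a list file that is NOT
# present in the image references supplied on stdin.
# Returns 0 if all images are present, 1 otherwise.
# Usage: k3s ctr images list -q | missing_images <list-file>
missing_images() {
  list_file="$1"
  present="$(cat)"   # image references read from stdin
  rc=0
  while IFS= read -r image || [ -n "$image" ]; do
    # Skip empty lines in the list file.
    [ -z "$image" ] && continue
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! printf '%s\n' "$present" | grep -Fxq -- "$image"; then
      printf '%s\n' "$image"
      rc=1
    fi
  done < "$list_file"
  return "$rc"
}

# Example call (run as root on a node):
# k3s ctr images list -q | missing_images /var/lib/rancher/k3s/agent/images/pytorch.txt
```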
|
|
||
| ### Optimizing boot times with tarballs ### | ||
|
|
||
| By default, image archives are imported every time K3s starts to ensure consistency. However, this delay can be significant when dealing with many large archives. For example, `microservices-demo.tar.gz` took 1m39s to import. To alleviate this, K3s offers a feature that only imports tarballs that have changed since they were last processed. To enable this feature, create an empty `.cache.json` file in the images directory: | ||
|
|
||
| ```bash | ||
| touch /var/lib/rancher/k3s/agent/images/.cache.json | ||
| ``` | ||
|
|
||
| The cache file stores archive metadata (size and modification time). On subsequent restarts, K3s checks this file and skips the import for any tarballs that haven't changed, dramatically speeding up cluster boot time. To confirm that this is working, check that `.cache.json` is no longer empty and that, after a restart, the two import log lines no longer appear. | ||
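
Both checks can be scripted. The sketch below is only illustrative: `import_cache_active` is a hypothetical helper that reads K3s log text on stdin, and the example call assumes a systemd-based install whose unit is named `k3s` (agents usually run as `k3s-agent`).

```bash
#!/bin/sh
# Hypothetical helper: succeed only if the import cache looks active.
# Reads K3s log text (from after the restart) on stdin.
# Usage: import_cache_active <images-dir> < k3s-log.txt
import_cache_active() {
  images_dir="$1"
  # 1. The cache file must exist and be non-empty.
  if [ ! -s "$images_dir/.cache.json" ]; then
    echo "cache file missing or empty" >&2
    return 1
  fi
  # 2. No tarball import should show up in the logs after the restart.
  if grep -q "Importing images from"; then
    echo "tarballs were imported again" >&2
    return 1
  fi
  return 0
}

# Example call (run as root on a node, shortly after restarting K3s):
# journalctl -u k3s --since "10 minutes ago" --no-pager \
#   | import_cache_active /var/lib/rancher/k3s/agent/images
```

Keep in mind that a tarball that really has changed is expected to be imported again, so a failure of the second check is not always a problem.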
|
|
||
| Note that the caching mechanism needs to be enabled carefully. If an image has been removed or pruned since the last startup, K3s will not reimport it from an unchanged tarball, so you must take manual action to reimport the image. Check our [docs](https://docs.k3s.io/installation/airgap?_highlight=.cache.json&airgap-load-images=Manually+Deploy+Images#enable-conditional-image-imports) for more information. | ||
|
|
||
|
|
||
| ## Embedded Registry Mirror 🕸️ ## | ||
|
|
||
| K3s offers an in-cluster container image registry mirror by embedding [Spegel](https://spegel.dev/). Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by allowing nodes to pull cached image content directly from other nodes whenever possible, instead of requiring each node to reach out to a central registry. To enable this feature, server nodes must be started with the `--embedded-registry` flag, or with `embedded-registry: true` in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001. | ||
|
|
||
| ```yaml | ||
| # K3s configuration file on the server nodes | ||
| # Enable the embedded registry mirror | ||
| embedded-registry: true | ||
| # Expose supervisor metrics, which include metrics for the embedded registry mirror | ||
| supervisor-metrics: true | ||
| ``` | ||
|
|
||
| Then, on every node, add a `registries.yaml` file that lists the registries whose images the node is allowed to share with, and pull from, other nodes. If a registry is enabled for mirroring on some nodes but not on others, only the nodes with the registry enabled will exchange images. For example: | ||
|
|
||
| ```yaml | ||
| mirrors: | ||
|   docker.io: | ||
|   registry.k8s.io: | ||
| ``` | ||
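
Since the file has to be present on every node, it may be convenient to generate it from a script. The snippet below is a small sketch under that assumption: `mirrors_yaml` is a hypothetical helper, and `/etc/rancher/k3s/registries.yaml` is the default location K3s reads the file from.

```bash
#!/bin/sh
# Hypothetical helper: print a registries.yaml "mirrors" section that
# enables each given registry for the embedded registry mirror.
# Usage: mirrors_yaml <registry>...
mirrors_yaml() {
  echo "mirrors:"
  for registry in "$@"; do
    # Each registry is an (empty) key nested under "mirrors".
    printf '  %s:\n' "$registry"
  done
}

# Example call (run as root on every node):
# mirrors_yaml docker.io registry.k8s.io > /etc/rancher/k3s/registries.yaml
```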
|
|
||
| If everything boots up correctly, you should see in the logs: | ||
| ```log | ||
| level=info msg="Starting distributed registry mirror at https://10.11.0.11:6443/v2 for registries [docker.io registry.k8s.io]" | ||
| level=info msg="Starting distributed registry P2P node at 10.11.0.11:5001" | ||
| ``` | ||
|
|
||
| And you should be able to see metrics of Spegel by querying the supervisor metrics server: | ||
| ```bash | ||
| kubectl get --server https://10.11.0.11:6443 --raw /metrics | grep spegel | ||
| ``` | ||
|
|
||
| For more information, check the [docs](https://docs.k3s.io/installation/registry-mirror). | ||
|
|
||
| ## Bonus: eStargz images ⚡ ## | ||
|
|
||
| A different way to speed up pod creation is to use a special image format called eStargz. This enables lazy pulling, which means that the application can start almost instantly while the rest of the image is pulled in the background. This strategy requires both that the image is built in the eStargz format and that the K3s agent is configured to use the stargz snapshotter, either with the `--snapshotter=stargz` flag or with `snapshotter: stargz` in the configuration file. | ||
|
|
||
| This is currently an experimental feature in K3s and we have more information in the [advanced section of our docs](https://docs.k3s.io/advanced#enabling-lazy-pulling-of-estargz-experimental). We would love to hear your feedback if you are using it. | ||
|
|
||
| ## Conclusion 🏁 ## | ||
|
|
||
| K3s provides robust, flexible tools to tackle slow image pulls, a problem magnified by today's multi-gigabyte cloud-native and AI images. By pre-pulling images from a manifest file, loading them from tarballs, or distributing them through the embedded [Spegel](https://spegel.dev/) registry mirror, you can turn slow network operations into quick local ones. These mechanisms help your resource-constrained and air-gapped clusters achieve rapid, predictable startup times, delivering a consistently better user experience. | ||