From 545c38ef13e3a710fc4445e0c54087029cbfd542 Mon Sep 17 00:00:00 2001 From: manuelbuil Date: Mon, 3 Nov 2025 17:10:18 +0100 Subject: [PATCH 1/4] Add blog post about pre-pulling images Signed-off-by: manuelbuil --- .../2025-11-03-strategies-for-large-images.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 blog/2025-11-03-strategies-for-large-images.md diff --git a/blog/2025-11-03-strategies-for-large-images.md b/blog/2025-11-03-strategies-for-large-images.md new file mode 100644 index 000000000..8b21f303f --- /dev/null +++ b/blog/2025-11-03-strategies-for-large-images.md @@ -0,0 +1,114 @@ +--- +title: K3s strategies for image consumption +description: Master online and offline image loading techniques in K3s for ultra-fast application startup, even with multi-gigabyte containers. +authors: manuelbuil +hide_table_of_contents: true +--- + +Slow image pulls can be annoying and could increase Kubernetes startup times over a healthy threshold, particularly in resource-constrained or air-gapped environments. The situation gets worsened by new AI cloud native apps, which often rely on astronomically large images (several gigabytes). This post dives into K3s mechanisms, like pre-pulling images and embedded registry mirror, that can effectively improve the user's experience ensuring your cluster is ready the moment you need the images, especially in environments where network bandwidth might be constrained. + + +## ๐Ÿ“ฆ Online & Offline Strategies: The Power of Local Import ## + +K3s provides two core mechanisms for ensuring large images are available quickly, whether you are connected to an external registry (online) or deploying in an isolated environment (offline). The goal is to shift the time spent waiting on a slow network pull into a fast local load during K3s startup. + +1. Pre-Pulling Images via a Manifest File (Online) +In scenarios with internet connectivity, the goal is to initiate image pulls as early and efficiently as possible. 
K3s can be instructed to sequentially pull a set of images into the embedded containerd store during startup. This is ideal for ensuring base images are ready the moment the cluster starts.
+
+Users can trigger a pull of images into the containerd image store by placing a simple text file containing the image names, one per line, in the /var/lib/rancher/k3s/agent/images directory. This can be done before K3s starts or while K3s is running.
+
+Imagine the file `example.txt` which contains:
+
+```yaml
+docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime
+```
+Before starting the k3s service in the node, do the following:
+
+```bash
+# 1. Create the images directory on the node
+mkdir -p /var/lib/rancher/k3s/agent/images
+
+# 2. Copy the manifest file (example.txt)
+cp example.txt /var/lib/rancher/k3s/agent/images
+```
+The K3s process will then pull these images via the CRI API. You should see the following two logs:
+```yaml
+# When the k3s controller detects the file
+level=info msg="Pulling images from /var/lib/rancher/k3s/agent/images/example.txt"
+level=info msg="Pulling image docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime"
+
+# When the import is ready. It specifies how long it took:
+level=info msg="Imported docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime"
+level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/example.txt in 6m1.178972902s"
+```
+
+2. Importing Images from Tarballs (Offline & Ultra-Fast)
+
+For the absolute fastest startup, and whenever you are in an air-gapped environment, the images should be available locally as tarballs. K3s will load these images directly into the containerd image store, bypassing any network traffic entirely.
+
+Place the image tarballs (created using docker save or ctr save) in the same /var/lib/rancher/k3s/agent/images directory. K3s will decompress the tarball, extract the image layers, and load them. 
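+To make the staging flow concrete, here is a minimal sketch. The `docker save` invocation is shown as a comment (it assumes Docker is installed and the image has been pulled; the image name is just an example); the runnable part builds a stand-in gzipped tarball and stages it into a temporary directory instead of the real agent path:
+
```shell
# On a connected machine you would create the real archive with, e.g.:
#   docker save docker.io/library/redis:7 | gzip > demo-images.tar.gz
# Stand-in archive, so the staging steps can be shown end to end:
workdir=$(mktemp -d)
echo "image layer data" > "$workdir/layer"
tar -czf "$workdir/demo-images.tar.gz" -C "$workdir" layer

# Stage the archive where the K3s agent scans on startup
# (on a real node this is /var/lib/rancher/k3s/agent/images)
images_dir="$workdir/images"
mkdir -p "$images_dir"
cp "$workdir/demo-images.tar.gz" "$images_dir/"
ls "$images_dir"
```
+On the next K3s start, the agent would detect the archive in that directory and import its contents.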
+ +For example, I have created an image tarball with all the images required to deploy the popular [microservices-demo](https://github.com/GoogleCloudPlatform/microservices-demo) with the name `microservices-demo.tar.gz`. + +```bash +# Example: Save the image and place the tarball +mkdir -p /var/lib/rancher/k3s/agent/images +cp microservices-demo.tar.gz /var/lib/rancher/k3s/agent/images/ +``` + +The K3s process will load those images and you should see the following two logs: +```yaml +level=info msg="Importing images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz" +level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz in 1m39.8610592s +``` + +You can verify the successfully imported images at any time using the bundled client: `k3s ctr images list` + +### Optimizing booting times with tarballs ### + +By default, image archives are imported every time K3s starts to ensure consistency. However, this delay can be significant when dealing with many large archives, for example, `microservices-demo.tar.gz` took 1m39s to import. To alleviate this, K3s offers a feature to only import tarballs that have changed since they were last processed. To enable this feature, create an empty `.cache.json` file in the images directory: + +```bash +touch /var/lib/rancher/k3s/agent/images/.cache.json +``` + +The cache file will store archive metadata (size and modification time). Subsequent restarts of K3s will check this file and skip the import process for any large tarballs that haven't changed, dramatically speeding up cluster boot time. Therefore, to check that this is working, check `.cache.json` is not empty and, after restarting, that the two log lines do not appear anymore. + +Note that the caching mechanism needs to be enabled carefully. If an image was removed or pruned since last startup, take manual action to reimport the image. 
Check our [docs](https://docs.k3s.io/installation/airgap?_highlight=.cache.json&airgap-load-images=Manually+Deploy+Images#enable-conditional-image-imports) for more information. + + +## ๐Ÿ•ธ๏ธ Embedded Registry Mirror ## + +K3s offers an in-cluster container image registry mirror by embedding Spegel. Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by ensuring images are pulled within the cluster network rather than repeatedly from a central registry. To enable this feature, server nodes must be started with the --embedded-registry flag, or with embedded-registry: true in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001. + +```bash +# Enable the embedded registry mirror +embedded-registry: true +# To enable metrics that can help with the embedded registry mirror +supervisor-metrics: true +``` + +And then, in all nodes, you must add a `registries.yaml` where we specified what registry we allow a node to both pull images from other nodes, and share the registry's images with other nodes. If a registry is enabled for mirroring on some nodes, but not on others, only the nodes with the registry enabled will exchange images from that registrywhat registries are mirrored. 
For example: + +```bash +mirrors: + docker.io: + registry.k8s.io: +``` + +If everything boots up correctly, you should see in the logs: +```yaml +level=info msg="Starting distributed registry mirror at https://10.11.0.11:6443/v2 for registries [docker.io registry.k8s.io]" +level=info msg="Starting distributed registry P2P node at 10.11.0.11:5001" +``` + +And you should be able to see metrics of Spegel by querying the supervisor metrics server: +```bash +kubectl get --server https://10.11.0.11:6443 --raw /metrics | grep spegel +``` + + +๐Ÿ Conclusion + + +K3s provides robust, flexible tools to decisively tackle slow image pulls, a problem magnified by today's multi-gigabyte cloud-native and AI images. By leveraging pre-pulling manifest strategies, tarball loading or optimizing image distribution with the embedded Spegel registry mirror, you can shift network latency into quick, reliable local operations. These mechanisms ensure your resource-constrained and air-gapped clusters achieve rapid, predictable startup times, delivering a consistently better user experience. \ No newline at end of file From d53c6d3f62f05c44d8d517b56209f6c67b2bb4fc Mon Sep 17 00:00:00 2001 From: manuelbuil Date: Thu, 6 Nov 2025 10:43:44 +0100 Subject: [PATCH 2/4] Address Derek's comments Signed-off-by: manuelbuil --- .../2025-11-03-strategies-for-large-images.md | 31 ++++++++++--------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/blog/2025-11-03-strategies-for-large-images.md b/blog/2025-11-03-strategies-for-large-images.md index 8b21f303f..1b767040b 100644 --- a/blog/2025-11-03-strategies-for-large-images.md +++ b/blog/2025-11-03-strategies-for-large-images.md @@ -5,12 +5,16 @@ authors: manuelbuil hide_table_of_contents: true --- -Slow image pulls can be annoying and could increase Kubernetes startup times over a healthy threshold, particularly in resource-constrained or air-gapped environments. 
The situation gets worsened by new AI cloud native apps, which often rely on astronomically large images (several gigabytes). This post dives into K3s mechanisms, like pre-pulling images and embedded registry mirror, that can effectively improve the user's experience ensuring your cluster is ready the moment you need the images, especially in environments where network bandwidth might be constrained. +Slow image pulls can be annoying and can increase Kubernetes startup times over a healthy threshold, particularly in resource-constrained or air-gapped environments. The situation gets worsened by new AI cloud native apps, which often rely on astronomically large images (several gigabytes). This post dives into K3s mechanisms, like pre-pulling images and the embedded registry mirror, that can effectively improve the user's experience when handling large images. + -## ๐Ÿ“ฆ Online & Offline Strategies: The Power of Local Import ## -K3s provides two core mechanisms for ensuring large images are available quickly, whether you are connected to an external registry (online) or deploying in an isolated environment (offline). The goal is to shift the time spent waiting on a slow network pull into a fast local load during K3s startup. +## Online & Offline Strategies: The Power of Local Import ๐Ÿ“ฆ ## + +K3s provides mechanisms for ensuring large images are available quickly, that address two common scenarios: +- Online Clusters: To avoid slow image pulls from an external registry when a pod starts, K3s can `pre-pull` images from a manifest file. +- Offline (Air-Gapped) Clusters: Where no external registry is available, K3s can `import` images directly from local tarball archives. 1. Pre-Pulling Images via a Manifest File (Online) In scenarios with internet connectivity, the goal is to initiate image pulls as early and efficiently as possible. K3s can be instructed to sequentially pull a set of images into the embedded containerd store during startup. 
This is ideal for ensuring base images are ready the moment the cluster starts. @@ -19,7 +23,7 @@ Users can trigger a pull of images into the containerd image store by placing a Imagine the file `example.txt` which contains: -```yaml +```text docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime ``` Before starting the k3s service in the node, do the following: @@ -32,7 +36,7 @@ mkdir -p /var/lib/rancher/k3s/agent/images cp example.txt /var/lib/rancher/k3s/agent/images ``` The K3s process will then pull these images via the CRI API. You should see the following two logs: -```yaml +```log # When the k3s controller detects the file level=info msg="Pulling images from /var/lib/rancher/k3s/agent/images/example.txt" level=info msg="Pulling image docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime" @@ -57,7 +61,7 @@ cp microservices-demo.tar.gz /var/lib/rancher/k3s/agent/images/ ``` The K3s process will load those images and you should see the following two logs: -```yaml +```log level=info msg="Importing images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz" level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz in 1m39.8610592s ``` @@ -77,9 +81,9 @@ The cache file will store archive metadata (size and modification time). Subsequ Note that the caching mechanism needs to be enabled carefully. If an image was removed or pruned since last startup, take manual action to reimport the image. Check our [docs](https://docs.k3s.io/installation/airgap?_highlight=.cache.json&airgap-load-images=Manually+Deploy+Images#enable-conditional-image-imports) for more information. -## ๐Ÿ•ธ๏ธ Embedded Registry Mirror ## +## Embedded Registry Mirror ๐Ÿ•ธ๏ธ ## -K3s offers an in-cluster container image registry mirror by embedding Spegel. 
Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by ensuring images are pulled within the cluster network rather than repeatedly from a central registry. To enable this feature, server nodes must be started with the --embedded-registry flag, or with embedded-registry: true in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001.
+K3s offers an in-cluster container image registry mirror by embedding [Spegel](https://spegel.dev/). Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by ensuring images are pulled from within the cluster network rather than repeatedly from a central registry. To enable this feature, server nodes must be started with the `--embedded-registry` flag, or with `embedded-registry: true` in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001.
 
 ```bash
 # Enable the embedded registry mirror
@@ -88,16 +92,16 @@ embedded-registry: true
 supervisor-metrics: true
 ```
 
-And then, in all nodes, you must add a `registries.yaml` where we specified what registry we allow a node to both pull images from other nodes, and share the registry's images with other nodes. If a registry is enabled for mirroring on some nodes, but not on others, only the nodes with the registry enabled will exchange images from that registrywhat registries are mirrored. 
+And then, on all nodes, you must add a `registries.yaml` file that specifies the registries for which a node will both pull images from other nodes and share its own images with them. 
If a registry is enabled for mirroring on some nodes, but not on others, only the nodes with the registry enabled will exchange images. For example: -```bash +```yaml mirrors: docker.io: registry.k8s.io: ``` If everything boots up correctly, you should see in the logs: -```yaml +```log level=info msg="Starting distributed registry mirror at https://10.11.0.11:6443/v2 for registries [docker.io registry.k8s.io]" level=info msg="Starting distributed registry P2P node at 10.11.0.11:5001" ``` @@ -108,7 +112,6 @@ kubectl get --server https://10.11.0.11:6443 --raw /metrics | grep spegel ``` -๐Ÿ Conclusion - +## Conclusion ๐Ÿ ## -K3s provides robust, flexible tools to decisively tackle slow image pulls, a problem magnified by today's multi-gigabyte cloud-native and AI images. By leveraging pre-pulling manifest strategies, tarball loading or optimizing image distribution with the embedded Spegel registry mirror, you can shift network latency into quick, reliable local operations. These mechanisms ensure your resource-constrained and air-gapped clusters achieve rapid, predictable startup times, delivering a consistently better user experience. \ No newline at end of file +K3s provides robust, flexible tools to tackle slow image pulls, a problem magnified by today's multi-gigabyte cloud-native and AI images. By leveraging pre-pulling manifest strategies, tarball loading or optimizing image distribution with the embedded [Spegel](https://spegel.dev/) registry mirror, you can shift slow network operations into quick local operations. These mechanisms ensure your resource-constrained and air-gapped clusters achieve rapid, predictable startup times, delivering a consistently better user experience. 
\ No newline at end of file From b5da48f309461d7479fd3eafa6963b120c5d30cc Mon Sep 17 00:00:00 2001 From: manuelbuil Date: Tue, 11 Nov 2025 10:31:52 +0100 Subject: [PATCH 3/4] Address Brad comments: stargz & online prepulling & one-line example Signed-off-by: manuelbuil --- ...2025-11-11-strategies-for-large-images.md} | 23 ++++++++----------- 1 file changed, 9 insertions(+), 14 deletions(-) rename blog/{2025-11-03-strategies-for-large-images.md => 2025-11-11-strategies-for-large-images.md} (79%) diff --git a/blog/2025-11-03-strategies-for-large-images.md b/blog/2025-11-11-strategies-for-large-images.md similarity index 79% rename from blog/2025-11-03-strategies-for-large-images.md rename to blog/2025-11-11-strategies-for-large-images.md index 1b767040b..8f32bcbac 100644 --- a/blog/2025-11-03-strategies-for-large-images.md +++ b/blog/2025-11-11-strategies-for-large-images.md @@ -17,24 +17,15 @@ K3s provides mechanisms for ensuring large images are available quickly, that ad - Offline (Air-Gapped) Clusters: Where no external registry is available, K3s can `import` images directly from local tarball archives. 1. Pre-Pulling Images via a Manifest File (Online) -In scenarios with internet connectivity, the goal is to initiate image pulls as early and efficiently as possible. K3s can be instructed to sequentially pull a set of images into the embedded containerd store during startup. This is ideal for ensuring base images are ready the moment the cluster starts. +In scenarios with internet connectivity, the goal is to initiate image pulls as early and efficiently as possible. K3s can be instructed to sequentially pull a set of images into the embedded containerd store during startup or while K3s is running. This is ideal for ensuring base images are ready the moment the cluster starts or the moment the application is deployed. 
However, if this process is done before the cluster is started, K3s will not finish starting until all images have been pulled, and it will fail to start outright if the pulls take more than 15 minutes. If you suspect this is happening to you, it is better to do the pre-pulling while K3s is running.
 
-Users can trigger a pull of images into the containerd image store by placing a simple text file containing the image names, one per line, in the /var/lib/rancher/k3s/agent/images directory. This can be done before K3s starts or while K3s is running.
-
-Imagine the file `example.txt` which contains:
-
-```text
-docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime
-```
-Before starting the k3s service in the node, do the following:
+Users can trigger a pull of images into the containerd image store by placing a simple text file containing the image names, one per line, in the `/var/lib/rancher/k3s/agent/images` directory. As we have just explained, this can be done before K3s starts or while K3s is running. For example, you can execute the following in one of the nodes:
 
 ```bash
-# 1. Create the images directory on the node
-mkdir -p /var/lib/rancher/k3s/agent/images
-
-# 2. Copy the manifest file (example.txt)
-cp example.txt /var/lib/rancher/k3s/agent/images
+mkdir -p /var/lib/rancher/k3s/agent/images && echo docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime > /var/lib/rancher/k3s/agent/images/pytorch.txt
 ```
+In the previous command, we have created the images directory on the node and dropped a file named `pytorch.txt` that contains the image: `docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime`.
+
 The K3s process will then pull these images via the CRI API. 
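+The one-liner pattern generalizes to any number of images, one reference per line. A quick sketch (the second image and the `ai-stack.txt` file name are made up for illustration, and the demo writes to a temporary directory rather than the real `/var/lib/rancher/k3s/agent/images`):
+
```shell
# Write a multi-image manifest, one image reference per line
dir=$(mktemp -d)   # stand-in for /var/lib/rancher/k3s/agent/images
cat > "$dir/ai-stack.txt" <<'EOF'
docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime
docker.io/library/redis:7
EOF
wc -l < "$dir/ai-stack.txt"   # two lines -> two images to pre-pull
```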
You should see the following two logs: ```log # When the k3s controller detects the file @@ -110,7 +101,11 @@ And you should be able to see metrics of Spegel by querying the supervisor metri ```bash kubectl get --server https://10.11.0.11:6443 --raw /metrics | grep spegel ``` +## Bonus: eStargz images โšก ## + +A different solution to speed up the creation of pods is by using a special image format called eStargz. This enables lazy pulling, which means that the application can start almost instantly while the rest of the image is pulled in the background. This strategy requires both the image to be specifically built in the eStargz format and the K3s agent to be configured to use the stargz snapshotter: `--snapshotter=estargz` flag, or with `snapshotter: estargz` in the configuration file. +This is currently an experimental feature in K3s and we have more information in the [advance section of our docs](https://docs.k3s.io/advanced#enabling-lazy-pulling-of-estargz-experimental). We would love to hear your feedback if you are using it. ## Conclusion ๐Ÿ ## From b8a0bbc5f142e86870a4c7606b722d29a552c400 Mon Sep 17 00:00:00 2001 From: manuelbuil Date: Wed, 12 Nov 2025 12:42:48 +0100 Subject: [PATCH 4/4] Slight modifications to make it better readable Signed-off-by: manuelbuil --- blog/2025-11-11-strategies-for-large-images.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/blog/2025-11-11-strategies-for-large-images.md b/blog/2025-11-11-strategies-for-large-images.md index 8f32bcbac..5e75d518d 100644 --- a/blog/2025-11-11-strategies-for-large-images.md +++ b/blog/2025-11-11-strategies-for-large-images.md @@ -5,7 +5,7 @@ authors: manuelbuil hide_table_of_contents: true --- -Slow image pulls can be annoying and can increase Kubernetes startup times over a healthy threshold, particularly in resource-constrained or air-gapped environments. 
The situation gets worsened by new AI cloud native apps, which often rely on astronomically large images (several gigabytes). This post dives into K3s mechanisms, like pre-pulling images and the embedded registry mirror, that can effectively improve the user's experience when handling large images.
+Slow image pulls can be annoying and may increase Kubernetes startup times over a healthy threshold, particularly in resource-constrained or air-gapped environments. The situation is exacerbated by new AI-driven apps, which often rely on astronomically large images, frequently tens of gigabytes. This post dives into mechanisms that K3s makes available to improve the user's experience when handling large images.
 
 
 
@@ -74,7 +74,7 @@ Note that the caching mechanism needs to be enabled carefully. If an image was r
 
 ## Embedded Registry Mirror ๐Ÿ•ธ๏ธ ##
 
-K3s offers an in-cluster container image registry mirror by embedding [Spegel](https://spegel.dev/). Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by ensuring images are pulled from within the cluster network rather than repeatedly from a central registry. To enable this feature, server nodes must be started with the `--embedded-registry` flag, or with `embedded-registry: true` in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001.
+K3s offers an in-cluster container image registry mirror by embedding [Spegel](https://spegel.dev/). Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by allowing nodes to pull cached image content directly from other nodes whenever possible, instead of requiring each node to reach out to a central registry. 
To enable this feature, server nodes must be started with the `--embedded-registry` flag, or with `embedded-registry: true` in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001.
 
 ```bash
 # Enable the embedded registry mirror
@@ -101,11 +101,14 @@ And you should be able to see metrics of Spegel by querying the supervisor metri
 ```bash
 kubectl get --server https://10.11.0.11:6443 --raw /metrics | grep spegel
 ```
+
+For more information, check the [docs](https://docs.k3s.io/installation/registry-mirror).
+
 ## Bonus: eStargz images โšก ##
 
 A different solution to speed up the creation of pods is by using a special image format called eStargz. This enables lazy pulling, which means that the application can start almost instantly while the rest of the image is pulled in the background. This strategy requires both the image to be specifically built in the eStargz format and the K3s agent to be configured to use the stargz snapshotter: `--snapshotter=estargz` flag, or with `snapshotter: estargz` in the configuration file.
 
-This is currently an experimental feature in K3s and we have more information in the [advance section of our docs](https://docs.k3s.io/advanced#enabling-lazy-pulling-of-estargz-experimental). We would love to hear your feedback if you are using it.
+This is currently an experimental feature in K3s and we have more information in the [advanced section of our docs](https://docs.k3s.io/advanced#enabling-lazy-pulling-of-estargz-experimental). We would love to hear your feedback if you are using it.
 
 ## Conclusion ๐Ÿ ##