-
Notifications
You must be signed in to change notification settings - Fork 122
NO-JIRA: chore(docs/adr): document decision to implement Dockerfile templating to reduce duplication #2643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
NO-JIRA: chore(docs/adr): document decision to implement Dockerfile templating to reduce duplication #2643
Changes from all commits
ae15bc6
e3425bf
796ad35
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,95 @@ | ||
| # 5. Implement Dockerfile templating to remove duplication | ||
|
|
||
| Date: 2025-11-04 | ||
|
|
||
| ## Status | ||
|
|
||
| Accepted | ||
|
|
||
| ## Context | ||
| The project contains multiple Dockerfiles/Containerfiles with significant duplication across different variants (CPU/GPU, different Python versions, different base distributions). | ||
| Maintaining these duplicated files is error-prone, as changes need to be manually propagated across multiple files, increasing the risk of inconsistencies, merge conflicts, and bugs. | ||
|
|
||
| We agreed to implement a templating solution to reduce duplication. | ||
|
|
||
| We are aware that introducing abstractions may make the project harder to understand for newcomers, and inappropriate abstractions would make it more difficult to make changes that used to be easy. | ||
| Therefore, we're committed to maintaining clarity and debuggability of the build process and to keeping the abstraction malleable so that it can be adjusted to future needs as the project evolves. | ||
|
|
||
| ## Decision | ||
| We are already using the standard `${VARIABLE}` substitution via `podman build --build-arg-file`. | ||
|
|
||
| Implement Dockerfile/Containerfile templating using a phased approach: | ||
|
|
||
| ### Phase 1: Containerfile.in with Podman/CPP | ||
|
|
||
| Begin using Podman's native `Containerfile.in` preprocessing feature (available since Podman 2.2.1). | ||
|
|
||
| Containerfiles with `.in` suffix are automatically preprocessed via CPP(1), enabling: | ||
| - **C preprocessor directives**: `#include`, `#define`, `#ifdef`, `#if`, `#else`, `#endif`, etc. | ||
| - **File decomposition**: Break Containerfiles into reusable parts via `#include` directive | ||
| - **Manual preprocessing**: Files can be preprocessed outside Podman via `cpp -E Containerfile.in` | ||
| - **Already in use**: We currently use this feature in the project | ||
|
|
||
| Documentation: https://docs.podman.io/en/v2.2.1/markdown/podman-build.1.html | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wrap bare URLs in markdown link syntax Bare URLs should be formatted as markdown links for consistency and better rendering. Lines 33, 35, and 42 contain bare URLs that should be wrapped. Apply these diffs to format the URLs as markdown links: - Documentation: https://docs.podman.io/en/v2.2.1/markdown/podman-build.1.html
+ Documentation: [Podman build documentation](https://docs.podman.io/en/v2.2.1/markdown/podman-build.1.html)- Note: Historical issues with comment handling (https://github.com/containers/buildah/issues/3229) were resolved in https://github.com/containers/buildah/pull/3241
+ Note: Historical issues with comment handling ([buildah#3229](https://github.com/containers/buildah/issues/3229)) were resolved in [buildah#3241](https://github.com/containers/buildah/pull/3241)- 2. **stencils** (https://github.com/devcoons/stencils): Lightweight alternative, less mature
+ 2. **stencils** ([github.com/devcoons/stencils](https://github.com/devcoons/stencils)): Lightweight alternative, less matureAlso applies to: 35-35, 42-42 🧰 Tools🪛 markdownlint-cli2 (0.18.1)33-33: Bare URL used (MD034, no-bare-urls) 🤖 Prompt for AI Agents |
||
|
|
||
| Note: Historical issues with comment handling (https://github.com/containers/buildah/issues/3229) were resolved in https://github.com/containers/buildah/pull/3241 | ||
|
|
||
| ### Phase 2: Python-based Templating | ||
| When Containerfile.in proves insufficient for complex templating needs, adopt a Python-based solution: | ||
|
|
||
| Options evaluated: | ||
| 1. **Python f-strings**: Simple, native, no dependencies, good for basic interpolation | ||
| 2. **stencils** (https://github.com/devcoons/stencils): Lightweight alternative, less mature | ||
| 3. **Jinja2**: Industry-standard, powerful control structures (loops, conditionals), extensive ecosystem | ||
|
|
||
| Recommendation: Jinja2 for Phase 2 due to widespread adoption, maturity, and powerful templating capabilities. | ||
|
|
||
| ### Generated Code Management Decision | ||
| Must decide whether to: | ||
| - **Option A: Commit generated Containerfiles to Git** | ||
| - Pros: Reproducible builds, easier debugging, clear history of what was built | ||
| - Cons: Potential for drift between templates and committed files, larger repo | ||
|
|
||
| - **Option B: Generate on-the-fly during build** | ||
| - Pros: Single source of truth, impossible for drift, cleaner repo | ||
| - Cons: Requires templating tooling in CI/CD, harder to debug build failures | ||
|
|
||
| Initial recommendation: Commit generated files (Option A) for traceability and easier debugging, with CI checks to ensure templates and generated files stay in sync. | ||
|
|
||
| ## Consequences | ||
|
|
||
| ### Positive | ||
| - Reduced duplication across Containerfiles | ||
| - Easier maintenance: changes to common patterns apply across all variants | ||
| - Lower risk of inconsistencies between variants | ||
| - Improved consistency in build configuration | ||
| - Clearer separation between variant-specific and common configuration | ||
|
|
||
| ### Negative/Risks | ||
| - Additional complexity in the build process | ||
| - Learning curve for contributors unfamiliar with the templating system | ||
| - Risk of template bugs affecting multiple Containerfiles simultaneously | ||
| - Need to maintain tooling/scripts for template generation | ||
| - If committing generated files: need CI validation to catch drift | ||
| - If generating on-the-fly: build failures harder to debug without seeing the final Containerfile | ||
|
|
||
| ### Mitigations | ||
| - Start with the simplest solution (Containerfile.in) and only move to more complex templating as needed | ||
| - Document templating approach clearly in the repository | ||
| - Implement CI checks to validate generated Containerfiles | ||
| - Provide clear error messages in templating scripts | ||
| - Keep templates as simple and readable as possible | ||
| - Consider a hybrid approach: commit generated files but validate against templates in CI | ||
|
|
||
| ### Previous work | ||
|
|
||
| - <https://gitlab.cee.redhat.com/astonebe/notebook-utils> | ||
| - [RHOAIENG-16969 Remove specific instances of code duplication in odh/notebooks](https://issues.redhat.com/browse/RHOAIENG-16969) | ||
| - [RHOAIENG-19047 Deduplicate files that get included in workbench images](https://issues.redhat.com/browse/RHOAIENG-19047) | ||
| - [RHOAIENG-19046 Remove reliance on "chained builds" in notebooks repo](https://issues.redhat.com/browse/RHOAIENG-19046) | ||
| - [fix(makefile): standardized image targets #1015](https://github.com/opendatahub-io/notebooks/pull/1015) | ||
| - [RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks #834](https://github.com/opendatahub-io/notebooks/pull/834) | ||
| - <https://redhat-internal.slack.com/archives/C060A5FJEAD/p1738786041914139> | ||
| - [📦 Consolidate duplicate bootstrapper implementations across Python 3.12 runtime environments #1349](https://github.com/opendatahub-io/notebooks/issues/1349) | ||
| - [improve and simplify docker multistage build in jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu #2467](https://github.com/opendatahub-io/notebooks/issues/2467) | ||
| - [Refactor notebooks-release workflow to eliminate code duplication using reusable workflows #1185](https://github.com/opendatahub-io/notebooks/issues/1185) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeRabbit This is actually super problematic. Consider https://github.com/opendatahub-io/notebooks/blob/11a6ba8118160961fee65d5986b28344f2d8a28f/jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu which contains something like
The trouble is comments. Getting the commented out lines right to pass through
cpp -Eis super messy.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
Length of output: 8788
🌐 Web query:
💡 Result:
Yes — Buildah fixed CPP comment handling for Containerfile.in: the v1.21.0 release notes state "Containerfiles no longer fail during CPP processing if a comment is contained within the file." [1]. For context, older docs/manpages note that comments in included Containerfile(.in) files were printed as warnings during builds. [2]
Sources:
[1] Buildah v1.21.0 release notes. [2] buildah build/bud man page (notes about Containerfile.in and CPP comment warnings).
@jiridanek You've identified a critical limitation with the Phase 1 recommendation. While the buildah issue was technically "resolved" in v1.21.0 (files no longer crash during CPP processing), your example demonstrates that CPP's fundamental design—treating
#as a preprocessor directive—creates practical problems with:#######...)The conflict is inherent: CPP and Dockerfiles both use
#, making it extremely difficult to distinguish between "this is a preprocessor directive" and "this is a Docker/shell comment."Issue with the ADR: The document states "Already in use: We currently use this feature in the project," but my search found no existing
.infiles in the repository. This claim should be corrected or removed.Recommendation: Given your real-world example, I suggest either:
Skip Phase 1 entirely for complex Dockerfiles with extensive comments, heredocs, and shell scripts. Go directly to Phase 2 (Python-based templating with Jinja2), which won't have these comment conflicts.
Add a prominent warning to the Phase 1 section about the comment handling limitations and specify that it's only suitable for simple cases without heredocs or complex inline comments.
Reserve Containerfile.in for trivial cases like
#includedirectives for shared fragments, where you can carefully control comment placement.Would you like to revise the ADR to reflect these limitations, or would you prefer to start directly with Jinja2 templating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additional problem, the bug with comments was "solved" by printing a warning for every # comment that is not a directive; it causes extremely messy logs https://rick-rainer-ludwig.com/2025/03/10/efficient-container-creation-with-make-and-c-preprocessor/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai propose sample code to illustrate jinja2 templating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiridanek Here's a comprehensive example of Jinja2 templating for your Dockerfiles:
1. Template file:
Containerfile.j22. Configuration file:
configs/datascience-gpu-py312.yaml3. Configuration file:
configs/datascience-cpu-py311.yaml4. Rendering script:
tools/render_containerfiles.py5. Makefile integration:
Makefile6. CI validation:
.github/workflows/validate-templates.ymlKey advantages demonstrated:
StrictUndefinedcatches typos in variable names#commentsThis approach lets you maintain one template and generate multiple variants (CPU/GPU, Python 3.11/3.12, etc.) without the issues you encountered with CPP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai
podman machine ssh sudo dnf install -y cpp)macOS
Install
cppusingIf the above command fails due to lack of disk space, you can try to free up some space by running:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiridanek Thanks for documenting the macOS setup instructions! These are helpful for anyone who wants to experiment with the Containerfile.in approach.
Suggested placement:
I'd recommend adding these instructions to a developer setup guide (e.g.,
docs/development/setup.mdorCONTRIBUTING.md) rather than in the ADR itself, since the ADR focuses on architectural decisions rather than platform-specific tooling setup.Enhanced version with context:
Troubleshooting: If the above command fails due to lack of disk space, free up space first:
Then retry the cpp installation.
Note: Due to CPP's limitations with handling comments in Dockerfiles (especially in heredocs and complex scripts), this approach is recommended only for simple use cases. For production Dockerfiles, consider using Jinja2 templating (Phase 2) instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai very well, I won't do Dockerfile.in templating, instead I'll start by turning inline bash to HEREDOC, and then extract it to separate files and deduplicating, as first step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
Length of output: 1770
@jiridanek That's an excellent pragmatic decision! 👍
Your phased approach makes a lot of sense:
This avoids the CPP comment-handling pitfalls entirely while achieving the same goals of reducing duplication and improving maintainability.
Recommendation for the ADR:
Since you're skipping Phase 1, I suggest updating the ADR in this PR to reflect your actual implementation plan:
This way, the ADR will accurately document the decision-making process and help future contributors understand why CPP was avoided.
Would you like me to suggest specific wording for the ADR updates?
✏️ Learnings added