Skip to content

Conversation

@vitorfloriano
Copy link
Contributor

@vitorfloriano vitorfloriano commented Nov 11, 2025

✨ Summary

This PR introduces the initial Golden Snapshot Patches (GSP) proposal β€” a lightweight, auditable documentation generation workflow for the Kubebuilder book.

Instead of embedding prose in Go files and maintaining complex extraction logic, GSP keeps:

  • Tutorial prose in Markdown (authored with mdBook includes)
  • Runnables in generated samples
  • Tutorial-specific edits as .patch overlays

Snapshot tests then compare rendered Markdown output against checked-in golden files, producing human-readable diffs that catch unintended drift or confirm intentional changes.

🧩 Key Additions

New design proposal: Golden Snapshot Patches

Implements:

  • Documentation generation redesign
  • Patch overlay mechanism for tutorial customization
  • Snapshot testing workflow for rendered docs

Adds a complete example walkthrough (getting-started flow) showing:

  • Markdown with {{#include}}
  • Patch overlay application
  • Regeneration, render, and snapshot validation steps

πŸ’‘ Motivation

The current literate extraction workflow is complex, brittle, and difficult to review:

  • Prose embedded in Go strings causes formatting errors and poor DX
  • Hidden Insert/Replace logic obscures tutorial intent
  • Drift between samples and docs often goes unnoticed

GSP fixes these pain points by separating concerns and validating the actual rendered output, making doc generation predictable, reviewable, and incremental.

βœ… Benefits

  • Markdown-first authoring: write tutorials directly in .md
  • Explicit sample customization: small, human-readable .patch diffs
  • Runnable, testable samples: snippets come from real projects
  • Auditable outputs: snapshot diffs surface all user-visible changes
  • Low infrastructure overhead: built entirely on git diff, patch, mdBook, and go test

🧭 Next Steps

  • Pilot implementation on one tutorial (e.g. getting-started)
  • Add patch application step to generate_samples.go
  • Integrate snapshot testing in CI
  • Migrate tutorials incrementally and deprecate literate.go

πŸ”— Related work

Past discussions on how to keep the documentation updated did not take into account mdBook native include and Markdown ouput features. See the discussion in #2510 and related PRs.

❓ Objections

If we use {{#include file:lines}}, won't we have to change the docs everytime the code changes?

Answer: Yes, that's how we keep the docs updated. We detect changes and verify if they are code-drifts (something else moved and we update the lines extracted) or intentional changes (what we want to extract actually changed and we update the golden). I don't think we should worry about it being too often. Currently, {{#literatego}} is being used in roughly 20 files.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vitorfloriano
Once this PR has been reviewed and has the lgtm label, please assign varshaprasad96 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 11, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @vitorfloriano. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 11, 2025
**Scenario**: You (contributor/maintainer/docs writer) add a new code snippet to the getting-started tutorial that should come from the runnable sample project, but you also need a small customization in the sample for the example to read well.

### 1. Authoring the Markdown tutorial page
- File: docs/book/src/getting-started/intro.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitorfloriano
I think in this case it’s better to just go ahead and make the changes directly β€” we don’t need a proposal.

For example, you can take the β€œgetting started” one and replace the usage sections like this:

{{#include testdata/project/api/v1/cronjob_types.go:42:58}}

Then, remove the markers from the hack/docs generation.

If by the end the tutorial is updated accordingly and everything looks good, we can go ahead and merge it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vitorfloriano πŸ‘‹

{{#include testdata/project/api/v1/cronjob_types.go:42:58}}

It might be preferable, but I had a concern.

Right now, we generate the docs and everything works fine because we’re not tied to specific line numbers β€” we rely on the markers.
That means the docs stay valid even if the code changes.

If we start referencing exact lines, it will likely break whenever the code is updated.
But please feel free to try it out and see how it behaves.
So I think that is why we have not using it :-P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just shared the proposal as a means to refine my own thoughts and let you know what I'm trying to accomplish.

This is a big refactor and I would have to basically rewrite this proposal in the first PR to explain what is going on if I hadn't done it here. hehe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that will not work out.
That would mean every time that we do a change and we run make generate
then, we would need to review ALL documentation to ensure that

{{#include testdata/project/api/v1/cronjob_types.go:42:58}}

Still point out for the right place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that will not work out. That would mean every time that we do a change and we run make generate then, we would need to review ALL documentation to ensure that

{{#include testdata/project/api/v1/cronjob_types.go:42:58}}

Still point out for the right place.

We have code snippets of 20 files, roughly (search {{#literatego to find out). We would keep golden files of the expected output of the pages that have snippets of these 20 files. Only if something changes in those pages will the tests fail and we check if we have to fix the lines being extracted or we update the golden.

Some yaml files that are completely imported could still be imported completely like they already are. No need to check them in every update. We could still keep golden of them for sanity.

But again, we wouldn't have to test the whole tutorial. Only the critical pages with code snippets from samples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no manual verification. If something changes in the pages, we catch it in the tests.

Copy link
Member

@camilamacedo86 camilamacedo86 Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We change the scaffold
Example add a method/more lines
So, how the {{#include testdata/project/api/v1/cronjob_types.go:42:58}} will show exactly the same place as before? we do nto want to show the whole file we want to show a part of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to explain:

We have the tutorial in markdown with {{#include file:lines}}

We keep a copy of the expected output of the pages that have extracted code snippets in them (those are our golden files)

We generate the sample projects
We patch them with our custom code
We test that they work

We build the book to markdown
We diff the pages with code snippets to the markdown we expect

If they are different, we inspect what caused the difference:

  • If the code has drifted in the source file (I.e something else moved, added/removed lines somewhere else in the file) we just adjust the lines we extract.

  • If the lines are correct but something changed intentionally (for example, the go/v4 was updated with a different method) we update the golden file.

Just so we are on the same page:

We don't compare source code, we compare pages. Its markdown built output vs markdown expected.

100 yaml files can change freely, but we don't care about them because they don't affect the book markdown output.

Am I missing something?

I think it will be better if build a POC to test this workflow...

Copy link
Member

@camilamacedo86 camilamacedo86 Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are different, we inspect what caused the difference:

They can always β€” and often will β€” be different.
The effort required to diff all pages and validate them is entirely manual.
This means we must review all tutorials for each PR that changes the scaffolds (almost all PRs), which is both manual and error-prone. It represents a high-effort task that cannot be easily automated or streamlined.

Any solution that requires manual effort and evaluate pages manually for ever PR will not work well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camilamacedo86 I analyzed the files we include in the tutorial using {{#literatego}}. There are 19 of them.

  • Most of them haven't been updated in months, some in years (some since the book was created).
  • The ones that were recently updated, hadn't been updated for months, some in years (some since the book was created).
  • Some others have recent and recurring changes in lines that don't even show up in the tutorial because that specific line is collapsed.

So, if the changes in those files do happen frequently, then the code in the tutorials is not being updated and we already have a problem...

If you want to double check, here are the git blames of the files we include in the tutorial:

  1. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/getting-started/testdata/project/api/v1alpha1/memcached_types.go
  2. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/emptymain.go
  3. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/emptyapi.go
  4. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/project/api/v1/cronjob_types.go
  5. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/project/api/v1/groupversion_info.go
  6. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/emptycontroller.go
  7. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/project/internal/controller/cronjob_controller.go
  8. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/project/cmd/main.go
  9. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/project/internal/webhook/v1/cronjob_webhook.go
  10. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/project/internal/controller/suite_test.go
  11. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/project/internal/controller/cronjob_controller_test.go
  12. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/multiversion-tutorial/testdata/project/api/v2/cronjob_types.go
  13. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/multiversion-tutorial/testdata/project/api/v1/cronjob_types.go
  14. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/multiversion-tutorial/testdata/project/api/v1/cronjob_conversion.go
  15. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/multiversion-tutorial/testdata/project/api/v2/cronjob_conversion.go
  16. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/multiversion-tutorial/testdata/project/internal/webhook/v1/cronjob_webhook.go
  17. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/multiversion-tutorial/testdata/project/internal/webhook/v2/cronjob_webhook.go
  18. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/multiversion-tutorial/testdata/project/cmd/main.go
  19. https://github.com/kubernetes-sigs/kubebuilder/blame/master/docs/book/src/cronjob-tutorial/testdata/finalizer_example.go

These are the files that matter to the tutorials.

Only if something changes in one of them AND only if the lines we include in the tutorial changes in one of them, then the snapshot test will fail and report what is different. The tests result will tell us exactly what is different and in what files, then we will verify what happened and adjust.

There's no manual verification. Tests catch if something is different (expected output vs actual output). If lines drift, we can create a script that fixes the lines by a given offset. If we change code that actually changes the tutorial, then we better verify it to make sure the tutorial is updated. If it's intentional, we would adjust the expected output to match the current state of the scaffold.

But as I have concluded by looking at the code that's in the tutorial, that wouldn't be very often.

Am I missing anything?

@vitorfloriano
Copy link
Contributor Author

@camilamacedo86 As I commented in here and here, this is the first draft of a proposal to rethink how we generate and keep the docs updated.

Sorry for the verbosity. I used AI assistance to refine the writing, but I tried to cut most of the fluff out.

The main difference basically is that we use more of mdBook features to extract code from samples and test the output, and simplify how we insert custom code using patches.

It's still a WIP, but it's good enough to gather your feedback just so we can evaluate if this is worth consideration.

Thanks for your time!

PTAL and LMWYT !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants