This repository was archived by the owner on Oct 14, 2020. It is now read-only.

Commit 71c5212

author
Daniel Patanin
committed
Move adr of securecodebox.io to main repo
Since we changed the framework for our website the decision described in the respective adr file will be archived or forgotten about. Furthermore this website is a tool we use for our main project, thus the decision about what "tool" we use for documentation should be saved in the main repository.
1 parent d59daec commit 71c5212

3 files changed

+415
-338
lines changed

docs/adr/adr_0001.adoc

Lines changed: 58 additions & 195 deletions
@@ -1,214 +1,77 @@
1-
[[ADR-0000]]
2-
= ADR-0000: How can we introduce a more general extension concept for data processing modules?
1+
[[ADR-0001]]
2+
= ADR-0001: Choosing the framework for the new secureCodeBox Website
33

44
[cols="h,d",grid=rows,frame=none,stripes=none,caption="Status",%autowidth]
55
|====
6-
6+
// Use one of the ADR status parameter based on status
7+
// Please add a cross reference link to the new ADR on 'superseded' ADR.
8+
// e.g.: {adr_suposed_by} <<ADR-0000>>
79
| Status
810
| ACCEPTED
911

1012
| Date
11-
| 2020-05-20
13+
| 2019-08-21
1214

1315
| Author(s)
14-
| Jannik Hollenbach <Jannik.Hollenbach@iteratec.com>,
15-
Jorge Estigarribia <Jorge.Estigarribia@iteratec.com>,
16-
Robert Seedorff <Robert.Seedorff@iteratec.com>,
17-
Sven Strittmatter <Sven.Strittmatter@iteratec.com>
16+
| Daniel Patanin daniel.patanin@iteratec.com,
17+
Jannick Hollenbach jannick.hollenbach@iteratec.com
18+
// ...
1819
|====
1920

2021
== Context
2122

22-
=== Status Quo
23-
24-
One major challenge implementing the _secureCodeBox_ is to provide a flexible and modular architecture, which enables the open source community to easily understand the concepts and especially to extend the _secureCodeBox_ with individual features. Therefore we decided to separate the process stages of a single security scan (instance of _scanType_ custom resource definition; further abbreviated with _CRD_) in three major phases:
25-
26-
....
27-
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
28-
│ scanning ├─────────▶│ parsing ├─────────▶│ persisting │
29-
│ (phase 1) │ │ (phase 2) │ │ (phase 3) │
30-
└──────────────────┘ └──────────────────┘ └──────────────────┘
31-
....
32-
33-
Until now, phase 3 "`persisting`" was implemented by so-called _PersistenceProviders_ (e.g., the _persistence-elastic_ provider, which is responsible for persisting all findings in a given Elasticsearch database). The _secureCodeBox_ Operator is aware of these three phases and is responsible for the state model and execution of each security scan.
34-
35-
=== Problem and Question
36-
37-
We identified additional use cases with a more "`data processing oriented`" pattern than the implemented phase 3 "`persisting`" indicates. For example, we implemented a so-called _MetaDataProvider_ feature, which is responsible for enhancing each security finding with additional metadata. But the _MetaDataProvider_ must be executed after phase 2 "`parsing`" and before phase 3 "`persisting`", because it depends on the parsed finding results (which will be enhanced) and the updated findings should also be persisted.
38-
39-
To find a proper solution, we split the topic into the following two questions:
40-
41-
. Should we unify the concepts _MetaDataProvider_ and _PersistenceProvider_?
42-
. What should the execution model look like for each concept?
43-
44-
==== Question 1: Should We Unify the Concepts MetaDataProvider and PersistenceProvider?
45-
46-
===== Solution Approach 1: Unify
47-
48-
Both "`modules`" are "`processing`" the security findings, which were generated in the phase 2 "`parsing`",
49-
but there is one major difference between them:
50-
51-
* a _PersistenceProvider_ is processing the findings *read only*, and
52-
* a _MetaDataProvider_ is processing the findings *read and write*.
53-
54-
There is a similar concept in Kubernetes called https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/[AdmissionController], with the exception that they are executed before a resource is created.
55-
56-
There are two variants of _AdmissionControllers_:
57-
58-
. _ValidatingWebhookConfiguration_: *read only*, *executed last*; and
59-
. _MutatingWebhookConfiguration_: *read and write*, *executed first*.
60-
61-
We could do a similar thing and introduce a CRD which allows executing "`custom code`" (depends on the second question) after a scan has completed (meaning both phases "`scan`" and "`parsing`" were done). Some name ideas:
62-
63-
* _ScanHooks_
64-
* _ScanCompletionHooks_
65-
* _FindingProcessors_
66-
67-
These could be implemented with a `type` attribute, which declares if they are *read only* or *read and write*.
68-
69-
The _secureCodeBox operator_ would process all these CRDs in the namespace of the scan and execute the *read and write* ones first, serially (only one at a time) to avoid write conflicts, and then the *read only* ones in parallel.
70-
71-
[source,yaml]
72-
----
73-
apiVersion: execution.experimental.securecodebox.io/v1
74-
kind: ScanCompletionHook
75-
metadata:
76-
name: my-metadata
77-
spec:
78-
type: ReadAndWrite
79-
# If implemented like the current persistence provider
80-
image: my-metadata:v2.0.0
81-
----
82-
83-
The Execution Flow would then look something like this:
84-
85-
....
86-
┌ ReadOnly─Hooks─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
87-
┌ ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │
88-
┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │
89-
┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │
90-
│ Scan ├──▶│ Parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐
91-
└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │
92-
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘
93-
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
94-
....
95-
96-
====== Pros
97-
98-
* Only one implementation.
99-
* Pretty generic to expand and test out new ideas without having to modify the _secureCodeBox operator_.
100-
101-
====== Cons
102-
103-
* Possibly an "`over-abstraction`".
104-
* Need to refactor the _persistence-elastic_ provider.
105-
* The "`general implementation`" will be harder than the individual ones.
106-
107-
===== Solution Approach 2: Keep Split between Persistence Provider and MetaData Provider
108-
109-
Keep _PersistenceProviders_ as they are and introduce a new _MetaDataProvider_ CRD which gets executed before the _PersistenceProviders_ by the _secureCodeBox operator_.
110-
111-
....
112-
┌ Persistence Provider─ ─ ─ ─ ─ ─ ─ ─
113-
┌ MetaData Provider ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │
114-
┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │
115-
┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │
116-
│ Scan ├──▶│ Parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐
117-
└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │
118-
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘
119-
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
120-
....
121-
122-
====== Pros
123-
124-
* Quicker to implement.
125-
* Might be worth it to have a separate concept for it.
126-
127-
====== Cons
128-
129-
* Not sure if it is worth introducing a new CRD for everything, especially when it's conceptually pretty close to something already existing.
130-
131-
==== Question 2: How Should the Execution Model Look like for Each Concept?
132-
133-
===== Solution Approach 1: Like the Persistence Provider
134-
135-
Basically, a Docker container which processes findings and takes two arguments:
136-
137-
. A pre-defined URL to download the findings from.
138-
. A pre-defined URL to upload the modified findings to.
139-
140-
Examples:
141-
142-
* NodeJS: `node my-metadata.js "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
143-
* Java: `java my-metadata.jar "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
144-
* Golang: `./my-metadata "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
145-
146-
====== Pros
147-
148-
* One liner with the current implementations.
149-
* Code overhead / wrapper code is pretty minimal.
150-
* Zero scale: no resource costs when nothing is running.
151-
152-
====== Cons
153-
154-
* May result in too many Kubernetes jobs.
155-
** Resource blocking on finished resources.
156-
** `ttlAfterFinished` enabled.
157-
* Container runtime overhead (especially time).
158-
159-
===== Solution Approach 2: A WebHooks Like Concept
160-
161-
Analogous to Kubernetes webhooks: an HTTP server receiving findings and returning results.
162-
163-
====== Pros
164-
165-
* Milliseconds instead of seconds for processing.
166-
* No overhead for container creation.
167-
* No additional Kubernetes jobs needed.
168-
169-
====== Cons
170-
171-
* Introduces new running services which need to be maintained and have uptime.
172-
* Code overhead / boilerplate (Can be mitigated by an SDK).
173-
* Debugging of individual _MetaDataProvider_ is harder than a single service which handles everything.
174-
* Introduces a "`new`" concept.
175-
* Certificate management for webhook services (`cert-manager` required by default?).
176-
* Scaling for systems with lots of load could be a problem.
177-
* One service per namespace (multiple tenants) needed -> results in many running active services which is resource consuming.
23+
There are tons of different frameworks for building websites out there. We must choose the most fitting one for our use, fulfilling our mandatory requirements:
24+
25+
• Common programming language, if applicable easy to learn
26+
• Overall easy to use and start-up, also locally
27+
• Tutorials, examples and a good documentation
28+
• Bonus points for great and many easy to use templates and plugins
29+
• Needs continuous support and contribution
30+
• Must be able to be deployed as GitHub pages
31+
32+
We will choose from the following popular/trending frameworks:
33+
34+
https://gridsome.org/[Gridsome] +
35+
https://www.gatsbyjs.org/[Gatsby] +
36+
https://gohugo.io/[Hugo] +
37+
https://jekyllrb.com/[Jekyll]
38+
39+
=== Research
40+
41+
These frameworks all fulfill the requirements to the extent that I consider them well suited. First, I researched the listed features on the respective sites or quickly googled for them
42+
specifically and instantly found the requested feature. I followed up with a general overview
43+
of how old the frameworks are, how popular they are, and example pages built with them.
44+
Afterwards I searched for comparison blogs and posts, mostly to examine their comments.
45+
Most of these "pros and cons" posts are inaccurate and very superficial, but luckily because of that
46+
the comment sections hold interesting discussions and comparisons from overall features and
47+
usability to specific issues and problems of each framework and which framework fits what
48+
use-cases in general. After this research, I found that most of the shared experiences
49+
and discussions were similar. They characterized these frameworks as follows (roughly
50+
summarized):
51+
52+
Gridsome is like Gatsby just for VueJS.
53+
Gatsby is blazing fast after building the pages but requires a little bit more understanding of
54+
JavaScript and React and may not be as easy to get behind if you’ve never built a site with a
55+
static site generator before.
56+
Hugo builds fast and is based on Golang. But as a newcomer to that language you’ll find yourself consulting the documentation a lot, unless you learn the language to a certain depth.
57+
Jekyll is simple in templating and very good for quickly starting a small blog site but based on
58+
Ruby and therefore requires Ruby dependencies.
17859

17960
== Decision
18061

181-
Regarding question 1 it seems that both solution approaches are resulting in the same execution model. We decided to implement solution approach 1 and unify both concepts into a more general concept with the name _hook concept_. Therefore we exchange the existing name _PersistenceProvider_ for phase 3 in the execution model with a more general term _processing_:
182-
183-
....
184-
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
185-
│ scanning ├─────────▶│ parsing ├─────────▶│ processing │
186-
│ (Phase 1) │ │ (Phase 2) │ │ (Phase 3) │
187-
└──────────────────┘ └──────────────────┘ └──────────────────┘
188-
....
189-
190-
Regarding question 2 we decided to implement the solution approach 1 with a job-based approach (no active service component needed). Therefore the phase 3 _processing_ will be split into two separate phases named _ReadAndWriteHooks_ (3.1) and _ReadOnlyHooks_ (3.2)
191-
// #30 to what refers 3.1 and 3.2?
192-
193-
....
194-
┌ 3.2 processing: ReadOnlyHooks ─ ─ ─
195-
┌ 3.1 processing: ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │
196-
┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │
197-
┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │
198-
│ scanning ├──▶│ parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐
199-
└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │
200-
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘
201-
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
202-
....
62+
So, it seems that Hugo is a pretty good choice for sites with a very large number of pages.
63+
Jekyll seems to fit for a quick build. Gatsby and Gridsome require a bit more time to learn but
64+
have their advantages in speed and growth of the site. And whether you choose Gridsome over
65+
Gatsby depends on whether you want to use VueJS or not.
20366

204-
== Consequences
205-
206-
With the new _hook concept_ we open the _phase 3 processing_ to a more intuitive and flexible architecture. It is easier to understand because _WebHooks_ are already a well known concept. It is possible to keep the existing implementation of the _PersistenceProvider_ and integrate them with a lot of other possible processing components in a more general fashion. In the end, this step will result in a lot of additional feature possibilities, which go far beyond the existing ones proposed here. Therefore we only need to implement this concept once in the _secureCodeBox operator_ and new ideas for extending the _DataProcessing_ will not enforce conceptual or architectural changes.
67+
Finally, we’ve decided to use Gatsby. The main reasons are its fast performance, the extensive documentation and tutorials, and also the language, since Hugo (the
68+
other framework we mainly considered) is based on Golang, and for my part as a developer I
69+
feel completely comfortable and prefer working with JSX. Overall, it mostly comes down to preferences, since we’re not going to build a giant website, nor are we planning on implementing “crazy” features.
20770

208-
Ideas for additional processing hooks:
71+
== Consequences
20972

210-
* Notifier hooks (_ReadOnlyHook_) e.g., for chat (slack, teams etc.), metric, alerting systems
211-
* MetaData enrichment hooks (_ReadAndWriteHook_)
212-
* FilterData hooks (_ReadAndWriteHook_) (e.g., false/positive handling)
213-
* SystemIntegration hooks (_ReadOnlyHook_) e.g., for ticketing systems like Jira
214-
* CascadingScans hooks (_ReadOnlyHook_) e.g., for starting new security scans based on findings
73+
For the integration of our multi-repository documentation we’ll use
74+
Antora if working this out with Gatsby is going to be more difficult than integrating Antora.
75+
We’re aware that using Gatsby requires a bit more maintenance and has the drawback that if
76+
anybody else maintains or works on the website, this person will need to at least understand
77+
the basics of React and GraphQL.
