|
1 | | -[[ADR-0000]] |
2 | | -= ADR-0000: How can we introduce a more general extension concept for data processing modules? |
| 1 | +[[ADR-0001]] |
| 2 | += ADR-0001: Choosing the framework for the new secureCodeBox Website |
3 | 3 |
|
4 | 4 | [cols="h,d",grid=rows,frame=none,stripes=none,caption="Status",%autowidth] |
5 | 5 | |==== |
6 | | - |
| 6 | +// Use one of the ADR status parameters, depending on the status. |
| 7 | +// For a 'superseded' ADR, please add a cross-reference link to the new ADR. |
| 8 | +// e.g.: {adr_suposed_by} <<ADR-0000>> |
7 | 9 | | Status |
8 | 10 | | ACCEPTED |
9 | 11 |
|
10 | 12 | | Date |
11 | | -| 2020-05-20 |
| 13 | +| 2019-08-21 |
12 | 14 |
|
13 | 15 | | Author(s) |
14 | | -| Jannik Hollenbach <Jannik.Hollenbach@iteratec.com>, |
15 | | - Jorge Estigarribia <Jorge.Estigarribia@iteratec.com>, |
16 | | - Robert Seedorff <Robert.Seedorff@iteratec.com>, |
17 | | - Sven Strittmatter <Sven.Strittmatter@iteratec.com> |
| 16 | +| Daniel Patanin daniel.patanin@iteratec.com, |
| 17 | + Jannick Hollenbach jannick.hollenbach@iteratec.com |
| 18 | +// ... |
18 | 19 | |==== |
19 | 20 |
|
20 | 21 | == Context |
21 | 22 |
|
22 | | -=== Status Quo |
23 | | - |
24 | | -One major challenge implementing the _secureCodeBox_ is to provide a flexible and modular architecture, which enables the open source community to easily understand the concepts and especially to extend the _secureCodeBox_ with individual features. Therefore we decided to separate the process stages of a single security scan (instance of _scanType_ custom resource definition; further abbreviated with _CRD_) in three major phases: |
25 | | - |
26 | | -.... |
27 | | -┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ |
28 | | -│ scanning ├─────────▶│ parsing ├─────────▶│ persisting │ |
29 | | -│ (phase 1) │ │ (phase 2) │ │ (phase 3) │ |
30 | | -└──────────────────┘ └──────────────────┘ └──────────────────┘ |
31 | | -.... |
32 | | - |
33 | | -By now the phase 3 "`persisting`" was implemented by so-called _PersistenceProviders_ (e.g., the _persistence-elastic_ provider, which is responsible for persisting all findings in a given Elasticsearch database). The _secureCodeBox_ Operator is aware of these three phases and is responsible for the state model and execution of each security scan. |
34 | | - |
35 | | -=== Problem and Question |
36 | | - |
37 | | -We identified different additional use cases with a more "`data processing oriented`" pattern than the implemented phase 3 "`persisting`" indicates. For example, we implemented a so called _MetaDataProvider_ feature, which is responsible for enhancing each security finding with additional metadata. But the _MetaDataProvider_ must be executed after the phase 2 "`parsing`" and before the phase 3 "`persisting`" because it depends on the parsed finding results (which will be enhanced) and the updated findings should be also persisted. |
38 | | - |
39 | | -To find a proper solution, we split the topic into the following two questions: |
40 | | - |
41 | | -. Should we unify the concepts _MetaDataProvider_ and _PersistenceProvider_? |
42 | | -. How should the execution model look like for each concept? |
43 | | - |
44 | | -==== Question 1: Should We Unify the Concepts MetaDataProvider and PersistenceProvider? |
45 | | - |
46 | | -===== Solution Approach 1: Unify |
47 | | - |
48 | | -Both "`modules`" are "`processing`" the security findings, which were generated in the phase 2 "`parsing`", |
49 | | -but there is one major difference between them: |
50 | | - |
51 | | -* a _PersistenceProvider_ is processing the findings *read only*, and |
52 | | -* a _MetaDataProvider_ is processing the findings *read and write*. |
53 | | - |
54 | | -There is a similar concept in Kubernetes called https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/[AdmissionController], with the exception that they are executed before a resource is created. |
55 | | - |
56 | | -There are two variants of _AdmissionControllers_: |
57 | | - |
58 | | -. _ValidatingWebhookConfiguration_: *read only*, *executed last*; and |
59 | | -. _MutatingWebhookConfiguration_: *read and write*, *executed first*. |
60 | | - |
61 | | -We could do a similar thing and introduce a CRD which allows executing "`custom code`" (depends on the second question) after a scan has completed (meaning both phases "`scan`" and "`parsing`" were done). Some name ideas: |
62 | | - |
63 | | -* _ScanHooks_ |
64 | | -* _ScanCompletionHooks_ |
65 | | -* _FindingProcessors_ |
66 | | - |
67 | | -These could be implemented with a `type` attribute, which declares if they are *read only* or *read and write*. |
68 | | - |
69 | | -The _secureCodeBox operator_ would process all these CRDs in the namespace of the scan and execute the *read and write* ones first in serial only one at a time to avoid write conflicts and then the *read only* ones in parallel. |
70 | | - |
71 | | -[source,yaml] |
72 | | ----- |
73 | | -apiVersion: execution.experimental.securecodebox.io/v1 |
74 | | -kind: ScanCompletionHook |
75 | | -metadata: |
76 | | - name: my-metadata |
77 | | -spec: |
78 | | - type: ReadAndWrite |
79 | | - # If implemented like the current persistence provider |
80 | | - image: my-metadata:v2.0.0 |
81 | | ----- |
82 | | - |
83 | | -The Execution Flow would then look something like this: |
84 | | - |
85 | | -.... |
86 | | - ┌ ReadOnly─Hooks─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ |
87 | | - ┌ ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │ |
88 | | - ┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │ |
89 | | -┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │ |
90 | | -│ Scan ├──▶│ Parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐ |
91 | | -└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │ |
92 | | - ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘ |
93 | | - ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ |
94 | | -.... |
95 | | - |
96 | | -====== Pros |
97 | | - |
98 | | -* Only one implementation. |
99 | | -* Pretty generic to expand and test out new ideas without having to modify the _secureCodeBox operator_. |
100 | | - |
101 | | -====== Cons |
102 | | - |
103 | | -* Possibly an "`over-abstraction`". |
104 | | -* Need to refactor the _persistence-elastic_ provider. |
105 | | -* The "`general implementation`" will be harder than the individual ones. |
106 | | - |
107 | | -===== Solution Approach 2: Keep Split between Persistence Provider and MetaData Provider |
108 | | - |
109 | | -Keep _PersistenceProviders_ as they are and introduce a new _MetaDataProvider_ CRD which gets executed before the _PersistenceProviders_ by the _secureCodeBox operator_. |
110 | | - |
111 | | -.... |
112 | | - ┌ Persistence Provider─ ─ ─ ─ ─ ─ ─ ─ |
113 | | - ┌ MetaData Provider ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │ |
114 | | - ┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │ |
115 | | -┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │ |
116 | | -│ Scan ├──▶│ Parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐ |
117 | | -└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │ |
118 | | - ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘ |
119 | | - ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ |
120 | | -.... |
121 | | - |
122 | | -====== Pros |
123 | | - |
124 | | -* Quicker to implement. |
125 | | -* Might be worth it to have a separate concept for it. |
126 | | - |
127 | | -====== Cons |
128 | | - |
129 | | -* Not sure if it is worth introducing a new CRD for everything, especially when it's conceptually pretty close to something already existing. |
130 | | - |
131 | | -==== Question 2: How Should the Execution Model Look like for Each Concept? |
132 | | - |
133 | | -===== Solution Approach 1: Like the Persistence Provider |
134 | | - |
135 | | -Basically, a Docker container which processes findings takes two arguments: |
136 | | - |
137 | | -. A pre-defined URL to download the findings from. |
138 | | -. A pre-defined URL to upload the modified findings to. |
139 | | - |
140 | | -Examples: |
141 | | - |
142 | | -* NodeJS: `node my-metadata.js "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."` |
143 | | -* Java: `java my-metadata.jar "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."` |
144 | | -* Golang: `./my-metadata "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."` |
145 | | - |
146 | | -====== Pros |
147 | | - |
148 | | -* One liner with the current implementations. |
149 | | -* Code overhead / wrapper code is pretty minimal. |
150 | | -* Zero scale: no resource costs when nothing is running. |
151 | | - |
152 | | -====== Cons |
153 | | - |
154 | | -* May result in too many Kubernetes jobs. |
155 | | -** Resource blocking on finished resources. |
156 | | -** `ttlAfterFinished` enabled. |
157 | | -* Container runtime overhead (especially time). |
158 | | - |
159 | | -===== Solution Approach 2: A WebHooks Like Concept |
160 | | - |
161 | | -Analogous to Kubernetes webhooks: HTTP server receiving findings and returning results. |
162 | | - |
163 | | -====== Pros |
164 | | - |
165 | | -* Milliseconds instead of seconds for processing. |
166 | | -* No overhead for container Creation. |
167 | | -* No additional kubernetes jobs needed. |
168 | | - |
169 | | -====== Cons |
170 | | - |
171 | | -* Introduces new running services which need to be maintained and have uptime. |
172 | | -* Code overhead / boilerplate (Can be mitigated by an SDK). |
173 | | -* Debugging of individual _MetaDataProvider_ is harder than a single service which handles everything. |
174 | | -* Introduces a "`new`" concept. |
175 | | -* Certificate management for webhook services (`cert-manager` required by default?). |
176 | | -* Scaling for systems with lots of load could be a problem. |
177 | | -* One service per namespace (multiple tenants) needed -> results in many running active services which is resource consuming. |
| 23 | +There are many frameworks for building websites. We must choose the one that best fits our use and fulfills our mandatory requirements: |
| 24 | + |
| 25 | +* Common programming language that is, where applicable, easy to learn |
| 26 | +* Overall easy to use and to start up, also locally |
| 27 | +* Tutorials, examples and good documentation |
| 28 | +* Bonus points for many good, easy-to-use templates and plugins |
| 29 | +* Needs continuous support and contribution |
| 30 | +* Must be deployable as GitHub Pages |
| 31 | + |
| 32 | +We will choose from the following popular/trending frameworks: |
| 33 | + |
| 34 | +https://gridsome.org/[Gridsome] + |
| 35 | +https://www.gatsbyjs.org/[Gatsby] + |
| 36 | +https://gohugo.io/[Hugo] + |
| 37 | +https://jekyllrb.com/[Jekyll] |
| 38 | + |
| 39 | +=== Research |
| 40 | + |
| 41 | +All of these frameworks fulfill the requirements well enough that I consider them well suited. First, I looked up the required features on the respective sites or searched for them |
| 42 | +specifically and found each requested feature right away. I followed up with a general overview |
| 43 | +of how old the frameworks are, how popular they are, and example pages built with them. |
| 44 | +Afterwards I searched for comparison blogs and posts, mostly to examine their comments. |
| 45 | +Most of these "`pros and cons`" posts are inaccurate and very superficial, but fortunately, because of that, |
| 46 | +the comment sections hold interesting discussions and comparisons, ranging from overall features and |
| 47 | +usability to specific issues and problems of each framework and which framework fits which |
| 48 | +use cases in general. In this research I found that most shared experiences and discussions |
| 49 | +agree with each other. They characterize these frameworks as follows (roughly |
| 50 | +summarized): |
| 51 | + |
| 52 | +Gridsome is like Gatsby, just for VueJS. |
| 53 | +Gatsby is blazing fast after building the pages but requires a little more understanding of |
| 54 | +JavaScript and React and may not be as easy to get into if you’ve never built a site with a |
| 55 | +static site generator before. |
| 56 | +Hugo is fast at building and based on Golang. But as a newcomer to that language you’ll find yourself consulting the documentation a lot, unless you learn the language to a certain depth. |
| 57 | +Jekyll is simple in templating and very good for quickly starting a small blog site, but it is based on |
| 58 | +Ruby and therefore requires Ruby dependencies. |
178 | 59 |
|
179 | 60 | == Decision |
180 | 61 |
|
181 | | -Regarding question 1 it seems that both solution approaches are resulting in the same execution model. We decided to implement solution approach 1 and unify both concepts into a more general concept with the name _hook concept_. Therefore we exchange the existing name _PersistenceProvider_ for phase 3 in the execution model with a more general term _processing_: |
182 | | - |
183 | | -.... |
184 | | -┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ |
185 | | -│ scanning ├─────────▶│ parsing ├─────────▶│ processing │ |
186 | | -│ (Phase 1) │ │ (Phase 2) │ │ (Phase 3) │ |
187 | | -└──────────────────┘ └──────────────────┘ └──────────────────┘ |
188 | | -.... |
189 | | - |
190 | | -Regarding question 2 we decided to implement the solution approach 1 with a job-based approach (no active service component needed). Therefore the phase 3 _processing_ will be split into two separate phases named _ReadAndWriteHooks_ (3.1) and _ReadOnlyHooks_ (3.2) |
191 | | -// #30 to what refers 3.1 and 3.2? |
192 | | - |
193 | | -.... |
194 | | - ┌ 3.2 processing: ReadOnlyHooks ─ ─ ─ |
195 | | - ┌ 3.1 processing: ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │ |
196 | | - ┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │ |
197 | | -┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │ |
198 | | -│ scanning ├──▶│ parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐ |
199 | | -└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │ |
200 | | - ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘ |
201 | | - ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ |
202 | | -.... |
| 62 | +So, it seems that Hugo is a good choice for sites with a very large number of pages. |
| 63 | +Jekyll seems to fit a quick build. Gatsby and Gridsome require a bit more time to learn but |
| 64 | +have their advantages in speed and in the growth of the site. And whether you choose Gridsome over |
| 65 | +Gatsby depends on whether you want to use VueJS or not. |
203 | 66 |
|
204 | | -== Consequences |
205 | | - |
206 | | -With the new _hook concept_ we open the _phase 3 processing_ to a more intuitive and flexible architecture. It is easier to understand because _WebHooks_ are already a well known concept. It is possible to keep the existing implementation of the _PersistenceProvider_ and integrate them with a lot of other possible processing components in a more general fashion. In the end, this step will result in a lot of additional feature possibilities, which go far beyond the existing ones proposed here. Therefore we only need to implement this concept once in the _secureCodeBox operator_ and new ideas for extending the _DataProcessing_ will not enforce conceptual or architectural changes. |
| 67 | +Finally, we’ve decided to use Gatsby. The main reasons are its fast performance, the extensive documentation and tutorials, and also the language, since Hugo (the |
| 68 | +other framework we mainly considered) is based on Golang, and for my part as a developer I |
| 69 | +feel completely comfortable with and prefer working with JSX. Overall it mostly comes down to preference, since we’re not going to build a giant website, nor are we planning to implement “crazy” features. |
207 | 70 |
|
208 | | -Ideas for additional processing hooks: |
| 71 | +== Consequences |
209 | 72 |
|
210 | | -* Notifier hooks (_ReadOnlyHook_) e.g., for chat (slack, teams etc.), metric, alerting systems |
211 | | -* MetaData enrichment hooks (_ReadAndWriteHook_) |
212 | | -* FilterData hooks (_ReadAndWriteHook_) (e.g., false/positive handling) |
213 | | -* SystemIntegration hooks (_ReadOnlyHook_) e.g., for ticketing systems like Jira |
214 | | -* CascadingScans hooks (_ReadOnlyHook_) e.g., for starting new security scans based on findings |
| 73 | +For the integration of our multi-repository documentation we’ll use |
| 74 | +Antora if working this out with Gatsby turns out to be more difficult than integrating Antora. |
| 75 | +We’re aware that using Gatsby requires a bit more maintenance and has the drawback that |
| 76 | +anybody else who maintains or works on the website will need to at least understand |
| 77 | +the basics of React and GraphQL. |