secureCodeBox
diff --git a/docs/adr/adr_0001.adoc b/docs/adr/adr_0001.adoc
Lines changed: 58 additions & 195 deletions
@@ -1,214 +1,77 @@
-[[ADR-0000]]
-= ADR-0000: How can we introduce a more general extension concept for data processing modules?
+[[ADR-0001]]
+= ADR-0001: Choosing the framework for the new secureCodeBox Website
 
 [cols="h,d",grid=rows,frame=none,stripes=none,caption="Status",%autowidth]
 |====
-
+// Use one of the ADR status parameters, depending on the status.
+// For a 'superseded' ADR, please add a cross-reference link to the new ADR.
+// e.g.: {adr_suposed_by} <<ADR-0000>>
 | Status
 | ACCEPTED
 
 | Date
-| 2020-05-20
+| 2019-08-21
 
 | Author(s)
-| Jannik Hollenbach <Jannik.Hollenbach@iteratec.com>,
-  Jorge Estigarribia <Jorge.Estigarribia@iteratec.com>,
-  Robert Seedorff <Robert.Seedorff@iteratec.com>,
-  Sven Strittmatter <Sven.Strittmatter@iteratec.com>
+| Daniel Patanin <daniel.patanin@iteratec.com>,
+  Jannick Hollenbach <jannick.hollenbach@iteratec.com>
+// ...
 |====
 
 == Context
 
-=== Status Quo
-
-One major challenge implementing the _secureCodeBox_ is to provide a flexible and modular architecture, which enables the open source community to easily understand the concepts and especially to extend the _secureCodeBox_ with individual features. Therefore we decided to separate the process stages of a single security scan (instance of _scanType_ custom resource definition; further abbreviated with _CRD_) in three major phases:
-
-....
-┌──────────────────┐          ┌──────────────────┐          ┌──────────────────┐
-│     scanning     ├─────────▶│     parsing      ├─────────▶│    persisting    │
-│    (phase 1)     │          │    (phase 2)     │          │    (phase 3)     │
-└──────────────────┘          └──────────────────┘          └──────────────────┘
-....
-
-By now the phase 3 "`persisting`" was implemented by so called _PersistenceProviders_ (e.g., the _persistence-elastic_ provider which is responsible for persisting all findings in a given elasticsearch database). The _secureCodeBox_ Operator is aware of this 3 phases and is responsible for the state model and execution of each security scan.
-
-=== Problem and Question
-
-We identified different additional use cases with a more "`data processing oriented`" pattern than the implemented phase 3 "`persisting`" indicates. For example, we implemented a so called _MetaDataProvider_ feature, which is responsible for enhancing each security finding with additional metadata. But the _MetaDataProvider_ must be executed after the phase 2 "`parsing`" and before the phase 3 "`persisting`" because it depends on the parsed finding results (which will be enhanced) and the updated findings should be also persisted.
-
-To find a proper solution, we split the topic into the following two questions:
-
-. Should we unify the concepts _MetaDataProvider_ and _PersistenceProvider_?
-. How should the execution model look like for each concept?
-
-==== Question 1: Should We Unify the Concepts MetaDataProvider and PersistenceProvider?
-
-===== Solution Approach 1: Unify
-
-Both "`modules`" are "`processing`" the security findings, which were generated in the phase 2 "`parsing`",
-but there is one major difference between them:
-
-* a _PersistenceProvider_ is processing the findings *read only*, and
-* a _MetaDataProvider_ is processing the findings *read and write*.
-
-There is a similar concept in Kubernetes called https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/[AdmissionController], but with the exception that the will be executed before a resource is created.
-
-There are two variants of _AdmissionControllers_:
-
-. _ValidatingWebhookConfiguration_: *read only*, *executed last*; and
-. _MutatingWebhookConfiguration_: *read and write*, *executed first*.
-
-We could do a similar thing and introduce CRD which allows to execute "`custom code`" (depends on the second question) after a scan has completed (meaning both phases "`scan`" and "`parsing`" were done). Some name ideas:
-
-* _ScanHooks_
-* _ScanCompletionHooks_
-* _FindingProcessors_
-
-These could be implemented with a `type` attribute, which declares if they are *read only* or *read and write*.
-
-The _secureCodeBox operator_ would process all these CRDs in the namespace of the scan and execute the *read and write* ones first in serial only one at a time to avoid write conflicts and then the *read only* ones in parallel.
-
-[source,yaml]
-----
-apiVersion: execution.experimental.securecodebox.io/v1
-kind: ScanCompletionHook
-metadata:
-  name: my-metadata
-spec:
-  type: ReadAndWrite
-  # If implemented like the current persistence provider
-  image: my-metadata:v2.0.0
-----
-
-The Execution Flow would then look something like this:
-
-....
-                                                                                                           ┌ ReadOnly─Hooks─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
-                                              ┌ ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        ┌────────────────────────────────┐ │
-                                                ┌───────────────────────┐                            │  ┌──┼▶│  Elastic PersistenceProvider   │
-┌──────────────────┐   ┌──────────────────┐   │ │ ReadAndWrite Hook #1  │  ┌───────────────────────┐    │    └────────────────────────────────┘ │
-│       Scan       ├──▶│     Parsing      │────▶│  "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2  │─┼──┤  │ ┌────────────────────────────────┐
-└──────────────────┘   └──────────────────┘   │ └───────────────────────┘  └───────────────────────┘    └───▶│ DefectDojo PersistenceProvider │ │
-                                               ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘     │ └────────────────────────────────┘
-                                                                                                            ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
-....
-
-====== Pros
-
-* Only one implementation.
-* Pretty generic to expand and test out new ideas without having to modify the _secureCodeBox operator_.
-
-====== Cons
-
-* Possibly an "`over-abstraction`".
-* Need to refactor the _persistence-elastic_ provider.
-* The "`general implementation`" will be harder than the individual ones.
-
-===== Solution Approach 2: Keep Split between Persistence Provider and MetaData Provider
-
-Keep _PersistenceProvider_ as they are and introduce new _MetaDataProvider_ CRD which gets executed before the _PersistenceProviders_ by the __secureCodeBox operator_.
-
-....
-                                                                                                           ┌ Persistence Provider─ ─ ─ ─ ─ ─ ─ ─
-                                              ┌ MetaData Provider ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        ┌────────────────────────────────┐ │
-                                                ┌───────────────────────┐                            │  ┌──┼▶│  Elastic PersistenceProvider   │
-┌──────────────────┐   ┌──────────────────┐   │ │ ReadAndWrite Hook #1  │  ┌───────────────────────┐    │    └────────────────────────────────┘ │
-│       Scan       ├──▶│     Parsing      │────▶│ "MyMetaDataProvider"  ├─▶│ ReadAndWrite Hook #2  │─┼──┤  │ ┌────────────────────────────────┐
-└──────────────────┘   └──────────────────┘   │ └───────────────────────┘  └───────────────────────┘    └───▶│ DefectDojo PersistenceProvider │ │
-                                               ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘     │ └────────────────────────────────┘
-                                                                                                            ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
-....
-
-====== Pros
-
-* Quicker to implement.
-* Might be worth it to have a separate concept for it.
-
-====== Cons
-
-* Not sure if it worth to introduce a new CRD for everything, especially when it's conceptually pretty close to to something already existing.
-
-==== Question 2: How Should the Execution Model Look like for Each Concept?
-
-===== Solution Approach 1: Like the Persistence Provider
-
-Basically a docker container which process findings takes two arguments:
-
-. A pre-defined URL to download the findings from.
-. A pre-defined URL to upload the modified findings to.
-
-Examples:
-
-* NodeJS: `node my-metadata.js "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
-* Java: `java my-metadata.jar "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
-* Golang: `./my-metadata "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
-
-====== Pros
-
-* One liner with the current implementations.
-* Code overhead / wrapper code is pretty minimal.
-* Zero scale: no resource costs when nothing is running.
-
-===== Cons
-
-* May results in too many Kubernetes jobs.
-** Resource blocking on finished resources.
-** `ttlAfterFinished` enabled.
-* Container runtime overhead (especially time).
-
-===== Solution Approach 2: A WebHooks Like Concept
-
-Analog to kubernetes webhooks: HTTP server receiving findings and returning results.
-
-===== Pros
-
-* Milliseconds instead of seconds for processing.
-* No overhead for container Creation.
-* No additional kubernetes jobs needed.
-
-===== Cons
-
-* Introduces new running services which needs to be maintained and have uptime.
-* Code overhead / boilerplate (Can be mitigated by an SDK).
-* Debugging of individual _MetaDataProvider_ is harder than a single service which handles everything.
-* Introduces "`new`"cConcept.
-* Certificate management for webhook services (`cert-manager` required by default?).
-* Scaling for systems with lots of load could be a problem.
-* One service per namespace (multiple tenants) needed -> results in many running active services which is resource consuming.
+There are many frameworks for building websites. We must choose the one that best fits our use case and fulfills our mandatory requirements:
+
+* A common programming language that is, where applicable, easy to learn
+* Easy to use and to start up overall, including locally
+* Tutorials, examples, and good documentation
+* Bonus points for a large selection of good, easy-to-use templates and plugins
+* Continuous support and contribution
+* Deployable on GitHub Pages
+
+We will choose from the following popular or trending frameworks:
+
+https://gridsome.org/[Gridsome] +
+https://www.gatsbyjs.org/[Gatsby] +
+https://gohugo.io/[Hugo] +
+https://jekyllrb.com/[Jekyll] 
+
+=== Research
+
+All of these frameworks fulfill the requirements well enough that I consider them well suited. First, I looked up the listed features on the respective sites or searched for each one
+specifically, and found the requested feature right away. I followed up with a general overview
+of how old the frameworks are, how popular they are, and which example pages are built with them.
+Afterwards I searched for comparison blogs and posts, mostly to examine their comments.
+Most of these "`pros and cons`" posts are inaccurate and very superficial, but fortunately, because of that,
+the comment sections hold interesting discussions and comparisons, ranging from overall features and
+usability to specific issues and problems of each framework, and to which framework fits which
+use cases in general. Across this research, most of the shared experiences and discussions were
+similar. They characterized the frameworks as follows (roughly
+summarized):
+
+* Gridsome is like Gatsby, but for VueJS.
+* Gatsby is blazing fast once the pages are built, but requires a little more understanding of
+  JavaScript and React and may be harder to get into if you have never built a site with a
+  static site generator before.
+* Hugo builds quickly and is based on Golang. If you are new to that language, however, you will find yourself consulting the documentation a lot unless you learn the language to a certain depth.
+* Jekyll has simple templating and is very good for quickly starting a small blog site, but it is based on
+  Ruby and therefore requires Ruby dependencies.
 
 == Decision
 
-Regarding question 1 it seems that both solution approaches are resulting in the same execution model. We decided to implement solution approach 1 and unify both concepts into a more general concept with the name _hook concept_. Therefore we exchange the existing name _PersistenceProvider_ for phase 3 in the execution model with a more general term _processing_:
-
-....
-┌──────────────────┐          ┌──────────────────┐          ┌──────────────────┐
-│    scanning      ├─────────▶│    parsing       ├─────────▶│    processing    │
-│    (Phase 1)     │          │    (Phase 2)     │          │    (Phase 3)     │
-└──────────────────┘          └──────────────────┘          └──────────────────┘
-....
-
-Regarding question 2 we decided to implement the solution approach 1 with a job-based approach (no active service component needed). Therefore the phase 3 _processing_ will be split into two separate phases named _ReadAndWriteHooks_ (3.1) and _ReadOnlyHooks_ (3.2)
-// #30 to what refers 3.1 and 3.2?
-
-....
-                                                                                                           ┌ 3.2 processing: ReadOnlyHooks ─ ─ ─
-                                              ┌ 3.1 processing: ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        ┌────────────────────────────────┐ │
-                                                ┌───────────────────────┐                            │  ┌──┼▶│  Elastic PersistenceProvider   │
-┌──────────────────┐   ┌──────────────────┐   │ │ ReadAndWrite Hook #1  │  ┌───────────────────────┐    │    └────────────────────────────────┘ │
-│    scanning      ├──▶│     parsing      │────▶│  "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2  │─┼──┤  │ ┌────────────────────────────────┐
-└──────────────────┘   └──────────────────┘   │ └───────────────────────┘  └───────────────────────┘    └───▶│ DefectDojo PersistenceProvider │ │
-                                               ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘     │ └────────────────────────────────┘
-                                                                                                            ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
-....
+So Hugo seems to be a good choice for sites with a very large number of pages.
+Jekyll seems to fit a quick build. Gatsby and Gridsome require a bit more time to learn but
+have advantages in speed and as the site grows. Whether you choose Gridsome over
+Gatsby depends on whether you want to use VueJS.
 
-== Consequences
-
-With the new _hook concept_ we open the _phase 3 processing_ to a more intuitive and flexible architecture. It is easier to understand because _WebHooks_ are already a well known concept. It is possible to keep the existing implementation of the _PersistenceProvider_ and integrate them with a lot of other possible processing components in a more general fashion. In the end, this step will result in a lot of additional feature possibilities, which go far beyond the existing ones proposed here. Therefore we only need to implement this concept once in the _secureCodeBox operator_ and new ideas for extending the _DataProcessing_ will not enforce conceptual or architectural changes.
+In the end, we decided to use Gatsby. The main reasons are its fast performance, its extensive documentation and tutorials, and the language: Hugo (the
+other framework we mainly considered) is based on Golang, whereas I, as a developer,
+feel completely comfortable with JSX and prefer working with it. Overall, it mostly comes down to preference, since we are not going to build a giant website, nor are we planning to implement "`crazy`" features.
 
-Ideas for additional processing hooks:
+== Consequences
 
-* Notifier hooks (_ReadOnlyHook_) e.g., for chat (slack, teams etc.), metric, alerting systems
-* MetaData enrichment hooks (_ReadAndWriteHook_)
-* FilterData hooks (_ReadAndWriteHook_) (e.g., false/positive handling)
-* SystemIntegration hooks (_ReadOnlyHook_) e.g., for ticketing systems like Jira
-* CascadingScans hooks (_ReadOnlyHook_) e.g., for starting new security scans based on findings
+To integrate our multi-repository documentation, we will use
+Antora if doing this with Gatsby alone turns out to be more difficult than integrating Antora.
+We are aware that using Gatsby requires a bit more maintenance and has the drawback that
+anybody else who maintains or works on the website will need to understand at least
+the basics of React and GraphQL.