COM6012 · kierbn · Mar 1, 2025 · Mar 1, 2025
diff --git a/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md b/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md
@@ -530,12 +530,13 @@ Starting from this lab, you need to use *as many DataFrame functions as possible
 
     Load the Aug95 NASA access log data in Lab 1 and create a DataFrame with FIVE columns by **specifying** the schema according to the description in the downloaded html file. Use this DataFrame for the following questions.
 
-2. Find out the number of **unique** hosts in total (i.e. in August 1995)?
-3. Find out the most frequent visitor, i.e. the host with the largest number of visits.
+2. Find out the number of **unique** hosts in total (i.e. in August 1995)? [Answer: 75060 unique hosts]
+3. Find out the most frequent visitor, i.e. the host with the largest number of visits. [Answer: "edams.ksc.nasa.gov"]
 
 ### Linear regression for advertising
 
 4. Add regularization to the [linear regression for advertising example](#example-linear-regression-for-advertising) and evaluate the prediction performance against the performance without any regularization. Study at least three different regularization settings.
+[Answer: With increasing regularisation parameters (0.1, 0.2, 0.5), each of the predictions increases each time. Not really sure what else to put here without just copy-pasting it all in.]
 
 ### Logistic regression for document classification