Commit ea09360

misc minor corrections

ercbk committed
1 parent 548a8bc

16 files changed, +214 -23 lines changed

README.Rmd

Lines changed: 2 additions & 2 deletions
@@ -35,13 +35,13 @@ Various elements of the technique can be altered to improve performance. These i
 3. Inner-Loop CV strategy
 4. Grid search strategy

-For the performance experiemnt (question 2), I'll be varying the repeats of the outer-loop cv strategy for each method. The fastest implementation of each method will be tuned with different sizes of data ranging from 100 to 5000 observations. The mean absolute error will be calculated for each combination of repeat, data size, and method.
+For the performance experiment (question 2), I'll be varying the repeats of the outer-loop cv strategy for each method. The fastest implementation of each method will be tuned with different sizes of data ranging from 100 to 5000 observations. The mean absolute error will be calculated for each combination of repeat, data size, and method.

 I'm using a 4 core, 16 GB RAM machine.

 Progress (duration in seconds)

-![](duration-experiment/outputs/0224-results.png)
+![](duration-experiment/outputs/0225-results.png)

 References

README.md

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ These include:
 3\. Inner-Loop CV strategy
 4\. Grid search strategy

-For the performance experiemnt (question 2), I’ll be varying the repeats
+For the performance experiment (question 2), I’ll be varying the repeats
 of the outer-loop cv strategy for each method. The fastest
 implementation of each method will be tuned with different sizes of data
 ranging from 100 to 5000 observations. The mean absolute error will be
@@ -60,7 +60,7 @@ I’m using a 4 core, 16 GB RAM machine.

 Progress (duration in seconds)

-![](duration-experiment/outputs/0224-results.png)
+![](duration-experiment/outputs/0225-results.png)

 References
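
The experiment design described in the README change (repeats of the outer loop, data sizes from 100 to 5000 observations, mean absolute error per combination) could be laid out roughly as below. This is a base-R sketch, not code from the repository: the method names, the specific repeat counts and data sizes, and the helper `mae()` are all assumptions made for illustration.

```r
# Hypothetical design grid: every value here is illustrative, not taken from the repo.
design <- expand.grid(
  method  = c("method_a", "method_b"),  # placeholder method names
  repeats = 1:3,                        # assumed outer-loop repeat counts
  n_obs   = c(100, 1000, 5000),         # assumed sizes within the stated 100 to 5000 range
  stringsAsFactors = FALSE
)

# Mean absolute error between observed and predicted values
mae <- function(y_obs, y_hat) {
  mean(abs(y_obs - y_hat))
}

# One row per combination of method, repeat, and data size;
# the error column would be filled in as each run finishes.
design$mae <- NA_real_
```

Each row of `design` corresponds to one cell of the "repeat, data size, and method" combinations named in the README text.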

duration-experiment/kuhn-johnson/nested-cv-h2o-kj.R

Lines changed: 8 additions & 3 deletions
@@ -70,7 +70,7 @@ ncv_dat_10 <- rsample::nested_cv(small_dat,
 inside = bootstraps(times = 25))


-
+# Start h2o cluster
 h2o.init()


@@ -86,14 +86,15 @@ error_FUN <- function(model){
 }


+# Distributed Random Forest

 rf_FUN <- function(x, y, anal_h2o, ass_h2o, params) {

 mtries <- params$mtries[[1]]
 ntrees <- params$ntrees[[1]]

 # h20 ususally needs unique ids or loops will return exact same values over and over
-modelId <- as.character(Sys.time())
+gridId <- as.character(dqrng::dqrnorm(1))

 h2o.show_progress()

@@ -107,6 +108,8 @@ rf_FUN <- function(x, y, anal_h2o, ass_h2o, params) {
 }


+# Elastic Net Regression
+
 glm_FUN <- function(x, y, anal_h2o, ass_h2o, params) {

 alpha <- params$alpha[[1]]
@@ -154,6 +157,7 @@ params_list <- list(glm = list(alpha = c(0, 0.25, 0.5, 0.75, 1),
 #####################################################


+# inputs params, model, and resample, calls model and error functions, outputs error
 mod_error <- function(params, mod_FUN, dat) {
 anal_df <- rsample::analysis(dat)
 ass_df <- rsample::assessment(dat)
@@ -297,4 +301,5 @@ tic.clearlog()
 # MLflow uses waitress for Windows. Killing it also kills mlflow.exe, python.exe, console window host processes
 installr::kill_process(process = c("waitress-serve.exe"))

-
+# shutdown cluster
+h2o.shutdown(prompt = FALSE)
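
The change from `modelId` to `gridId` above swaps a timestamp for a random draw as the source of the identifier. The following base-R sketch shows one alternative way to build an id that is unlikely to repeat across loop iterations. It is not what the commit does (the commit uses `dqrng::dqrnorm(1)`), and `make_id()` with its `prefix` and `n_chars` arguments is a hypothetical helper, not part of h2o or this repository.

```r
# Hypothetical helper: builds an id from a fixed prefix plus random characters.
make_id <- function(prefix = "grid", n_chars = 12) {
  chars <- c(letters, 0:9)  # lowercase letters and digits, coerced to character
  suffix <- paste(sample(chars, n_chars, replace = TRUE), collapse = "")
  paste0(prefix, "_", suffix)
}

# Generate a batch of ids, as a loop over resamples might
ids <- vapply(1:50, function(i) make_id(), character(1))

# anyDuplicated() returns 0 when every id is distinct
anyDuplicated(ids)
```

A random suffix does not guarantee uniqueness, but with 12 characters drawn from 36 symbols a collision within a single run is very unlikely.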

duration-experiment/kuhn-johnson/nested-cv-parsnip-kj.R

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ pars_ranger_FUN <- function(params, analysis_set) {



-# Regularized Regression
+# Elastic Net Regression

 glm_FUN <- function(params, analysis_set) {
 alpha <- params$mixture[[1]]

duration-experiment/kuhn-johnson/nested-cv-ranger-kj.R

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ ranger_FUN <- function(params, analysis_set) {
 }


-# Regularized Regression
+# Elastic Net Regression

 glm_FUN <- function(params, analysis_set) {
 alpha <- params$mixture[[1]]

duration-experiment/kuhn-johnson/nested-cv-sklearn-kj.R

Lines changed: 3 additions & 1 deletion
@@ -92,6 +92,8 @@ error_FUN <- function(y_obs, y_hat){
 #####################################


+# Random Forest
+
 sklearn_rf_FUN <- function(params, analysis_set) {
 sklearn_e <- import("sklearn.ensemble")
 max_features <- r_to_py(params$mtry[[1]])
@@ -112,7 +114,7 @@ sklearn_rf_FUN <- function(params, analysis_set) {
 }


-# Regularized Regression
+# Elastic Net Regression

 glm_FUN <- function(params, analysis_set) {
 alpha <- params$mixture[[1]]

duration-experiment/kuhn-johnson/nested-cv-tune-kj.R

Lines changed: 1 addition & 0 deletions
@@ -157,6 +157,7 @@ params_list <- list(glm = glm_params, rf = rf_params)
 ################################


+# inputs params, model, and resample, calls model and error functions, outputs error
 mod_error <- function(params, mod_FUN, dat) {
 y_col <- ncol(dat$data)
 y_obs <- assessment(dat)[y_col]
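
The comment added above `mod_error` describes its contract: take hyperparameters, a model function, and a resample, fit on the analysis set, score on the assessment set, and return the error. A self-contained base-R sketch of that shape follows. It is not the repository's implementation: the resample is a plain list rather than an rsample split, and `lm_FUN`, the body of `error_FUN`, and the toy data are assumptions made for illustration.

```r
# Error function: mean absolute error (body assumed for illustration)
error_FUN <- function(y_obs, y_hat) mean(abs(y_obs - y_hat))

# Hypothetical model function: fits on the analysis set and returns a predictor.
# `params` is unused here because lm() has no hyperparameters to tune.
lm_FUN <- function(params, analysis_set) {
  fit <- lm(y ~ x, data = analysis_set)
  function(new_data) unname(predict(fit, newdata = new_data))
}

# Inputs params, model function, and resample; outputs the assessment-set error
mod_error <- function(params, mod_FUN, dat) {
  predict_FUN <- mod_FUN(params, dat$analysis)
  y_hat <- predict_FUN(dat$assessment)
  error_FUN(dat$assessment$y, y_hat)
}

# Toy resample (a plain list standing in for an rsample split):
# y is exactly 2 * x, so the fitted line predicts the assessment rows closely.
toy <- list(
  analysis   = data.frame(x = 1:10, y = 2 * (1:10)),
  assessment = data.frame(x = 11:12, y = c(22, 24))
)
err <- mod_error(params = list(), mod_FUN = lm_FUN, dat = toy)
```

Keeping the model function as an argument is what lets the same `mod_error` wrapper score both the random forest and the elastic net functions in these scripts.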
-47.8 KB
Binary file not shown.
-39.4 KB
Binary file not shown.
-46.7 KB
Binary file not shown.
