Experiment details:

* The fastest implementation of each method will be used to run a nested cross-validation with data sizes ranging from 100 to 5,000 observations and different numbers of repeats of the outer-loop CV strategy (the resampling structure is sketched below).
* The {mlr3} implementation was the fastest for Raschka's method, but the Ranger-Kuhn-Johnson implementation is close, so I'll be using Ranger-Kuhn-Johnson for both methods.
* The chosen algorithm and hyperparameters will be used to predict on a 100K-row simulated dataset, and the mean absolute error will be calculated for each combination of repeat, data size, and method.
* Runtimes began to explode after n = 800 on my 8-vCPU, 16 GB RAM desktop, so I ran this experiment on AWS instances: an r5.2xlarge for the Elastic Net and an r5.24xlarge for the Random Forest.
* I'll be iterating through different numbers of repeats and sample sizes, so I'm transitioning from imperative scripts to a functional approach. Given the long runtimes and the impermanent nature of my internet connection, it would be nice to cache each iteration as it finishes. The [{drake}](https://github.com/ropensci/drake) package is superb on both counts, so I'm using it to orchestrate the experiment (a sketch of the plan follows this list).
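
To make the resampling structure concrete, here is a minimal sketch of a repeated, nested resampling setup using {rsample}. The data frame `sim_data`, the fold counts, and the bootstrapped inner loop are illustrative assumptions, not the exact settings of the Ranger-Kuhn-Johnson implementation:

```r
library(rsample)

# Illustrative only: 10-fold outer CV repeated `reps` times, with a
# bootstrapped inner loop for hyperparameter tuning. `sim_data` is a
# hypothetical simulated data frame.
reps <- 3
ncv  <- nested_cv(sim_data,
                  outside = vfold_cv(v = 10, repeats = reps),
                  inside  = bootstraps(times = 25))
ncv
```

Each inner resample is used to tune hyperparameters, the outer folds estimate the generalization error of the tuned model, and repeating the outer loop smooths out the variance of that estimate.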
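
And here is a minimal sketch, not the actual plan, of how {drake} orchestrates the grid of repeats and sample sizes. `create_sim_data()`, `run_ncv()`, and `score_mae()` are hypothetical helpers, and the size and repeat grids are illustrative values:

```r
library(drake)

plan <- drake_plan(
  # 100K-row simulated scoring set (create_sim_data() is a hypothetical helper)
  sim_100k = create_sim_data(n = 1e5),
  # one nested-cv run per sample-size/repeat combination (illustrative grids)
  ncv_fit = target(
    run_ncv(n_obs = size, repeats = reps),
    transform = cross(size = c(100, 800, 2000, 5000),
                      reps = c(1, 3, 5))
  ),
  # MAE of each chosen algorithm/hyperparameter set on the simulated set
  mae = target(
    score_mae(ncv_fit, newdata = sim_100k),
    transform = map(ncv_fit)
  )
)

make(plan)  # each target is cached as it finishes, so an interrupted run can resume
```

Because every target lands in {drake}'s cache the moment it completes, losing the connection mid-run only costs the target that was in flight.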
color = "white", position = position_dodge(width = 0.8)) +
153
+
coord_flip() +
154
+
labs(y = "Runtime (hrs)", x = "Repeats",
155
+
title = "Kuhn-Johnson", fill = "Sample Size") +
156
+
theme(title = element_text(family = "Roboto"),
157
+
text = element_text(family = "Roboto"),
158
+
legend.position = "top",
159
+
legend.background = element_rect(fill = "ivory"),
160
+
legend.key = element_rect(fill = "ivory"),
161
+
axis.ticks = element_blank(),
162
+
axis.text.x = element_blank(),
163
+
panel.background = element_rect(fill = "ivory",
164
+
colour = "ivory"),
165
+
plot.background = element_rect(fill = "ivory"),
166
+
panel.border = element_blank(),
167
+
panel.grid.major = element_blank(),
168
+
panel.grid.minor = element_blank()
169
+
)
161
170
162
171
```