
Commit 1a8bf0d

ercbk committed: readme edits
1 parent 33ee963 · commit 1a8bf0d

File tree

3 files changed: +17 −15 lines changed

README.Rmd

Lines changed: 5 additions & 5 deletions
@@ -111,10 +111,10 @@ durations
 Experiment details:
 
 * The fastest implementation of each method will be used in running a nested cross-validation with different sizes of data ranging from 100 to 5000 observations and different numbers of repeats of the outer-loop cv strategy.
-* The {mlr3} implementation was the fastest for Raschka's method, but the Ranger-Kuhn-Johnson implementation is close. So I'll be using Ranger-Kuhn-Johnson for both methods.
+* The {mlr3} implementation was the fastest for Raschka's method, but the Ranger-Kuhn-Johnson implementation was close. To simplify, I'll be using Ranger-Kuhn-Johnson for both methods.
 * The chosen algorithm and hyperparameters will be used to predict on a 100K row simulated dataset and the mean absolute error will be calculated for each combination of repeat, data size, and method.
-* Runtimes began to explode after n = 800 for my 8 vcpu, 16 GB RAM desktop, so I ran this experiment using AWS instances: a r5.2xlarge for the Elastic Net and a r5.24xlarge for Random Forest.
-* I'll be iterating through different numbers of repeats and sample sizes, so I'll be transitioning from imperative scripts to a functional approach. Given the long runtimes and impermanent nature of my internet connection, it would be nice to cache each iteration as it finishes. The [{drake}](https://github.com/ropensci/drake) package is superb on both counts, so I'm using it to orchestrate.
+* Runtimes began to explode after n = 800 for my 8 vcpu, 16 GB RAM desktop, therefore I ran this experiment using AWS instances: a r5.2xlarge for the Elastic Net and a r5.24xlarge for Random Forest.
+* I'll be transitioning from imperative scripts to a functional approach, because I'm iterating through different numbers of repeats and sample sizes. Given the long runtimes and impermanent nature of my internet connection, it would also be nice to cache each iteration as it finishes. The [{drake}](https://github.com/ropensci/drake) package is superb on both counts, so I'm using it to orchestrate.
 
 ```{r perf_build_times, echo=FALSE, message=FALSE, cache=TRUE}

@@ -146,10 +146,10 @@ fill_colors <- unname(swatches::read_ase("palettes/Forest Floor.ase"))
 
 ggplot(subtargets, aes(y = elapsed, x = repeats,
                        fill = n, label = elapsed)) +
-  geom_col(position = position_dodge(width = 0.8)) +
+  geom_col(position = position_dodge(width = 0.85)) +
   scale_fill_manual(values = fill_colors[4:7]) +
   geom_text(hjust = 1.3, size = 3.5,
-            color = "white", position = position_dodge(width = 0.8)) +
+            color = "white", position = position_dodge(width = 0.85)) +
   coord_flip() +
   labs(y = "Runtime (hrs)", x = "Repeats",
        title = "Kuhn-Johnson", fill = "Sample Size") +
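For the first experiment-details bullet in the diff above, the nested cross-validation setup can be sketched with {rsample}. This is only an illustration under assumed names and sizes: `sim_data`, the fold counts, and the bootstrap count are placeholders, not values taken from the repo.

```r
library(rsample)

# Placeholder simulated sample; the experiment varies this from 100 to 5000 rows.
set.seed(2020)
sim_data <- data.frame(y = rnorm(800), x1 = rnorm(800), x2 = rnorm(800))

# One nested-CV object: a repeated outer loop for performance estimation
# (the "repeats of the outer-loop cv strategy") plus an inner loop for tuning.
resamples <- nested_cv(
  sim_data,
  outside = vfold_cv(v = 10, repeats = 3),
  inside  = bootstraps(times = 25)
)
```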

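The final bullet in the same diff describes moving to a functional, cached workflow; a minimal {drake} sketch of that idea follows. It is not the repo's actual plan: `run_ncv()` is a hypothetical wrapper around a single nested-CV run, and the sample sizes and repeat counts are placeholders. Each combination becomes its own target, so completed iterations stay cached if the connection drops mid-run.

```r
library(drake)

# Hypothetical wrapper: run one nested CV for a given sample size and number
# of outer-loop repeats, returning a one-row data frame of results.
run_ncv <- function(n, repeats) {
  data.frame(n = n, repeats = repeats, mae = NA_real_, elapsed = NA_real_)
}

plan <- drake_plan(
  subtarget = target(
    run_ncv(n, repeats),
    transform = cross(n = c(100, 800, 2000, 5000), repeats = c(1, 3, 5))
  ),
  results = target(
    dplyr::bind_rows(subtarget),
    transform = combine(subtarget)
  )
)

make(plan)  # finished subtargets remain in the cache if the run is interrupted
```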
README.md

Lines changed: 12 additions & 10 deletions
@@ -91,20 +91,22 @@ Experiment details:
     100 to 5000 observations and different numbers of repeats of the
     outer-loop cv strategy.
   - The {mlr3} implementation was the fastest for Raschka’s method,
-    but the Ranger-Kuhn-Johnson implementation is close. So I’ll be
-    using Ranger-Kuhn-Johnson for both methods.
+    but the Ranger-Kuhn-Johnson implementation was close. To
+    simplify, I’ll be using Ranger-Kuhn-Johnson for both methods.
   - The chosen algorithm and hyperparameters will be used to predict on a
     100K row simulated dataset and the mean absolute error will be
     calculated for each combination of repeat, data size, and method.
   - Runtimes began to explode after n = 800 for my 8 vcpu, 16 GB RAM
-    desktop, so I ran this experiment using AWS instances: a r5.2xlarge
-    for the Elastic Net and a r5.24xlarge for Random Forest.
-  - I’ll be iterating through different numbers of repeats and sample
-    sizes, so I’ll be transitioning from imperative scripts to a
-    functional approach. Given the long runtimes and impermanent nature
-    of my internet connection, it would be nice to cache each iteration
-    as it finishes. The [{drake}](https://github.com/ropensci/drake)
-    package is superb on both counts, so I’m using it to orchestrate.
+    desktop, therefore I ran this experiment using AWS instances: a
+    r5.2xlarge for the Elastic Net and a r5.24xlarge for Random
+    Forest.
+  - I’ll be transitioning from imperative scripts to a functional
+    approach, because I’m iterating through different numbers of repeats
+    and sample sizes. Given the long runtimes and impermanent nature of
+    my internet connection, it would also be nice to cache each
+    iteration as it finishes. The
+    [{drake}](https://github.com/ropensci/drake) package is superb on
+    both counts, so I’m using it to orchestrate.
 
 ![](README_files/figure-gfm/perf_bt_charts-1.png)<!-- -->
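As a small illustration of the scoring step described in the bullets above, the per-combination mean absolute error could be computed as below. The `predictions` data frame is hypothetical: one row per holdout observation for every repeat / data size / method combination, with `truth` and `estimate` columns.

```r
library(dplyr)

# Hypothetical stand-in for predictions on the 100K-row simulated holdout.
predictions <- tibble::tibble(
  method   = rep(c("kuhn-johnson", "raschka"), each = 4),
  n        = rep(c(100, 5000), times = 4),
  repeats  = rep(c(1, 3), times = 4),
  truth    = rnorm(8),
  estimate = rnorm(8)
)

# MAE for each combination of repeat, data size, and method.
mae_summary <- predictions %>%
  group_by(method, n, repeats) %>%
  summarize(mae = mean(abs(truth - estimate)), .groups = "drop")
```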
