
Commit e6a884f: readme edits

Author: ercbk
1 parent 1a8bf0d

3 files changed (+7, -13 lines)

README.Rmd

Lines changed: 3 additions & 5 deletions
@@ -10,10 +10,8 @@ Nested cross-validation has become a recommended technique for situations in whi
 The primary issue with this technique is that it is computationally very expensive with potentially tens of 1000s of models being trained during the process. While researching this technique, I found two slightly different methods of performing nested cross-validation — one authored by [Sabastian Raschka](https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/11_eval4-algo/code/11-eval4-algo__nested-cv_verbose1.ipynb) and the other by [Max Kuhn and Kjell Johnson](https://tidymodels.github.io/rsample/articles/Applications/Nested_Resampling.html).
 I'll be examining two aspects of nested cross-validation:

-1. Duration: Which packages and functions give us the fastest implementation of each method?
-2. Performance: First, develop a testing framework. Then, using a generated dataset, find how many repeats, given the number of samples, should we expect to need in order to obtain a reasonably accurate out-of-sample error estimate.
-
-With regards to the question of speed, I'll will be testing implementations of both methods from various packages which include {tune}, {mlr3}, {h2o}, and {sklearn}.
+1. Duration: Find out which packages and combinations of model functions give us the fastest implementation of each method.
+2. Performance: First, develop a testing framework. Then, using a generated dataset, calculate how many repeats, given the sample size, should we expect to need in order to obtain a reasonably accurate out-of-sample error estimate.


 ## Duration Experiment

@@ -81,7 +79,7 @@ kj <- runs %>%
   geom_bar(aes(color = after_scale(prismatic::clr_darken(rep("#BD9865",5), 0.3))), stat = "identity", width = 0.50, fill = "#BD9865") +
   coord_flip() +
   scale_x_reordered() +
-  geom_text(hjust = 1.3, size = 3.5, color = "white") +
+  geom_text(hjust = 1.3, size = 3.5, color = "black") +
   labs(x = NULL, y = NULL,
        title = "Kuhn-Johnson") +
   theme(plot.title = element_text(size = rel(0.9)))
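
The README text edited above contrasts two nested cross-validation methods. The Kuhn-Johnson write-up linked there builds its resamples with {rsample}, so a minimal sketch of that outer/inner split structure may be useful context; it is not code from this commit, and the data, fold counts, and repeat counts below are placeholders chosen only for illustration.

```r
# Minimal sketch (not from this repo): the nested resampling object used in the
# Kuhn-Johnson approach. Hyperparameters are tuned on the inner resamples; the
# outer folds supply the out-of-sample error estimate. Data, fold counts, and
# repeats are placeholders.
library(rsample)

nested_folds <- nested_cv(
  mtcars,
  outside = vfold_cv(v = 5, repeats = 2),  # outer loop: error estimation
  inside  = bootstraps(times = 25)         # inner loop: hyperparameter tuning
)

nested_folds
```

Each outer assessment fold is scored with hyperparameters chosen on its own inner resamples, which is why the procedure is so expensive: the number of trained models scales multiplicatively with outer folds, repeats, inner resamples, and hyperparameter candidates.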

README.md

Lines changed: 4 additions & 8 deletions
@@ -24,17 +24,13 @@ and the other by [Max Kuhn and Kjell
 Johnson](https://tidymodels.github.io/rsample/articles/Applications/Nested_Resampling.html).
 I’ll be examining two aspects of nested cross-validation:

-1. Duration: Which packages and functions give us the fastest
-    implementation of each method?
+1. Duration: Find out which packages and combinations of model
+    functions give us the fastest implementation of each method.
 2. Performance: First, develop a testing framework. Then, using a
-    generated dataset, find how many repeats, given the number of
-    samples, should we expect to need in order to obtain a reasonably
+    generated dataset, calculate how many repeats, given the sample
+    size, should we expect to need in order to obtain a reasonably
     accurate out-of-sample error estimate.

-With regards to the question of speed, I’ll will be testing
-implementations of both methods from various packages which include
-{tune}, {mlr3}, {h2o}, and {sklearn}.
-
 ## Duration Experiment

 Experiment details:
Third changed file: -6 Bytes (diff not shown)
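
The "Performance" item in the README diffs asks how many repeats are needed, for a given sample size, before the resampled error estimate is reasonably close to the true out-of-sample error. A rough, self-contained illustration of that comparison follows; the plain linear model, simulated data, and sizes are stand-ins for the repo's actual testing framework, not part of this commit.

```r
# Rough sketch (not the repo's framework): compare repeated 10-fold CV error
# estimates, for increasing numbers of repeats, against the error measured on
# a large simulated holdout set. Model, data, and sizes are placeholders.
library(rsample)
library(purrr)

set.seed(42)
sim <- data.frame(x1 = rnorm(3000), x2 = rnorm(3000))
sim$y <- 2 * sim$x1 - sim$x2 + rnorm(3000)

train   <- sim[1:300, ]     # small training set
holdout <- sim[301:3000, ]  # large holdout approximates the "true" error

# Mean RMSE across all folds of a repeated 10-fold CV
cv_rmse <- function(n_repeats) {
  folds <- vfold_cv(train, v = 10, repeats = n_repeats)
  mean(map_dbl(folds$splits, function(split) {
    fit  <- lm(y ~ x1 + x2, data = analysis(split))
    pred <- predict(fit, newdata = assessment(split))
    sqrt(mean((assessment(split)$y - pred)^2))
  }))
}

# Out-of-sample error of the model fit on the full training set
fit_all      <- lm(y ~ x1 + x2, data = train)
holdout_rmse <- sqrt(mean((holdout$y - predict(fit_all, newdata = holdout))^2))

data.frame(
  repeats     = c(1, 3, 5),
  cv_estimate = map_dbl(c(1, 3, 5), cv_rmse),
  holdout     = holdout_rmse
)
```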
