Commit 6fd3bf9

Author: ercbk
Committed: readme edit
1 parent 1ae4724 commit 6fd3bf9

File tree

2 files changed: +11 -10 lines changed


README.Rmd

Lines changed: 2 additions & 2 deletions
@@ -4,13 +4,13 @@ output: github_document
 
 # Nested Cross-Validation: Comparing Methods and Implementations
 
-Nested cross-validation has become a recommended technique for situations in which the size of our dataset is insufficient to handle both hyperparameter tuning and algorithm comparison. Using standard methods such as k-fold cross-validation in such situations results in significant increases in optimization bias. Nested cross-validation has been shown to produce low bias in out-of-sample error estimates even using datasets with only a few hundred rows.
+Nested cross-validation has become a recommended technique for situations in which the size of our dataset is insufficient to simultaneously handle hyperparameter tuning and algorithm comparison. Using standard methods such as k-fold cross-validation in such situations results in significant increases in optimization bias. Nested cross-validation has been shown to produce low-bias out-of-sample error estimates even using datasets with only a few hundred rows, and therefore gives a better judgement of generalization performance.
 
 The primary issue with this technique is that it is computationally very expensive, with potentially tens of thousands of models being trained during the process. While researching this technique, I found two methods of performing nested cross-validation: one authored by [Sebastian Raschka](https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/11_eval4-algo/code/11-eval4-algo__nested-cv_verbose1.ipynb) and the other by [Max Kuhn and Kjell Johnson](https://tidymodels.github.io/rsample/articles/Applications/Nested_Resampling.html).
 This experiment seeks to answer two questions:
 
 1. What's the fastest implementation of each method?
-2. How many *repeats*, given the size of the training set, should we expect to need to obtain a reasonably accurate out-of-sample error estimate?
+2. How many repeats, given the size of this dataset, should we expect to need to obtain a reasonably accurate out-of-sample error estimate?
 
 With regards to the question of speed, I'll be testing implementations of both methods from various packages, which include {tune}, {mlr3}, {h2o}, and {sklearn}.
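For reference, the nested resampling structure described in the Kuhn & Johnson article linked above can be set up with {rsample}. A minimal sketch, assuming {rsample} is installed and using mtcars purely as a stand-in dataset:

```r
library(rsample)

# Outer loop: 10-fold CV repeated 5 times produces the out-of-sample
# error estimate; inner loop: 25 bootstrap resamples of each outer
# analysis set are reserved for hyperparameter tuning.
resamples <- nested_cv(
  mtcars,
  outside = vfold_cv(v = 10, repeats = 5),
  inside  = bootstraps(times = 25)
)
resamples
```

Each outer split carries its own set of inner resamples, so the tuning step never touches the outer assessment fold used to score the final model.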

README.md

Lines changed: 9 additions & 8 deletions
@@ -2,12 +2,13 @@
 
 # Nested Cross-Validation: Comparing Methods and Implementations
 
 Nested cross-validation has become a recommended technique for
-situations in which the size of our dataset is insufficient to handle
-both hyperparameter tuning and algorithm comparison. Using standard
-methods such as k-fold cross-validation in such situations results in
-significant increases in optimization bias. Nested cross-validation has
-been shown to produce low bias in out-of-sample error estimates even
-using datasets with only a few hundred rows.
+situations in which the size of our dataset is insufficient to
+simultaneously handle hyperparameter tuning and algorithm comparison.
+Using standard methods such as k-fold cross-validation in such
+situations results in significant increases in optimization bias. Nested
+cross-validation has been shown to produce low-bias out-of-sample error
+estimates even using datasets with only a few hundred rows, and
+therefore gives a better judgement of generalization performance.
 
 The primary issue with this technique is that it is computationally very
 expensive, with potentially tens of thousands of models being trained during
@@ -19,8 +20,8 @@ Johnson](https://tidymodels.github.io/rsample/articles/Applications/Nested_Resam
 
 This experiment seeks to answer two questions:
 
 1. What’s the fastest implementation of each method?
-2. How many *repeats*, given the size of the training set, should we
-   expect to need to obtain a reasonably accurate out-of-sample error
+2. How many repeats, given the size of this dataset, should we expect
+   to need to obtain a reasonably accurate out-of-sample error
    estimate?
 
 With regards to the question of speed, I’ll be testing
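To make the nested structure behind both questions concrete, here is a hand-rolled sketch of the Raschka-style loop in base R. Everything in it is illustrative rather than the experiment's actual code: lm() with a choice between two formulas stands in as a hypothetical "algorithm plus hyperparameter grid".

```r
set.seed(1)
dat <- data.frame(y = rnorm(200), x1 = rnorm(200), x2 = rnorm(200))
k_outer <- 5
k_inner <- 5
outer_folds <- sample(rep(seq_len(k_outer), length.out = nrow(dat)))

outer_rmse <- sapply(seq_len(k_outer), function(i) {
  train <- dat[outer_folds != i, ]
  test  <- dat[outer_folds == i, ]

  # Inner loop: tune using the outer analysis set only
  inner_folds <- sample(rep(seq_len(k_inner), length.out = nrow(train)))
  candidates  <- list(y ~ x1, y ~ x1 + x2)  # stand-in hyperparameter grid
  inner_rmse  <- sapply(candidates, function(f) {
    mean(sapply(seq_len(k_inner), function(j) {
      fit  <- lm(f, data = train[inner_folds != j, ])
      pred <- predict(fit, newdata = train[inner_folds == j, ])
      sqrt(mean((train$y[inner_folds == j] - pred)^2))
    }))
  })

  # Refit the inner-loop winner on the full outer analysis set,
  # then score it once on the untouched outer assessment fold
  best <- candidates[[which.min(inner_rmse)]]
  fit  <- lm(best, data = train)
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
})

mean(outer_rmse)  # the nested-CV out-of-sample error estimate
```

Re-running the whole procedure with fresh fold assignments (a repeat) and watching how much mean(outer_rmse) moves from run to run is precisely what question 2 sets out to measure.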
