Commit 3560906

ercbk committed
replaced pickles, replaced old n_iter params in retic-raschka, updated readme
1 parent 5b9ea80 commit 3560906

13 files changed

+28
-109
lines changed

README.Rmd

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ Progress (duration in seconds)
 
 References
 
-Boulesteix, AL, and C Strobl. 2009. “Optimal Classifier Selection and Negative Bias in Error Rate Estimation: An Empirical Study on High-Dimensional Prediction.” BMC Medical Research Methodology 9 (1): 85. [link](Boulesteix, AL, and C Strobl. 2009. “Optimal Classifier Selection and Negative Bias in Error Rate Estimation: An Empirical Study on High-Dimensional Prediction.” BMC Medical Research Methodology 9 (1): 85.)
+Boulesteix, AL, and C Strobl. 2009. “Optimal Classifier Selection and Negative Bias in Error Rate Estimation: An Empirical Study on High-Dimensional Prediction.” BMC Medical Research Methodology 9 (1): 85. [link](https://www.researchgate.net/publication/40756303_Optimal_classifier_selection_and_negative_bias_in_error_rate_estimation_An_empirical_study_on_high-dimensional_prediction)
 
 Sabastian Raschka, "STAT 479 Statistical Tests and Algorithm Comparison," (Lecture Notes, University of Wisconsin-Madison, Fall 2019). [link](https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/11_eval4-algo/11-eval4-algo__notes.pdf)
 

README.md

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ Boulesteix, AL, and C Strobl. 2009. “Optimal Classifier Selection and
 Negative Bias in Error Rate Estimation: An Empirical Study on
 High-Dimensional Prediction.” BMC Medical Research Methodology 9 (1):
 85.
-[link](Boulesteix,%20AL,%20and%20C%20Strobl.%202009.%20“Optimal%20Classifier%20Selection%20and%20Negative%20Bias%20in%20Error%20Rate%20Estimation:%20An%20Empirical%20Study%20on%20High-Dimensional%20Prediction.”%20BMC%20Medical%20Research%20Methodology%209%20\(1\):%2085.)
+[link](https://www.researchgate.net/publication/40756303_Optimal_classifier_selection_and_negative_bias_in_error_rate_estimation_An_empirical_study_on_high-dimensional_prediction)
 
 Sabastian Raschka, “STAT 479 Statistical Tests and Algorithm
 Comparison,” (Lecture Notes, University of Wisconsin-Madison, Fall

data/fivek-simdat.pickle

-29 Bytes
Binary file not shown.
-51.4 KB
Binary file not shown.
46.7 KB

duration-experiment/raschka/nested-cv-kj-raschka.R

Lines changed: 1 addition & 22 deletions
@@ -2,35 +2,14 @@
 
 
 # Raschka method
-# kj
+# ranger-kj
 
 
 
 # Notes
 # 1. *** Make sure the target column is last in dataframe ***
 
 
-# Available Choices
-# 1. Data
-# 2. Algorithms
-# 3. Hyperparameter value grids
-# 4. Outer-Loop CV strategy
-# 5. Inner-Loop CV strategy
-# 6. Tuning strategy
-
-
-
-# Experiment
-# 4 core, 16GB RAM
-# rf, elastic net algorithms with 40x2 and 200x2 latin hypercube grids respectively
-# 5000 obs, 10 features, outer-loop = 5 k-fold, inner-loop = 2 k-fold
-# 268.96 sec (4.48 min)
-# MAE: k-fold error = 1.40926
-# test error = 1.3475
-# Best parameters for ranger:
-# mtry = 4 and trees = 234
-
-
 # Sections
 # 1. Set-Up
 # 2. Error function

duration-experiment/raschka/nested-cv-mlr3-raschka.R

Lines changed: 0 additions & 18 deletions
@@ -13,24 +13,6 @@
 # 4. The batch arg in the tuner function allows you to specify how you want to parallelize for each algorithm which is nice.
 
 
-
-# Choices
-# 1. Data
-# 2. Algorithms
-# 3. Hyperparameter value grids
-# 4. Outer-Loop CV strategy
-# 5. Inner-Loop CV strategy
-# 6. Tuning strategy
-
-
-
-# Experiment:
-# 4 core, 16GB RAM
-# rf, glmnet algorithms with 100x2 hyperparameter grids
-# 100 obs, 10 features, repeats = 2, outer loop = 10 folds, inner loop = 25 resamples
-# sec ( min)
-
-
 # Sections:
 # 1. Set-Up and Data
 # 2. Functions Used in the Loops

duration-experiment/raschka/nested-cv-py-raschka.py

Lines changed: 22 additions & 42 deletions
@@ -18,30 +18,6 @@
 # (6. cont.) and I'm just worried about fairly testing the speed of implentations.
 
 
-# Choices
-# 1. Data
-# 2. Algorithms
-# 3. Hyperparameter value grids
-# 4. Outer-Loop CV strategy
-# 5. Inner-Loop CV strategy
-# 6. Tuning strategy
-
-
-# Experiment:
-# 4 core, 16GB RAM
-# rf, elastic net algorithms with 40x2 and 200x2 latin hypercube grids.
-# 5000 obs, 10 features, outer loop = 5 folds, inner loop = 2 folds
-# 941.49 sec (15.69 min)
-
-# Results for 5000 obs:
-# MAE: 2.07222 (Average of K-fold Cv test folds)
-# Training Error: 2.06839
-# Test Error: 2.09252
-# Best parameter for chosen algorithm, Elastic Net:
-# Alpha = 1.20342e-10
-# L1 ratio = 0.94502
-
-
 # Sections
 # 1. Set-up
 # 2. Data
@@ -59,30 +35,34 @@
 ###################################
 
 
-# Necessary in order to run in parallel.
-# Was told this must be ran before other modules imported.
-# Update executable path in sys module.
-import sys
-import os
-exe = os.path.join(sys.exec_prefix, "pythonw.exe")
-sys.executable = exe
-sys._base_executable = exe
-# update executable path in multiprocessing module
-import multiprocessing
-multiprocessing.set_executable(exe)
+# If in RStudio or using reticulate::source_python, necessary in order
+# to run in parallel.
+# Should be ran before other modules imported.
+# Updates executable path in sys module.
+# import sys
+# import os
+# exe = os.path.join(sys.exec_prefix, "pythonw.exe")
+# sys.executable = exe
+# sys._base_executable = exe
+# # update executable path in multiprocessing module
+# import multiprocessing
+# multiprocessing.set_executable(exe)
 
 
-import subprocess
-import time
-subprocess.Popen('mlflow server')
-time.sleep(10)
+# If in RStudio or using reticulate::source_python, necessary in order
+# start MLflow's server
+# import subprocess
+# import time
+# subprocess.Popen('mlflow server')
+# time.sleep(10)
 
 
 from pytictoc import TicToc
 t = TicToc()
 t.tic()
 
 from pushbullet import Pushbullet
+import os
 import mlflow
 import pickle
 import numpy as np
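
Review note: the hunk above comments out the `subprocess.Popen('mlflow server')` call and the fixed `time.sleep(10)` that followed it. One possible alternative, sketched here and not part of this commit, is to poll the server's port until it accepts connections instead of sleeping for a fixed interval. The helper names `wait_for_port` and `start_mlflow_server` are hypothetical, and the host and port (127.0.0.1:5000) are assumed to be the MLflow server defaults.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds,
    or False if `timeout` seconds pass without a successful connection."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused); wait briefly and retry.
            time.sleep(interval)
    return False


def start_mlflow_server(host="127.0.0.1", port=5000, timeout=30.0):
    """Hypothetical helper: launch `mlflow server` and wait until it is reachable.

    Assumes the `mlflow` executable is on PATH. Returns the Popen handle so the
    caller can terminate the server when the experiment finishes.
    """
    proc = subprocess.Popen(
        ["mlflow", "server", "--host", host, "--port", str(port)]
    )
    if not wait_for_port(host, port, timeout):
        proc.terminate()
        raise RuntimeError(f"MLflow server did not start within {timeout} s")
    return proc
```

Polling returns as soon as the server is up, so it does not add a fixed delay to a script whose purpose is timing, and it fails loudly if the server never starts.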
@@ -111,15 +91,15 @@
 
 # load simulated data
 # r = read mode, b = binary; pickle is binary
-with open('C:/Users/tbats/Documents/R/Projects/nested-cross-validation-comparison/data/fivek-simdat.pickle', 'rb') as fried:
+with open('./data/fivek-simdat.pickle', 'rb') as fried:
     pdat = pickle.load(fried)
 
 # load penalyzed regression hyperparameter values
-with open('C:/Users/tbats/Documents/R/Projects/nested-cross-validation-comparison/grids/elast-latin-params.pickle', 'rb') as elastp:
+with open('./grids/elast-latin-params.pickle', 'rb') as elastp:
     elast_params = pickle.load(elastp)
 
 # load random forest hyperparater values
-with open('C:/Users/tbats/Documents/R/Projects/nested-cross-validation-comparison/grids/rf-latin-params.pickle', 'rb') as rfp:
+with open('./grids/rf-latin-params.pickle', 'rb') as rfp:
     rf_params = pickle.load(rfp)
 
 
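
Review note: the relative paths introduced in this hunk (`./data/...`, `./grids/...`) resolve against the current working directory, so the script now assumes it is launched from the repository root. A minimal sketch, not part of this commit, of resolving the same paths against an explicit root instead; `load_pickle` and `project_root` are hypothetical names.

```python
import pickle
from pathlib import Path


def load_pickle(project_root, relative_path):
    """Load a pickle file located at `relative_path` under `project_root`.

    `project_root` is supplied by the caller (for example Path.cwd() or the
    directory containing the script), so the lookup no longer depends
    implicitly on where the interpreter was started.
    """
    path = Path(project_root) / relative_path
    # 'rb' because pickle files are binary.
    with path.open("rb") as fh:
        return pickle.load(fh)
```

Under these assumptions the three loads in the hunk would become calls such as `load_pickle(root, "data/fivek-simdat.pickle")`, with `root` set once near the top of the script.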
