
Commit 93ca436

docs and tunable_boost_tree adjustments re: lightgbm bagging (#768)
* docs and `tunable_boost_tree` adjustments re: lightgbm bagging
* re`document` with new lightgbm bagging docs
1 parent 4652db8 commit 93ca436

4 files changed: +38 -3 lines changed
R/tunable.R

Lines changed: 4 additions & 3 deletions
@@ -222,14 +222,15 @@ tunable_boost_tree <- function(x, ...) {
       list(list(pkg = "dials", fun = "sample_prop"))
     res$call_info[res$name == "learn_rate"] <-
       list(list(pkg = "dials", fun = "learn_rate", range = c(-3, -1/2)))
-  } else {
-    if (x$engine == "C5.0") {
+  } else if (x$engine == "C5.0") {
     res <- add_engine_parameters(res, c5_boost_engine_args)
     res$call_info[res$name == "trees"] <-
       list(list(pkg = "dials", fun = "trees", range = c(1, 100)))
     res$call_info[res$name == "sample_size"] <-
       list(list(pkg = "dials", fun = "sample_prop"))
-    }
+  } else if (x$engine == "lightgbm") {
+    res$call_info[res$name == "sample_size"] <-
+      list(list(pkg = "dials", fun = "sample_prop"))
   }
   res
 }
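
The new lightgbm branch only swaps which dials function is recorded for `sample_size`. A minimal base-R sketch of that replacement pattern follows. The `res` object here is a plain list standing in for the real tunable-parameter table, `use_sample_prop()` is a hypothetical helper, and the starting `sample_size` entry is an assumed count-based default; none of these are parsnip's actual internals.

```r
# Toy stand-in for the tunable-parameter table (hypothetical, not parsnip's real object).
res <- list(
  name = c("trees", "sample_size"),
  call_info = list(
    list(pkg = "dials", fun = "trees"),
    list(pkg = "dials", fun = "sample_size")  # assumed count-based default
  )
)

# Hypothetical helper mirroring the assignment used in the lightgbm branch:
# the entry whose name is "sample_size" is replaced by a reference to
# dials::sample_prop(), i.e. a proportion rather than a count.
use_sample_prop <- function(res) {
  res$call_info[res$name == "sample_size"] <-
    list(list(pkg = "dials", fun = "sample_prop"))
  res
}

updated <- use_sample_prop(res)
updated$call_info[[2]]$fun
```

The outer `list()` on the right-hand side matters: the logical index selects one list element, so the replacement value must itself be a length-one list whose single element is the new `call_info` entry.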

man/details_boost_tree_lightgbm.Rd

Lines changed: 22 additions & 0 deletions
Some generated files are not rendered by default.

man/rmd/boost_tree_lightgbm.Rmd

Lines changed: 6 additions & 0 deletions
@@ -74,6 +74,12 @@ Non-numeric predictors (i.e., factors) are internally converted to numeric. In t
 ```{r child = "template-mtry-prop.Rmd"}
 ```
 
+### Bagging
+
+The `sample_size` argument is translated to the `bagging_fraction` parameter in the `param` argument of `lgb.train`. The argument is interpreted by lightgbm as a _proportion_ rather than a count, so bonsai internally reparameterizes the `sample_size` argument with [dials::sample_prop()] during tuning.
+
+To effectively enable bagging, the user would also need to set the `bagging_freq` argument to lightgbm. `bagging_freq` defaults to 0, which means bagging is disabled, and a `bagging_freq` argument of `k` means that the booster will perform bagging at every `k`th boosting iteration. Thus, by default, the `sample_size` argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to `bagging_freq` and use `k = 1` when the analogue to `bagging_fraction` is in $(0, 1)$. _bonsai will thus automatically set_ `bagging_freq = 1` _in_ `set_engine("lightgbm", ...)` if `sample_size` (i.e. `bagging_fraction`) is not equal to 1 and no `bagging_freq` value is supplied. This default can be overridden by setting the `bagging_freq` argument to `set_engine()` manually.
+
 ### Verbosity
 
 bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set `quiet = TRUE`.
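
The defaulting rule described in the Bagging section can be sketched in base R as below. `resolve_bagging_freq()` is a hypothetical name used only for illustration; it restates the documented rule and is not bonsai's actual implementation.

```r
# Hypothetical helper restating the documented rule:
# - a user-supplied bagging_freq always wins;
# - otherwise, bagging_freq becomes 1 when bagging_fraction is not 1;
# - otherwise, it stays at lightgbm's default of 0 (bagging disabled).
resolve_bagging_freq <- function(bagging_fraction = 1, bagging_freq = NULL) {
  if (!is.null(bagging_freq)) {
    return(bagging_freq)
  }
  if (bagging_fraction != 1) 1L else 0L
}

resolve_bagging_freq(bagging_fraction = 0.5)                      # bagging every iteration
resolve_bagging_freq(bagging_fraction = 1)                        # bagging stays disabled
resolve_bagging_freq(bagging_fraction = 0.5, bagging_freq = 10L)  # user value is kept
```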

man/rmd/boost_tree_lightgbm.md

Lines changed: 6 additions & 0 deletions
@@ -121,6 +121,12 @@ parsnip and its extensions accommodate this parameterization using the `counts`
 
 `mtry` is a main model argument for \\code{\\link[=boost_tree]{boost_tree()}} and \\code{\\link[=rand_forest]{rand_forest()}}, and thus should not have an engine-specific interface. So, regardless of engine, `counts` defaults to `TRUE`. For engines that support the proportion interpretation---currently `"xgboost"`, `"xrf"` (via the rules package), and `"lightgbm"` (via the bonsai package)---the user can pass the `counts = FALSE` argument to `set_engine()` to supply `mtry` values within $[0, 1]$.
 
+### Bagging
+
+The `sample_size` argument is translated to the `bagging_fraction` parameter in the `param` argument of `lgb.train`. The argument is interpreted by lightgbm as a _proportion_ rather than a count, so bonsai internally reparameterizes the `sample_size` argument with [dials::sample_prop()] during tuning.
+
+To effectively enable bagging, the user would also need to set the `bagging_freq` argument to lightgbm. `bagging_freq` defaults to 0, which means bagging is disabled, and a `bagging_freq` argument of `k` means that the booster will perform bagging at every `k`th boosting iteration. Thus, by default, the `sample_size` argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to `bagging_freq` and use `k = 1` when the analogue to `bagging_fraction` is in $(0, 1)$. _bonsai will thus automatically set_ `bagging_freq = 1` _in_ `set_engine("lightgbm", ...)` if `sample_size` (i.e. `bagging_fraction`) is not equal to 1 and no `bagging_freq` value is supplied. This default can be overridden by setting the `bagging_freq` argument to `set_engine()` manually.
+
 ### Verbosity
 
 bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set `quiet = TRUE`.
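
From the user's side, the override mentioned in the Bagging docs might look like the sketch below. It assumes the bonsai package (which attaches parsnip) is installed; the values `0.5` and `5` are arbitrary illustrations, and only the model specification is built, so nothing is fitted.

```r
library(bonsai)  # assumption: bonsai is installed; it attaches parsnip

# sample_size is given as a proportion for the lightgbm engine.
spec <- boost_tree(mode = "regression", sample_size = 0.5)

# Supplying bagging_freq explicitly replaces the automatic bagging_freq = 1:
# here bagging would be performed at every 5th boosting iteration.
spec <- set_engine(spec, "lightgbm", bagging_freq = 5)

spec
```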
