|
4 | 4 | #' `predict()` can be used for all types of models and uses the |
5 | 5 | #' "type" argument for more specificity. |
6 | 6 | #' |
7 | | -#' @param object An object of class `model_fit` |
| 7 | +#' @param object An object of class `model_fit`. |
8 | 8 | #' @param new_data A rectangular data object, such as a data frame. |
9 | 9 | #' @param type A single character value or `NULL`. Possible values |
10 | | -#' are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", "time", |
11 | | -#' "hazard", "survival", or "raw". When `NULL`, `predict()` will choose an |
12 | | -#' appropriate value based on the model's mode. |
| 10 | +#' are `"numeric"`, `"class"`, `"prob"`, `"conf_int"`, `"pred_int"`, |
| 11 | +#' `"quantile"`, `"time"`, `"hazard"`, `"survival"`, or `"raw"`. When `NULL`, |
| 12 | +#' `predict()` will choose an appropriate value based on the model's mode. |
13 | 13 | #' @param opts A list of optional arguments to the underlying |
14 | 14 | #' predict function that will be used when `type = "raw"`. The |
15 | 15 | #' list should not include options for the model object or the |
16 | 16 | #' new data being predicted. |
17 | | -#' @param ... Arguments to the underlying model's prediction |
18 | | -#' function cannot be passed here (see `opts`). There are some |
19 | | -#' `parsnip` related options that can be passed, depending on the |
20 | | -#' value of `type`. Possible arguments are: |
| 17 | +#' @param ... Additional `parsnip`-related options, depending on the |
| 18 | +#' value of `type`. Arguments to the underlying model's prediction |
| 19 | +#' function cannot be passed here (use the `opts` argument instead). |
| 20 | +#' Possible arguments are: |
21 | 21 | #' \itemize{ |
22 | | -#' \item `interval`: for `type`s of "survival" and "quantile", should |
| 22 | +#' \item `interval`: for `type` equal to `"survival"` or `"quantile"`, should |
23 | 23 | #' interval estimates be added, if available? Options are `"none"` |
24 | 24 | #' and `"confidence"`. |
25 | | -#' \item `level`: for `type`s of "conf_int", "pred_int", and "survival" |
| 25 | +#' \item `level`: for `type` equal to `"conf_int"`, `"pred_int"`, or `"survival"`, |
26 | 26 | #' this is the parameter for the tail area of the intervals |
27 | 27 | #' (e.g. confidence level for confidence intervals). |
28 | | -#' Default value is 0.95. |
29 | | -#' \item `std_error`: add the standard error of fit or prediction (on |
30 | | -#' the scale of the linear predictors) for `type`s of "conf_int" |
31 | | -#' and "pred_int". Default value is `FALSE`. |
32 | | -#' \item `quantile`: the quantile(s) for quantile regression |
33 | | -#' (not implemented yet) |
34 | | -#' \item `time`: the time(s) for hazard and survival probability estimates. |
| 28 | +#' Default value is `0.95`. |
| 29 | +#' \item `std_error`: for `type` equal to `"conf_int"` or `"pred_int"`, add |
| 30 | +#' the standard error of fit or prediction (on the scale of the |
| 31 | +#' linear predictors). Default value is `FALSE`. |
| 32 | +#' \item `quantile`: for `type` equal to `"quantile"`, the quantiles of the |
| 33 | +#' distribution. Default is `(1:9)/10`. |
| 34 | +#' \item `time`: for `type` equal to `"survival"` or `"hazard"`, the |
| 35 | +#' time points at which the survival probability or hazard is estimated. |
35 | 36 | #' } |
36 | | -#' @details If "type" is not supplied to `predict()`, then a choice |
37 | | -#' is made: |
| 37 | +#' @details For `type = NULL`, `predict()` uses |
38 | 38 | #' |
39 | 39 | #' * `type = "numeric"` for regression models, |
40 | 40 | #' * `type = "class"` for classification, and |
41 | 41 | #' * `type = "time"` for censored regression. |
42 | 42 | #' |
43 | | -#' `predict()` is designed to provide a tidy result (see "Value" |
44 | | -#' section below) in a tibble output format. |
45 | | -#' |
46 | 43 | #' ## Interval predictions |
47 | 44 | #' |
48 | 45 | #' When using `type = "conf_int"` and `type = "pred_int"`, the options |
|
58 | 55 | #' have the opposite sign as what the underlying model's `predict()` method |
59 | 56 | #' produces. Set `increasing = FALSE` to suppress this behavior. |
60 | 57 | #' |
61 | | -#' @return With the exception of `type = "raw"`, the results of |
62 | | -#' `predict.model_fit()` will be a tibble as many rows in the output |
63 | | -#' as there are rows in `new_data` and the column names will be |
64 | | -#' predictable. |
| 58 | +#' @return With the exception of `type = "raw"`, the result of |
| 59 | +#' `predict.model_fit()` |
| 60 | +#' |
| 61 | +#' * is a tibble |
| 62 | +#' * has as many rows as there are rows in `new_data` |
| 63 | +#' * has standardized column names, described below. |
| 64 | +#' |
| 65 | +#' For `type = "numeric"`, the tibble has a `.pred` column for a single |
| 66 | +#' outcome and `.pred_Yname` columns for a multivariate outcome. |
65 | 67 | #' |
66 | | -#' For numeric results with a single outcome, the tibble will have |
67 | | -#' a `.pred` column and `.pred_Yname` for multivariate results. |
| 68 | +#' For `type = "class"`, the tibble has a `.pred_class` column. |
68 | 69 | #' |
69 | | -#' For hard class predictions, the column is named `.pred_class` |
70 | | -#' and, when `type = "prob"`, the columns are `.pred_classlevel`. |
| 70 | +#' For `type = "prob"`, the tibble has `.pred_classlevel` columns. |
71 | 71 | #' |
72 | | -#' `type = "conf_int"` and `type = "pred_int"` return tibbles with |
73 | | -#' columns `.pred_lower` and `.pred_upper` with an attribute for |
74 | | -#' the confidence level. In the case where intervals can be |
75 | | -#' produces for class probabilities (or other non-scalar outputs), |
76 | | -#' the columns will be named `.pred_lower_classlevel` and so on. |
| 72 | +#' For `type = "conf_int"` and `type = "pred_int"`, the tibble has |
| 73 | +#' `.pred_lower` and `.pred_upper` columns with an attribute for |
| 74 | +#' the confidence level. In the case where intervals can be |
| 75 | +#' produced for class probabilities (or other non-scalar outputs), |
| 76 | +#' the columns are named `.pred_lower_classlevel` and so on. |
77 | 77 | #' |
78 | | -#' Quantile predictions return a tibble with a column `.pred`, which is |
| 78 | +#' For `type = "quantile"`, the tibble has a `.pred` column, which is |
79 | 79 | #' a list-column. Each list element contains a tibble with columns |
80 | 80 | #' `.pred` and `.quantile` (and perhaps other columns). |
81 | 81 | #' |
82 | | -#' Using `type = "raw"` with `predict.model_fit()` will return |
83 | | -#' the unadulterated results of the prediction function. |
| 82 | +#' For `type = "time"`, the tibble has a `.pred_time` column. |
84 | 83 | #' |
85 | | -#' For censored regression: |
| 84 | +#' For `type = "survival"`, the tibble has a `.pred` column, which is |
| 85 | +#' a list-column. Each list element contains a tibble with columns |
| 86 | +#' `.time` and `.pred_survival` (and perhaps other columns). |
| 87 | +#' |
| 88 | +#' For `type = "hazard"`, the tibble has a `.pred` column, which is |
| 89 | +#' a list-column. Each list element contains a tibble with columns |
| 90 | +#' `.time` and `.pred_hazard` (and perhaps other columns). |
86 | 91 | #' |
87 | | -#' * `type = "time"` produces a column `.pred_time`. |
88 | | -#' * `type = "hazard"` results in a list column `.pred` containing tibbles |
89 | | -#' with a column `.pred_hazard`. |
90 | | -#' * `type = "survival"` results in a list column `.pred` containing tibbles |
91 | | -#' with a `.pred_survival` column. |
| 92 | +#' Using `type = "raw"` with `predict.model_fit()` will return |
| 93 | +#' the unadulterated results of the prediction function. |
92 | 94 | #' |
93 | 95 | #' In the case of Spark-based models, since table columns cannot |
94 | 96 | #' contain dots, the same convention is used except 1) no dots |
|
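For reference, the documented behavior of `type`, `...`, and `opts` can be illustrated with a minimal sketch. It assumes the `lm` engine and the built-in `mtcars` data; the object names (`lm_fit`, `new_cars`, and so on) are illustrative and not part of this change.

```r
library(parsnip)

# Fit an ordinary linear regression through parsnip (mode is "regression").
lm_fit <- fit(
  set_engine(linear_reg(), "lm"),
  mpg ~ wt + hp,
  data = mtcars
)

new_cars <- mtcars[1:3, ]

# type = NULL falls back to "numeric" for a regression model:
# a tibble with one row per row of new_data and a `.pred` column.
pred_num <- predict(lm_fit, new_data = new_cars)

# Interval predictions: the parsnip-level option `level` goes through `...`.
pred_ci <- predict(lm_fit, new_data = new_cars, type = "conf_int", level = 0.90)

# type = "raw" returns whatever the engine's own predict method returns.
# Engine-specific arguments (here `se.fit` for predict.lm) go in `opts`,
# not in `...`.
pred_raw <- predict(
  lm_fit,
  new_data = new_cars,
  type = "raw",
  opts = list(se.fit = TRUE)
)
```

The first two calls return tibbles with standardized column names (`.pred`, and `.pred_lower` / `.pred_upper`), while the `"raw"` call is the one exception and is not guaranteed to be a tibble.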