Commit 90817e1

Add AI skill for type checkers
1 parent 0c6da00 commit 90817e1

3 files changed

+281
-0
lines changed

.claude/skills

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
skills

R/standalone-types-check.R

Lines changed: 1 addition & 0 deletions
@@ -335,6 +335,7 @@ check_symbol <- function(
   )
 }

+# Do we need this?
 check_arg <- function(
   x,
   ...,

skills/types-check/SKILL.md

Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
1+
---
2+
name: types-check
3+
description: Validate function inputs in R using a standalone file of check_* functions. Use when writing exported R functions that need input validation, or reviewing existing validation code.
4+
---
5+
6+
# Input Validation in R Functions
7+
8+
This skill covers patterns for validating function inputs with rlang's standalone file of `check_*` functions, and for reviewing existing validation code.
9+
10+
## About the Standalone File
11+
12+
The `check_*` functions come from <https://github.com/r-lib/rlang/blob/main/R/standalone-types-check.R>, a standalone file that can be vendored into any R package. This means:
13+
14+
- **Not exported**: These functions are internal utilities. They are part of neither your package's public API nor rlang's.
15+
- **Use usethis to import**: If these helpers are missing, run `usethis::use_standalone("r-lib/rlang", "types-check")` to add the file to your package. Run it again to update the file.
16+
- **Dependency**: Requires a sufficiently recent version of rlang in `Imports`. The exact minimum version is inserted automatically by `usethis::use_standalone()`. These checkers are not a good fit for zero-dependency packages.
17+
18+
## Core Principles
19+
20+
### Error messages
21+
22+
The `check_*` functions produce clear, actionable error messages crafted by rlang:
23+
24+
```r
check_string(123)
#> Error: `123` must be a single string, not the number 123.

check_number_whole(3.14, min = 1, max = 10)
#> Error: `3.14` must be a whole number, not the number 3.14.
```
31+
32+
### Performance
33+
34+
Some checkers are implemented in C for minimal overhead:
35+
- `check_bool()`
36+
- `check_number_whole()`
37+
- `check_number_decimal()`
38+
39+
This makes validation fast enough to use even in performance-sensitive code.
40+
Even so, avoid running checks inside a very tight loop unless absolutely necessary.
41+
42+
## When to Validate Inputs
43+
44+
**Validate at entry points, not everywhere.**
45+
46+
Input validation should happen at the boundary between user code and your package's internal implementation, typically in exported functions that accept user data. Once inputs are validated at these entry points, internal helper functions can trust the data they receive without checking again.
47+
48+
A good analogy to keep in mind is gradual typing. Think of input validation like TypeScript type guards. Once you've validated data at the boundary, you can treat it as "typed" within your internal functions. Additional runtime checks are not needed. The entry point validates once, and all downstream code benefits.
49+
50+
Exception: Validate when in doubt. Do validate in internal functions if:
51+
- The cost of invalid data is high (data corruption, security issues)
52+
- The function or context is complex and you want defensive checks
53+
54+
### Entry points (validate here)
55+
56+
- **Exported functions**: Functions users call directly
57+
- **Functions accepting user data**: Even internal functions if they directly consume user input, or external data (e.g. unserialised data)
58+
59+
```r
# Exported function: VALIDATE
#' @export
create_report <- function(title, n_rows) {
  check_string(title)
  check_number_whole(n_rows, min = 1)

  # Now call helpers with validated data
  data <- generate_data(n_rows)
  format_report(title, data)
}
```
71+
72+
### Internal helpers (don't validate)
73+
74+
Once data is validated at the entry point, internal helpers can skip validation:
75+
76+
```r
# Internal helper: NO VALIDATION NEEDED
generate_data <- function(n_rows) {
  # n_rows is already validated, just use it
  data.frame(
    id = seq_len(n_rows),
    value = rnorm(n_rows)
  )
}

# Internal helper: NO VALIDATION NEEDED
format_report <- function(title, data) {
  # title and data are already validated, just use them
  list(
    title = title,
    summary = summary(data),
    rows = nrow(data)
  )
}
```
96+
97+
Note how the `data` generated by `generate_data()` doesn't need validation either. Internal code creating data in a trusted way (e.g. because it's simple or because it's covered by unit tests) doesn't require internal checks.
98+
99+
## Early input checking
100+
101+
Always validate inputs at the start of user-facing functions, before doing any work:
102+
103+
```r
my_function <- function(x, name, env = caller_env()) {
  check_logical(x)
  check_name(name)
  check_environment(env)

  # ... function body
}
```
112+
113+
Benefits:
114+
115+
- This self-documents the types of the arguments
116+
- Eager evaluation also reduces the risk of confusing lazy evaluation effects
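
To illustrate the second benefit, here is a minimal base R sketch. `make_adder_lazy()` and `make_adder_checked()` are hypothetical names, and `stopifnot()` stands in for a `check_*()` call so that the example runs without the standalone file. Any check that inspects its argument forces it in the same way.

```r
# Without an early check, `n` stays an unevaluated promise until the
# returned function first uses it.
make_adder_lazy <- function(n) {
  function(x) x + n
}

# An early check forces `n` as soon as make_adder_checked() is called.
# stopifnot() is a base R stand-in for a check_*() call (assumption).
make_adder_checked <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1)
  function(x) x + n
}

step <- 1
add_lazy <- make_adder_lazy(step)
add_checked <- make_adder_checked(step)

step <- 10 # reassigned before either adder is first called

add_lazy(1)
#> [1] 11
add_checked(1)
#> [1] 2
```

The lazy version picks up the later value of `step` because its argument is only evaluated on first use, whereas the checked version captured the value that was current when it was created.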
117+
118+
## Choosing the Right Checker
119+
120+
### Scalars (single values)
121+
122+
For atomic vectors, use scalar checkers when arguments parameterise the function (configuration flags, names, single counts), rather than represent vectors of user data. They assert a single value.
123+
124+
- `check_bool()`: Single TRUE/FALSE (use for flags/options)
125+
- `check_string()`: Single string (allows empty `""` by default)
126+
- `check_name()`: Single non-empty string (for variable names, symbols as strings)
127+
- `check_number_whole()`: Single integer-like numeric value
128+
- `check_number_decimal()`: Single numeric value (allows decimals)
129+
130+
By default, scalar checkers do _not_ allow `NA` elements (`allow_na = FALSE`). Set `allow_na = TRUE` when missing values are allowed.
131+
132+
Scalar checkers also include checks for non-vector inputs:
133+
134+
- `check_symbol()`: A symbol object
135+
- `check_call()`: A defused call expression
136+
- `check_environment()`: An environment object
137+
- `check_function()`: Any function (closure, primitive, or special)
138+
- `check_closure()`: An R function specifically (not primitive/special)
139+
- `check_formula()`: A formula object
140+
141+
### Vectors
142+
143+
- `check_logical()`: Logical vector of any length
144+
- `check_character()`: Character vector of any length
145+
- `check_data_frame()`: A data frame object
146+
147+
By default, vector checkers allow `NA` elements (`allow_na = TRUE`). Set `allow_na = FALSE` when missing values are not allowed.
148+
149+
## Optional values: `allow_null`
150+
151+
Use `allow_null = TRUE` when `NULL` represents a valid "no value" state, similar to `Option<T>` in Rust or `T | null` in TypeScript:
152+
153+
```r
# NULL means "use default timeout"
check_number_decimal(timeout, allow_null = TRUE)
```
157+
158+
The tidyverse style guide recommends using `NULL` defaults instead of `missing()` defaults, so this pattern comes up often in practice.
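
A minimal sketch of the `NULL` default pattern, assuming rlang is installed. `fetch_timeout()` and its fallback of 30 are hypothetical. The `rlang:::` line is only there so the snippet runs interactively outside a package. Inside a package, the vendored standalone file provides `check_number_decimal()` and that line should not appear.

```r
# For interactive experimentation only (assumption: rlang is installed).
# In a package, the vendored standalone file defines this function.
check_number_decimal <- rlang:::check_number_decimal

# Hypothetical entry point: `timeout = NULL` means "use the default".
fetch_timeout <- function(timeout = NULL) {
  check_number_decimal(timeout, min = 0, allow_null = TRUE)
  if (is.null(timeout)) 30 else timeout
}

fetch_timeout()
#> [1] 30
fetch_timeout(2.5)
#> [1] 2.5

# fetch_timeout("5") fails the check because "5" is a string, not a number.
```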
159+
160+
## Missing values: `allow_na`
161+
162+
Use `allow_na = TRUE` when `NA` is semantically meaningful for your use case:
163+
164+
- **For scalars**: Allows the value itself to be `NA`
165+
- **For vectors**: Allows the vector to contain `NA` elements (this is the default for vector checkers like `check_character()` and `check_logical()`)
166+
167+
```r
# NA means "unknown" - semantically valid
check_string(name, allow_na = TRUE)

# Don't allow missing values in required data
check_character(ids, allow_na = FALSE)
```
174+
175+
## Bounds Checking for Numbers
176+
177+
Use `min` and `max` arguments for range validation:
178+
179+
```r
check_number_whole(
  n,
  min = 1,
  max = 100
)
```
186+
187+
Additional number options:
188+
- `allow_infinite = TRUE`: Allow Inf/-Inf (default `TRUE` for decimals, `FALSE` for whole numbers)
189+
190+
## When to Use `arg` and `call` Parameters
191+
192+
Understanding when to pass `arg` and `call` is critical for correct error reporting.
193+
194+
### Entry point functions: DON'T pass `arg`/`call`
195+
196+
When validating inputs directly in an entry point function (typically an exported function), **do not** pass the `arg` and `call` parameters. Their defaults, `caller_arg(x)` and `caller_env()`, automatically pick up the correct argument name and calling environment.
197+
198+
```r
# Entry point: let defaults work
#' @export
my_function <- function(x, name) {
  check_string(x) # Correct! Defaults capture user's context
  check_name(name) # Correct!

  # ... function body
}
```
208+
209+
### Check wrapper functions: DO pass `arg`/`call`
210+
211+
When creating a wrapper or helper function that calls `check_*` functions on behalf of another function, you **must** propagate the caller context. Otherwise, errors will point to your wrapper function instead of the actual entry point.
212+
213+
Without proper propagation, error messages show the wrong function and argument names:
214+
215+
```r
# WRONG: errors will point to check_positive's definition
check_positive <- function(x) {
  check_number_whole(x, min = 1)
}

my_function <- function(count) {
  check_positive(count)
}

my_function(-5)
#> Error in `check_positive()`: # Wrong! Should say `my_function()`
#> ! `x` must be a whole number larger than or equal to 1. # Wrong! Should say `count`
```
229+
230+
With proper propagation, errors correctly identify the entry point and argument:
231+
232+
```r
# CORRECT: propagates context from the entry point
check_positive <- function(x, arg = caller_arg(x), call = caller_env()) {
  check_number_whole(x, min = 1, arg = arg, call = call)
}

my_function <- function(count) {
  check_positive(count)
}

my_function(-5)
#> Error in `my_function()`: # Correct!
#> ! `count` must be a whole number larger than or equal to 1. # Correct!
```
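
The same propagation applies to checkers that do not wrap a `check_*` function at all. The sketch below is one possible way to write such a checker with `rlang::abort()`, whose `call` argument accepts the propagated environment. It assumes rlang is installed, and `check_even()` and `make_pairs()` are hypothetical names.

```r
library(rlang)

# Hypothetical custom checker written without the check_* helpers.
# `arg` and `call` default to the caller's argument name and frame.
check_even <- function(x, arg = caller_arg(x), call = caller_env()) {
  ok <- is.numeric(x) && length(x) == 1 && isTRUE(x %% 2 == 0)
  if (!ok) {
    abort(
      sprintf("`%s` must be a single even number.", arg),
      call = call
    )
  }
  invisible(NULL)
}

# Hypothetical entry point: relies on the defaults of check_even().
make_pairs <- function(count) {
  check_even(count)
  seq_len(count / 2)
}

make_pairs(4)
#> [1] 1 2

# make_pairs(3) fails, and the error is reported against `count`
# in make_pairs() rather than `x` in check_even().
```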
246+
247+
## Other useful checkers
248+
249+
Exported from other packages:
250+
251+
- `rlang::arg_match()`: Validates enumerated choices. Partial matching is an error unlike `base::match.arg()`. Use when an argument must be one of a known set of strings.
252+
253+
```r
# Validates and returns the matched value
my_plot <- function(color = c("red", "green", "blue")) {
  color <- rlang::arg_match(color)
  # ...
}

my_plot("redd")
#> Error in `my_plot()`:
#> ! `color` must be one of "red", "green", or "blue", not "redd".
#> ℹ Did you mean "red"?
```
265+
266+
- `rlang::check_required()`: Throws an informative error when a required argument is not supplied.
267+
268+
- `vctrs::obj_check_list()` checks that input is considered a list in the vctrs sense:
  - A bare list with no class
  - A list explicitly inheriting from `"list"`.
271+
272+
- `vctrs::obj_check_vector()` checks that input is considered a vector in the vctrs sense:
  - A base atomic type
  - A list in the vctrs sense
  - An object with a `vec_proxy()` method
276+
277+
- `vctrs::vec_check_size(x, size)` tests if vector `x` has size `size`, and throws an informative error if it doesn't.
278+
279+
- `vctrs::list_check_all_vectors()` and `vctrs::list_check_all_size()` check that inputs are lists containing only vectors, or only vectors of a given size.
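
As a rough illustration of how two of these exported helpers can combine in an entry point, assuming rlang is installed (`scale_values()` is a hypothetical function):

```r
library(rlang)

# Hypothetical entry point with a required argument and an enumerated one.
scale_values <- function(x, method = c("center", "standardize")) {
  check_required(x)            # errors if `x` is not supplied
  method <- arg_match(method)  # errors unless `method` matches exactly
  switch(
    method,
    center = x - mean(x),
    standardize = (x - mean(x)) / sd(x)
  )
}

scale_values(c(1, 2, 3))
#> [1] -1  0  1

# scale_values() fails because `x` is missing.
# scale_values(1:3, "cent") fails because partial matches are rejected.
```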
