Commit e4c2a38

Update filter.md
1 parent c6f4bb1 commit e4c2a38

1 file changed: +37 -3 lines changed

docs/src/man/filter.md

Lines changed: 37 additions & 3 deletions
@@ -18,10 +18,12 @@ Naturally, other `fun`s supported by `byrow` which return a `Vector{Bool}` or `B
 
 The `filter` and `filter!` functions are two shortcuts which wrap the `byrow` and `getindex`/`deleteat!` operations in a function.
 
-`filter(ds, cols; [view = false, type = all,...])` is the shortcut for `ds[byrow(ds, type, cols; ...), :]`, and `filter!(ds, cols; [type = all, ...])` is the shortcut for `deleteat![ds, .!byrow(ds, type, cols; ...))`.
+`filter(ds, cols; [missings = missing, view = false, type = all, ...])` is the shortcut for `ds[byrow(ds, type, cols; ...), :]`, and `filter!(ds, cols; [missings = missing, type = all, ...])` is the shortcut for `deleteat!(ds, .!byrow(ds, type, cols; ...))`.
+
+The `missings` keyword argument controls how missing values are treated: setting `missings = true` makes the function treat missing values as `true`, and setting `missings = false` makes it treat them as `false`.
 
 > Note, by default `type` is set to `all`.
-> Users can use `delete` and `delete!` as shortcuts for `ds[.!byrow(ds, type, cols; ...), :]` and `deleteat![ds, byrow(ds, type, cols; ...))`, respectively.
+> Users can use `delete` and `delete!` as shortcuts for `ds[.!byrow(ds, type, cols; ...), :]` and `deleteat!(ds, byrow(ds, type, cols; ...))`, respectively. The `delete` and `delete!` functions also support the `missings` keyword argument.
 
 
 ### Examples
 
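The row selection that `filter` performs with the default `type = all` and an explicit `missings` value can be sketched in plain Julia. The `rowmask` helper below is hypothetical and not part of InMemoryDatasets: it works on plain vectors instead of a `Dataset`, models only `missings = true` and `missings = false` (not the `missing` default), and assumes a missing cell counts as the `missings` value instead of being passed to `by`. On the data of the `missings` jldoctest added in this commit it selects the same rows, and negating the mask with `.!` gives the rows that `delete` would keep.

```julia
# Hypothetical helper, NOT part of InMemoryDatasets: it mimics the row mask
# behind `filter(ds, cols; by = pred, type = all, missings = m)` on plain
# vectors, assuming a missing cell counts as `m` instead of being passed to `pred`.
function rowmask(cols::AbstractVector{<:AbstractVector}, pred;
                 type = all, missings::Bool = false)
    n = length(first(cols))
    # For row i, evaluate the condition on every selected column, then combine
    # the per-column results with `type` (`all` by default, as in `filter`).
    return [type((ismissing(c[i]) ? missings : pred(c[i])) for c in cols) for i in 1:n]
end

x = [2, 4, 6, missing]
y = [1, 2, 3, 4]

keep_false = rowmask([x, y], iseven; missings = false)  # [false, true, false, false]
keep_true  = rowmask([x, y], iseven; missings = true)   # [false, true, false, true]

# Indexing with the mask corresponds to `filter` (rows 2 and 4 here);
# indexing with the negated mask `.!keep_true` corresponds to `delete` (rows 1 and 3).
kept_x, kept_y = x[keep_true], y[keep_true]          # x: 4, missing; y: 2, 4
deleted_x, deleted_y = x[.!keep_true], y[.!keep_true]  # x: 2, 6; y: 1, 3
```

Treating a missing cell as a fixed Boolean before combining with `all` is one simple way to reproduce the documented outputs; the package itself may implement this differently.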
@@ -111,6 +113,38 @@ julia> byrow(ds, all, 2:3, by = [>(5), isodd])
 0
 ```
 
+In the next example we pass the `missings` keyword argument:
+
+```jldoctest
+julia> ds = Dataset(x = [2, 4, 6, missing], y = [1, 2, 3, 4])
+4×2 Dataset
+ Row │ x         y
+     │ identity  identity
+     │ Int64?    Int64?
+─────┼────────────────────
+   1 │        2         1
+   2 │        4         2
+   3 │        6         3
+   4 │  missing         4
+
+julia> filter(ds, [:x, :y], by = iseven, missings = false)
+1×2 Dataset
+ Row │ x         y
+     │ identity  identity
+     │ Int64?    Int64?
+─────┼────────────────────
+   1 │        4         2
+
+julia> filter(ds, [:x, :y], by = iseven, missings = true)
+2×2 Dataset
+ Row │ x         y
+     │ identity  identity
+     │ Int64?    Int64?
+─────┼────────────────────
+   1 │        4         2
+   2 │  missing         4
+```
+
 We can combine `modify!`/`modify` and `byrow` to filter observations based on all values in a column, e.g. in the following example we filter the rows in which both `:x2` and `:x3` are larger than their means:
 
 ```jldoctest
@@ -179,7 +213,7 @@ julia> filter(ds, :, type = isequal)
 
 however, unlike `map`, the function doesn't return the whole modified dataset; it returns a boolean data set with the same number of rows as `ds` and the same number of columns as the length of `cols`, where each value is the result of calling `fun` on the corresponding observation. The return value of `fun` must be `true`, `false`, or `missing`. The combination of `mask` and `byrow` can be used to filter observations.
 
-Compared to `byrow`, the `mask` function has some useful features which are handy in some scenarios:
+Compared to `filter`/`filter!` (and `delete`/`delete!`), the `mask` function has the following default behaviour:
 
 * `mask` returns a boolean data set which shows exactly which observations will be selected when `fun` is called on them.
 * By default, the `mask` function calls `fun` on the formatted values of the observations. To use the actual values instead, pass `mapformats = false`.
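The shape of what `mask` returns, and how it differs from the single row mask behind `filter`, can be sketched in plain Julia. `mask_sketch` is a hypothetical stand-in, not the InMemoryDatasets `mask` function: it takes plain vectors, returns a `Matrix{Bool}` instead of a boolean data set, and does not model formats or `mapformats`. Each result column holds the value of `fun` for every observation of one input column; reducing each row with `all` then yields a row mask, analogous to combining `mask` with `byrow`.

```julia
# Hypothetical stand-in, NOT the InMemoryDatasets `mask` function: apply `fun`
# to every observation of each selected column and keep one result column per
# input column (formats are not modelled).
mask_sketch(cols::AbstractVector{<:AbstractVector}, fun) =
    reduce(hcat, [map(fun, c) for c in cols])

x2 = [1, 7, 9, 3]
x3 = [8, 2, 10, 6]

m = mask_sketch([x2, x3], >(5))  # 4×2 matrix, one column per input column

# Combine the per-column results row by row, as `byrow` with `all` would,
# to find the rows in which both values are larger than 5.
keep = map(all, eachrow(m))      # [false, false, true, false]
```

Keeping the per-column results separate is what lets `mask` show exactly which cell passed or failed, whereas `filter` collapses them into one Boolean per row straight away.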
