docs/src/man/filter.md: 37 additions & 3 deletions (+37 -3)
@@ -18,10 +18,12 @@ Naturally, other `fun`s supported by `byrow` which return a `Vector{Bool}` or `B
 
 The `filter` and `filter!` functions are two shortcuts which wrap the `byrow` and `getindex`/`deleteat!` operations in a function.
 
-`filter(ds, cols; [view = false, type = all,...])` is the shortcut for `ds[byrow(ds, type, cols; ...), :]`, and `filter!(ds, cols; [type = all, ...])` is the shortcut for `deleteat!(ds, .!byrow(ds, type, cols; ...))`.
+`filter(ds, cols; [missings = missing, view = false, type = all,...])` is the shortcut for `ds[byrow(ds, type, cols; ...), :]`, and `filter!(ds, cols; [missings = missing, type = all, ...])` is the shortcut for `deleteat!(ds, .!byrow(ds, type, cols; ...))`.
+
+The `missings` keyword argument controls how missing values are treated, e.g. setting `missings = true` means that the function treats missing values as `true`.
 
 > Note, by default `type` is set to `all`.
-> Users can use `delete` and `delete!` as shortcuts for `ds[.!byrow(ds, type, cols; ...), :]` and `deleteat!(ds, byrow(ds, type, cols; ...))`, respectively.
+> Users can use `delete` and `delete!` as shortcuts for `ds[.!byrow(ds, type, cols; ...), :]` and `deleteat!(ds, byrow(ds, type, cols; ...))`, respectively. The `delete` and `delete!` functions also support the `missings` keyword argument.
 
 ### Examples
 
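The shortcuts and the `missings` keyword described in this hunk can be pictured in plain Julia, without InMemoryDatasets. This is only a rough sketch under stated assumptions: two ordinary vectors stand in for the columns of `ds`, the sample values are made up, and the helper `rowwise_all` is hypothetical (it is not part of the package). It loosely mimics `byrow(ds, all, cols; by = ...)` and replaces a `missing` predicate result with the value passed as `missings`.

```julia
# Plain-Julia sketch; `rowwise_all` and the sample columns are hypothetical
# stand-ins and are not part of InMemoryDatasets.
x = [1, 4, missing]
y = [3, 2, 4]

# Row-wise `all`: a predicate result of `missing` is replaced by `missings`.
rowwise_all(f, cols...; missings) =
    [all(v -> coalesce(f(v), missings), row) for row in zip(cols...)]

keep_strict  = rowwise_all(iseven, x, y; missings = false)  # missing counts as false
keep_lenient = rowwise_all(iseven, x, y; missings = true)   # missing counts as true

# `filter`-style selection keeps the flagged rows ...
filtered_x, filtered_y = x[keep_lenient], y[keep_lenient]
# ... and `delete`-style selection keeps the complement.
deleted_x, deleted_y = x[.!keep_lenient], y[.!keep_lenient]

# In-place variant, analogous to `filter!`: remove the rows that are not flagged.
x_inplace, y_inplace = copy(x), copy(y)
drop = findall(.!keep_lenient)
deleteat!(x_inplace, drop)
deleteat!(y_inplace, drop)
```

In this sketch, `missings = true` lets a row with a missing `x` survive as long as the remaining values satisfy the predicate, while `missings = false` drops it.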
@@ -111,6 +113,38 @@ julia> byrow(ds, all, 2:3, by = [>(5), isodd])
 0
 ```
 
+In the next example we pass the `missings` keyword argument:
+julia> filter(ds, [:x, :y], by = iseven, missings = false)
+1×2 Dataset
+ Row │ x         y
+     │ identity  identity
+     │ Int64?    Int64?
+─────┼────────────────────
+   1 │        4         2
+
+julia> filter(ds, [:x, :y], by = iseven, missings = true)
+2×2 Dataset
+ Row │ x         y
+     │ identity  identity
+     │ Int64?    Int64?
+─────┼────────────────────
+   1 │        4         2
+   2 │  missing         4
+```
+
 We can use the combination of `modify!/modify` and `byrow` to filter observations based on all values in a column, e.g. in the following example we keep all rows in which `:x2` and `:x3` are larger than their means:
 
 ```jldoctest
@@ -179,7 +213,7 @@ julia> filter(ds, :, type = isequal)
 
 however, unlike `map`, the function doesn't return the whole modified dataset, it returns a boolean data set with the same number of rows as `ds` and the same number of columns as the length of `cols`, while `fun` has been called on each observation. The return value of `fun` must be `true`, `false`, or `missing`. The combination of `mask` and `byrow` can be used to filter observations.
 
-Compared to `byrow`, the `mask` function has some useful features which are handy in some scenarios:
+Compared to `filter/!` (`delete/!`), the `mask` function has the following default behaviour:
 
 * `mask` returns a boolean data set which shows exactly which observation will be selected when `fun` is called on it.
 * By default, the `mask` function filters observations based on their formatted values. To change this, pass `mapformats = false`.
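A rough plain-Julia picture of what `mask` produces, and of how it can be combined with a row-wise reduction to filter observations, is sketched below. The columns `x2` and `x3`, their values, and the per-column predicates are hypothetical. An ordinary `Matrix{Bool}` stands in for the boolean data set that `mask` returns, and formats (`mapformats`) are not modelled.

```julia
# Plain-Julia sketch; the columns and predicates are made-up stand-ins
# and do not use InMemoryDatasets.
x2 = [1, 6, 8]
x3 = [7, 5, 9]

# One Bool column per input column: same number of rows as the data,
# same number of columns as the selected columns.
flags = hcat(map(>(5), x2), map(isodd, x3))

# Row-wise reduction over the flags, in the spirit of `byrow(..., all, ...)`.
keep = [all(row) for row in eachrow(flags)]

# Keep only the observations for which every flag is true.
selected_x2, selected_x3 = x2[keep], x3[keep]
```

Keeping the intermediate `flags` matrix makes it easy to inspect exactly which values passed or failed before any rows are dropped.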