docs/src/man/missing.md
Lines changed: 37 additions & 39 deletions
@@ -2,39 +2,37 @@

 ## Comparing data sets

-`==` of two data sets or two columns falls back to `isequal`.
+`==` of two data sets or two `DatasetColumn`s falls back to `isequal`.
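
The practical effect of falling back to `isequal` can be seen with plain Base Julia vectors. This is only a sketch of the semantics involved; it uses ordinary vectors rather than data sets and says nothing about how the package implements the fallback.

```julia
# Plain Base Julia; no InMemoryDatasets types are involved here.
a = [1, missing, 3]
b = [1, missing, 3]

# `==` propagates `missing`, so the comparison is neither true nor false.
eq_result = (a == b)

# `isequal` treats `missing` as equal to `missing` and always returns a Bool.
iseq_result = isequal(a, b)

println(ismissing(eq_result))
println(iseq_result)
```

A comparison that falls back to `isequal` therefore yields a definite `true` or `false` even when both sides contain missing values.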

 ## Every column supports `missing`

 The `Dataset()` constructor automatically converts each column of a data set to allow `missing` when it constructs the data set. All algorithms in InMemoryDatasets are optimised to minimise the overhead of supporting the `missing` type.
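
What "allowing `missing`" means for a single column can be sketched in Base Julia by widening a vector's element type. The variable names are hypothetical and the conversion the package performs internally may differ; this only illustrates the resulting element type.

```julia
# Hypothetical stand-in for one column; the package's internal conversion may differ.
col = [1, 2, 3]                                     # Vector{Int}
col_m = convert(Vector{Union{Missing, Int}}, col)   # same values, widened element type

# Assigning `missing` is now legal; on `col` it would throw a MethodError.
col_m[2] = missing

println(eltype(col_m))
```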

 ## Functions which skip missing values

-When InMemoryDatasets loaded into a Julia session, the behaviour of the following functions will be changed in such a way that they will remove missing values if an `AbstractVector{Union{T, Missing}}` is passed as their argument. And it is the user responsibility to handle the situations where this is not desired.
-
-The following list summarises the details of how InMemoryDatasets removes/skips/ignores missing values (for the rest of this section `INTEGERS` refers to `{U/Int8, U/Int16, U/Int32, U/Int64}` and `FLOATS` refers to `{Float16, Float32, Float64}`):
-
-*`argmax` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-*`argmin` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-*`cummax` : For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`cummax!`: For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`cummin` : For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`cummin!`: For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`cumprod` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`cumprod!`: For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`cumsum` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`cumsum!` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-*`extrema` : For `INTEGERS`, `FLOATS`, and `TimeType` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
-*`findmax` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
-*`findmin` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
-*`maximum` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-*`mean` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-*`median` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-*`median!` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-*`minimum` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-*`std` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-*`sum` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-*`var` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
+InMemoryDatasets has a set of functions which remove missing values. The following list summarises how InMemoryDatasets removes/skips/ignores missing values (for the rest of this section `INTEGERS` refers to `{U/Int8, U/Int16, U/Int32, U/Int64}` and `FLOATS` refers to `{Float16, Float32, Float64}`):
+
+* `IMD.argmax`: For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.argmin`: For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.cummax`: For `INTEGERS`, `FLOATS`, and `TimeType`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.cummax!`: For `INTEGERS`, `FLOATS`, and `TimeType`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.cummin`: For `INTEGERS`, `FLOATS`, and `TimeType`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.cummin!`: For `INTEGERS`, `FLOATS`, and `TimeType`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.cumprod`: For `INTEGERS` and `FLOATS`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.cumprod!`: For `INTEGERS` and `FLOATS`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.cumsum`: For `INTEGERS` and `FLOATS`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.cumsum!`: For `INTEGERS` and `FLOATS`, ignores missing values; passing `missings = :skip` makes it jump over missing values instead. When all values are `missing`, it returns the input.
+* `IMD.extrema`: For `INTEGERS`, `FLOATS`, and `TimeType`, skips missing values. When all values are `missing`, it returns `(missing, missing)`.
+* `IMD.findmax`: For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString`, skips missing values. When all values are `missing`, it returns `(missing, missing)`.
+* `IMD.findmin`: For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString`, skips missing values. When all values are `missing`, it returns `(missing, missing)`.
+* `IMD.maximum`: For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.mean`: For `INTEGERS` and `FLOATS`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.median`: For `INTEGERS` and `FLOATS`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.median!`: For `INTEGERS` and `FLOATS`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.minimum`: For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.std`: For `INTEGERS` and `FLOATS`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.sum`: For `INTEGERS` and `FLOATS`, skips missing values. When all values are `missing`, it returns `missing`.
+* `IMD.var`: For `INTEGERS` and `FLOATS`, skips missing values. When all values are `missing`, it returns `missing`.
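
The difference between "ignoring" missing values and "jumping over" them in the cumulative functions can be sketched in Base Julia. The two helpers below, `cumsum_ignore` and `cumsum_skip`, are hypothetical names written for illustration and are not the package's implementation; how a leading `missing` is handled in the "ignore" variant is a choice made for this sketch only.

```julia
# Hypothetical helpers written for illustration; they are NOT the package's implementation.

# "Ignore": a missing entry contributes nothing, so the running total is carried forward.
function cumsum_ignore(x::AbstractVector{Union{Missing, T}}) where {T<:Number}
    out = Vector{Union{Missing, T}}(missing, length(x))
    total = zero(T)
    seen = false                      # sketch choice: stay `missing` until the first value
    for (i, v) in enumerate(x)
        if !ismissing(v)
            total += v
            seen = true
        end
        out[i] = seen ? total : missing
    end
    return out
end

# "Skip": a missing entry is jumped over and stays `missing` in the output.
function cumsum_skip(x::AbstractVector{Union{Missing, T}}) where {T<:Number}
    out = Vector{Union{Missing, T}}(missing, length(x))
    total = zero(T)
    for (i, v) in enumerate(x)
        ismissing(v) && continue
        total += v
        out[i] = total
    end
    return out
end

x = [1, 1, missing]
println(cumsum_ignore(x))
println(cumsum_skip(x))
```

For `x = [1, 1, missing]` the first helper ends with a carried-forward total and the second ends with `missing`, which is the same contrast the documented `IMD.cumsum(x)` and `IMD.cumsum(x, missings = :skip)` examples show.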

 ```jldoctest
 julia> x = [1,1,missing]
@@ -43,43 +41,43 @@ julia> x = [1,1,missing]
 1
 missing

-julia> sum(x)
+julia> IMD.sum(x)
 2

-julia> mean(x)
+julia> IMD.mean(x)
 1.0

-julia> maximum(x)
+julia> IMD.maximum(x)
 1

-julia> minimum(x)
+julia> IMD.minimum(x)
 1

-julia> findmax(x)
+julia> IMD.findmax(x)
 (1, 1)

-julia> findmin(x)
+julia> IMD.findmin(x)
 (1, 1)

-julia> cumsum(x)
+julia> IMD.cumsum(x)
 3-element Vector{Union{Missing, Int64}}:
 1
 2
 2

-julia> cumsum(x, missings = :skip)
+julia> IMD.cumsum(x, missings = :skip)
 3-element Vector{Union{Missing, Int64}}:
 1
 2
 missing

-julia> cumprod(x, missings = :skip)
+julia> IMD.cumprod(x, missings = :skip)
 3-element Vector{Union{Missing, Int64}}:
 1
 1
 missing

-julia> median(x)
+julia> IMD.median(x)
 1.0
 ```

@@ -88,10 +86,10 @@ julia> median(x)
 `var` and `std` will return `missing` when `dof = true` and an `AbstractVector{Union{T, Missing}}` of length one is passed as their argument. This is different from the behaviour of these functions defined in the `Statistics` package.
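
For contrast, the `Statistics` behaviour on a length-one vector can be reproduced by hand. This sketch uses only the standard library and assumes nothing about the package; it shows that the corrected estimator divides by `n - 1`, which is zero for a single observation.

```julia
using Statistics

# Statistics behaviour: the corrected estimator divides by n - 1, which is 0 here.
v = var([1])

# The same arithmetic by hand: zero squared deviation over zero degrees of freedom.
x = [1.0]
by_hand = sum(abs2, x .- mean(x)) / (length(x) - 1)

println(isnan(v))
println(isnan(by_hand))
```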

 ```jldoctest
-julia> var(Union{Missing, Int}[1])
+julia> IMD.var(Union{Missing, Int}[1])
 missing

-julia> std(Union{Missing, Int}[1])
+julia> IMD.std(Union{Missing, Int}[1])
 missing

 julia> var([1]) # fallback to Statistics.var
@@ -103,7 +101,7 @@ NaN

 ## Multithreaded functions

-The `sum`, `minimum`, and `maximum` functions also support the `threads` keyword argument. When it is set to `true`, they exploit all cores for calculation.
+The `IMD.sum`, `IMD.minimum`, and `IMD.maximum` functions also support the `threads` keyword argument. When it is set to `true`, they exploit all cores for calculation.
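
One way a missing-skipping reduction could be spread over threads is sketched below in Base Julia. The helper `threaded_sum_skipmissing` and its `nchunks` keyword are hypothetical and written for illustration; the package's actual `threads = true` code path is not shown in this diff and may work differently.

```julia
# Hypothetical helper for illustration only; the package's `threads = true` path may differ.
function threaded_sum_skipmissing(x::AbstractVector; nchunks::Int = Threads.nthreads())
    isempty(x) && return missing
    chunk_len = cld(length(x), nchunks)
    # One task per chunk; each task returns (partial sum, number of non-missing values).
    tasks = map(Iterators.partition(eachindex(x), chunk_len)) do idx
        Threads.@spawn begin
            s = 0
            n = 0
            for i in idx
                v = x[i]
                if !ismissing(v)
                    s += v
                    n += 1
                end
            end
            (s, n)
        end
    end
    parts = fetch.(tasks)
    # Mirror the documented convention: an all-missing input gives `missing`.
    return sum(last, parts) == 0 ? missing : sum(first, parts)
end

x = [1, 1, missing, 4]
println(threaded_sum_skipmissing(x))
```

The sketch only runs in parallel when Julia is started with more than one thread (for example via the `JULIA_NUM_THREADS` environment variable); with a single thread it degrades to one sequential chunk.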
-we can summarise several columns at the same time, e.g. for each carrier, calculate the minimum and maximum arrival and departure delays:(Note that in the following code, `r"Delay" => [minimum, maximum]` is normalised as `names(flights, r"Delay") .=> Ref([minimum, maximum])`)
+we can summarise several columns at the same time, e.g. for each carrier, calculate the minimum and maximum arrival and departure delays: (Note that in the following code, `r"Delay" => [IMD.minimum, IMD.maximum]` is normalised as `names(flights, r"Delay") .=> Ref([IMD.minimum, IMD.maximum])`)
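
The normalised form relies on broadcasting `=>` over the matching column names while `Ref` keeps the function vector intact. A Base Julia sketch of that mechanism follows; the column names are hypothetical stand-ins for `names(flights, r"Delay")`, and Base's `minimum` and `maximum` are used so the snippet runs without the package.

```julia
# Hypothetical column names standing in for `names(flights, r"Delay")`.
cols = ["Carrier", "ArrDelay", "DepDelay"]
delay_cols = filter(c -> occursin(r"Delay", c), cols)

# `Ref` shields the function vector from broadcasting, so each name is paired
# with the whole vector rather than with one function each.
pairs_vec = delay_cols .=> Ref([minimum, maximum])

for p in pairs_vec
    println(p)
end
```

Each resulting pair maps one delay column to both functions, which is why a single regex selector can request several summaries per column.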