Commit 209c72e

fixing #32
1 parent 5f1b781 commit 209c72e

13 files changed: +258 -197 lines changed

docs/src/man/aggregation.md

Lines changed: 4 additions & 4 deletions
@@ -30,7 +30,7 @@ julia> ds = Dataset(g = [1,2,1,2,1,2], x = 1:6)
 5 │ 1 5
 6 │ 2 6

-julia> combine(groupby(ds, :g), :x=>[sum, mean])
+julia> combine(groupby(ds, :g), :x=>[IMD.sum, mean])
 2×3 Dataset
 Row │ g sum_x mean_x
 │ identity identity identity
@@ -39,7 +39,7 @@ julia> combine(groupby(ds, :g), :x=>[sum, mean])
 1 │ 1 9 3.0
 2 │ 2 12 4.0

-julia> combine(gatherby(ds, :g), :x => [maximum, minimum], 2:3 => byrow(-) => :range)
+julia> combine(gatherby(ds, :g), :x => [IMD.maximum, IMD.minimum], 2:3 => byrow(-) => :range)
 2×4 Dataset
 Row │ g maximum_x minimum_x range
 │ identity identity identity identity
@@ -65,7 +65,7 @@ julia> ds = Dataset(rand(1:10, 10, 4), :auto)
 9 │ 5 10 9 6
 10 │ 1 1 3 4

-julia> combine(gatherby(ds, 1), r"x" => sum)
+julia> combine(gatherby(ds, 1), r"x" => IMD.sum)
 6×5 Dataset
 Row │ x1 sum_x1 sum_x2 sum_x3 sum_x4
 │ identity identity identity identity identity
@@ -91,7 +91,7 @@ julia> ds = Dataset(g = [1,2,1,2,1,2], x = 1:6)
 5 │ 1 5
 6 │ 2 6

-julia> combine(gatherby(ds, :g), :x=>[maximum, minimum], 2:3=>byrow(-)=>:range, dropgroupcols = true)
+julia> combine(gatherby(ds, :g), :x=>[IMD.maximum, IMD.minimum], 2:3=>byrow(-)=>:range, dropgroupcols = true)
 2×3 Dataset
 Row │ maximum_x minimum_x range
 │ identity identity identity

docs/src/man/grouping.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -194,7 +194,7 @@ Grouped by: g
 4 │ 2 missing 3 2.3 missing 3.0
 5 │ 2 2 -2 10.0 missing -100.0

-julia> modify(ds, r"int" => x -> x .- maximum(x))
+julia> modify(ds, r"int" => x -> x .- IMD.maximum(x))
 5×6 Grouped Dataset with 2 groups
 Grouped by: g
 Row │ g x1_int x2_int x1_float x2_float x3_float
@@ -235,7 +235,7 @@ julia> setformat!(sale, :date=>year)
 5 │ 2014 132
 6 │ 2013 150

-julia> spct(x) = x ./ sum(x) .* 100
+julia> spct(x) = x ./ IMD.sum(x) .* 100
 spct (generic function with 1 method)

 julia> modify(groupby(sale, :date), :sale => spct => :sale_pct)

docs/src/man/missing.md

Lines changed: 37 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -2,39 +2,37 @@

 ## Comparing data sets

-`==` of two data sets or two columns falls back to `isequal`.
+`==` of two data sets or two `DatasetColumn`s falls back to `isequal`.

 ## Every column supports `missing`

 The `Dataset()` constructor automatically converts each column of a data set to allow `missing` when constructs a data set. All algorithms in InMemoryDatasets are optimised to minimised the overhead of supporting `missing` type.

 ## Functions which skip missing values

-When InMemoryDatasets loaded into a Julia session, the behaviour of the following functions will be changed in such a way that they will remove missing values if an `AbstractVector{Union{T, Missing}}` is passed as their argument. And it is the user responsibility to handle the situations where this is not desired.
-
-The following list summarises the details of how InMemoryDatasets removes/skips/ignores missing values (for the rest of this section `INTEGERS` refers to `{U/Int8, U/Int16, U/Int32, U/Int64}` and `FLOATS` refers to `{Float16, Float32, Float64}`):
-
-* `argmax` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-* `argmin` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-* `cummax` : For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `cummax!`: For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `cummin` : For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `cummin!`: For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `cumprod` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `cumprod!`: For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `cumsum` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `cumsum!` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
-* `extrema` : For `INTEGERS`, `FLOATS`, and `TimeType` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
-* `findmax` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
-* `findmin` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
-* `maximum` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-* `mean` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-* `median` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-* `median!` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-* `minimum` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
-* `std` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-* `sum` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
-* `var` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
+InMemoryDatasets has a set of functions which removes missing values. The following list summarises the details of how InMemoryDatasets removes/skips/ignores missing values (for the rest of this section `INTEGERS` refers to `{U/Int8, U/Int16, U/Int32, U/Int64}` and `FLOATS` refers to `{Float16, Float32, Float64}`):
+
+* `IMD.argmax` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
+* `IMD.argmin` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
+* `IMD.cummax` : For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.cummax!`: For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.cummin` : For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.cummin!`: For `INTEGERS`, `FLOATS`, and `TimeType` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.cumprod` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.cumprod!`: For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.cumsum` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.cumsum!` : For `INTEGERS` and `FLOATS` ignore missing values, however, by passing `missings = :skip` it jumps over missing values. When all values are `missing`, it returns the input.
+* `IMD.extrema` : For `INTEGERS`, `FLOATS`, and `TimeType` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
+* `IMD.findmax` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
+* `IMD.findmin` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `(missing, missing)`.
+* `IMD.maximum` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
+* `IMD.mean` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
+* `IMD.median` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
+* `IMD.median!` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
+* `IMD.minimum` : For `INTEGERS`, `FLOATS`, `TimeType`, and `AbstractString` skip missing values. When all values are `missing`, it returns `missing`.
+* `IMD.std` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
+* `IMD.sum` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`
+* `IMD.var` : For `INTEGERS` and `FLOATS` skip missing values. When all values are `missing`, it returns `missing`

 ```jldoctest
 julia> x = [1,1,missing]
@@ -43,43 +41,43 @@ julia> x = [1,1,missing]
 1
 missing

-julia> sum(x)
+julia> IMD.sum(x)
 2

-julia> mean(x)
+julia> IMD.mean(x)
 1.0

-julia> maximum(x)
+julia> IMD.maximum(x)
 1

-julia> minimum(x)
+julia> IMD.minimum(x)
 1

-julia> findmax(x)
+julia> IMD.findmax(x)
 (1, 1)

-julia> findmin(x)
+julia> IMD.findmin(x)
 (1, 1)

-julia> cumsum(x)
+julia> IMD.cumsum(x)
 3-element Vector{Union{Missing, Int64}}:
 1
 2
 2

-julia> cumsum(x, missings = :skip)
+julia> IMD.cumsum(x, missings = :skip)
 3-element Vector{Union{Missing, Int64}}:
 1
 2
 missing

-julia> cumprod(x, missings = :skip)
+julia> IMD.cumprod(x, missings = :skip)
 3-element Vector{Union{Missing, Int64}}:
 1
 1
 missing

-julia> median(x)
+julia> IMD.median(x)
 1.0
 ```

@@ -88,10 +86,10 @@ julia> median(x)
 `var` and `std` will return `missing` when `dof = true` and an `AbstractVector{Union{T, Missing}}` of length one is passed as their argument. This is different from the behaviour of these functions defined in the `Statistics` package.

 ```jldoctest
-julia> var(Union{Missing, Int}[1])
+julia> IMD.var(Union{Missing, Int}[1])
 missing

-julia> std(Union{Missing, Int}[1])
+julia> IMD.std(Union{Missing, Int}[1])
 missing

 julia> var([1]) # fallback to Statistics.var
@@ -103,7 +101,7 @@ NaN

 ## Multithreaded functions

-The `sum`, `minimum`, and `maximum` functions also support the `threads` keyword argument. When it is set to `true`, they exploit all cores for calculation.
+The `IMD.sum`, `IMD.minimum`, and `IMD.maximum` functions also support the `threads` keyword argument. When it is set to `true`, they exploit all cores for calculation.

 ## Other functions

docs/src/man/modify.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -57,7 +57,7 @@ julia> ds = Dataset(x1 = 1:5, x2 = [-2, -1, missing, 1, 2],
 4 │ 4 1 missing
 5 │ 5 2 0.4

-julia> modify!(ds, 2:3 => sum)
+julia> modify!(ds, 2:3 => IMD.sum)
 5×3 Dataset
 Row │ x1 x2 x3
 │ identity identity identity
@@ -69,7 +69,7 @@ julia> modify!(ds, 2:3 => sum)
 4 │ 4 0 0.7
 5 │ 5 0 0.7

-julia> modify!(ds, :x1 => x -> x .- mean(x))
+julia> modify!(ds, :x1 => x -> x .- IMD.mean(x))
 5×3 Dataset
 Row │ x1 x2 x3
 │ identity identity identity

docs/src/man/transpose.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -494,7 +494,7 @@ julia> ds = Dataset(A = ["foo", "foo", "foo", "foo", "foo",
 9 │ bar two large 7 9

 julia> ; # This first example aggregates values by taking the sum.
-julia> _tmp = combine(groupby(ds, 1:3), 4=>sum);
+julia> _tmp = combine(groupby(ds, 1:3), 4=>IMD.sum);

 julia> transpose(gatherby(_tmp, 1:2, isgathered = true), :sum_D, id = :C, variable_name = nothing)
 4×4 Dataset
@@ -530,7 +530,7 @@ julia> combine(groupby(ds, [:A, :C]), [:D, :E] => mean)
 3 │ foo large 2.0 4.5
 4 │ foo small 2.33333 4.33333

-julia> combine(groupby(ds, [:A, :C]), :D => mean, :E => [minimum, maximum, mean])
+julia> combine(groupby(ds, [:A, :C]), :D => mean, :E => [IMD.minimum, IMD.maximum, mean])
 4×6 Dataset
 Row │ A C mean_D minimum_E maximum_E mean_E
 │ identity identity identity identity identity identity

docs/src/man/tutorial.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -245,13 +245,13 @@ julia> combine(groupby(flights, :Dest), :ArrDelay => mean)
 100 rows omitted
 ```

-we can summarise several columns at the same time, e.g. for each carrier, calculate the minimum and maximum arrival and departure delays:(Note that in the following code, `r"Delay" => [minimum, maximum]` is normalised as `names(flights, r"Delay") .=> Ref([minimum, maximum])`)
+we can summarise several columns at the same time, e.g. for each carrier, calculate the minimum and maximum arrival and departure delays:(Note that in the following code, `r"Delay" => [IMD.minimum, IMD.maximum]` is normalised as `names(flights, r"Delay") .=> Ref([IMD.minimum, IMD.maximum])`)


 ```julia
 julia> @chain flights begin
 groupby(:IATA)
-combine(r"Delay" => [minimum, maximum])
+combine(r"Delay" => [IMD.minimum, IMD.maximum])
 end
 14×5 Dataset
 Row │ IATA minimum_DepDelay maximum_DepDelay minimum_ArrDelay maximum_ArrDelay
