docs/src/man/joins.md
1 addition & 1 deletion
@@ -18,7 +18,7 @@ The main functions for combining two data sets are `leftjoin`, `innerjoin`, `out
 See [the Wikipedia page on SQL joins](https://en.wikipedia.org/wiki/Join_(SQL)) for more information.

-By default, to match observations, InMemoryDatasets sorts the right data set and uses a binary search algorithm for finding the matches of each observation in the left data set in the right data set based on the passed key column(s), thus, it has better performance when the left data set is larger than the right data set. However, passing `method = :hash` changes the default. The matching is done based on the formatted values of the key column(s), however, using the `mapformats` keyword argument, one may set it to `false` for one or both data sets.
+By default (except for `semijoin` and `antijoin`), to match observations, InMemoryDatasets sorts the right data set and uses a binary search algorithm for finding the matches of each observation in the left data set in the right data set based on the passed key column(s), thus, it has better performance when the left data set is larger than the right data set. However, passing `method = :hash` changes the default. The matching is done based on the formatted values of the key column(s), however, using the `mapformats` keyword argument, one may set it to `false` for one or both data sets.

 For `leftjoin` and `innerjoin` the order of observations of the output data set is the same as their order in the left data set. However, the order of observations from the right table depends on the stability of the sort algorithm. User can set the `stable` keyword argument to `true` to guarantee a stable sort. For `outerjoin` the order of observations from the left data set in the output data set is also the same as their order in the original data set, however, for those observations which are from the right table, there is no specific order.
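
For readers comparing the two matching strategies this change refers to, here is a minimal sketch in plain Base Julia. It is not the InMemoryDatasets implementation; the helper names `match_sorted` and `match_hashed` and the sample keys are hypothetical and exist only for illustration. The first helper sorts the right keys once and binary-searches each left key, while the second only tests membership through a hash set, which is all a semi join or anti join needs.

```julia
# Illustrative only: plain Base Julia, not the InMemoryDatasets internals.
# `match_sorted` and `match_hashed` are hypothetical helper names.

# Sort-based matching: sort the right keys once, then binary-search each left key.
function match_sorted(left_keys, right_keys)
    perm = sortperm(right_keys)      # permutation that sorts the right keys
    sorted = right_keys[perm]
    # For each left key, return the original row indices of the matching right rows.
    return [perm[searchsorted(sorted, k)] for k in left_keys]
end

# Hash-based membership test: enough to decide which left rows a
# semi join keeps (true) or an anti join keeps (false).
function match_hashed(left_keys, right_keys)
    seen = Set(right_keys)
    return [k in seen for k in left_keys]
end

left  = [2012, 2020, 2020, 1999]
right = [2020, 2012, 2020]

row_matches = match_sorted(left, right)   # one vector of right-row indices per left row
has_match   = match_hashed(left, right)   # one Bool per left row
```

The sort-based variant recovers which right rows match, which the full joins need; the hash-based variant only answers whether a match exists, which is presumably why it suits `semijoin` and `antijoin` as a default.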
Opposite to `semijoin`, perform an anti join of two `Datasets`: `dsl` and `dsr`, and return a `Dataset`
containing rows where keys appear in `dsl` but not in `dsr`.
@@ -556,7 +556,7 @@ rows that have key values appear in `dsr` will be removed.
 you can use the function `getformat` to see the format;
 by setting `mapformats` to a `Bool Vector` of length 2, you can specify whether to use formatted values
 for `dsl` and `dsr`, respectively; for example, passing a `[true, false]` means use formatted values for `dsl` and do not use formatted values for `dsr`.
-- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:sort`
+- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:hash`
 - `alg`: sorting algorithms used, is `HeapSort` (the Heap Sort algorithm) by default;
 it can also be `QuickSort` (the Quicksort algorithm).
 - `stable`: by default is `false`, means that the sorting results have not to be stable;
@@ -639,7 +639,7 @@ julia> antijoin(dsl, dsr, on = :year, mapformats = true) # Use formats for datas
 1 │ 2012 true
 ```
 """
-function DataAPI.antijoin(dsl::AbstractDataset, dsr::AbstractDataset; on = nothing, mapformats::Union{Bool, Vector{Bool}}=true, stable = false, alg = HeapSort, accelerate = false, view = false, method = :sort, threads = true)
+function DataAPI.antijoin(dsl::AbstractDataset, dsr::AbstractDataset; on = nothing, mapformats::Union{Bool, Vector{Bool}}=true, stable = false, alg = HeapSort, accelerate = false, view = false, method = :hash, threads = true)
 !(method in (:hash, :sort)) && throw(ArgumentError("method must be :hash or :sort"))
Perform a semi join of two `Datasets`: `dsl` and `dsr`, and return a `Dataset`
containing rows where keys appear in `dsl` and `dsr`.
@@ -668,7 +668,7 @@ rows that have values in `dsl` while do not have matching values `on` keys in `d
 you can use the function `getformat` to see the format;
 by setting `mapformats` to a `Bool Vector` of length 2, you can specify whether to use formatted values
 for `dsl` and `dsr`, respectively; for example, passing a `[true, false]` means use formatted values for `dsl` and do not use formatted values for `dsr`.
-- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:sort`
+- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:hash`
 - `alg`: sorting algorithms used, is `HeapSort` (the Heap Sort algorithm) by default;
 it can also be `QuickSort` (the Quicksort algorithm).
 - `stable`: by default is `false`, means that the sorting results have not to be stable;
@@ -753,7 +753,7 @@ julia> semijoin(dsl, dsr, on = :year, mapformats = true) # Use formats for datas
 3 │ 2020 true
 ```
 """
-function DataAPI.semijoin(dsl::AbstractDataset, dsr::AbstractDataset; on = nothing, mapformats::Union{Bool, Vector{Bool}}=true, stable = false, alg = HeapSort, accelerate = false, view = false, method = :sort, threads = true)
+function DataAPI.semijoin(dsl::AbstractDataset, dsr::AbstractDataset; on = nothing, mapformats::Union{Bool, Vector{Bool}}=true, stable = false, alg = HeapSort, accelerate = false, view = false, method = :hash, threads = true)
 !(method in (:hash, :sort)) && throw(ArgumentError("method must be :hash or :sort"))
Opposite to `semijoin`, perform an anti join of two `Datasets`: `dsl` and `dsr`, and change the left table `dsl` into a `Dataset`
containing rows where keys appear in `dsl` but not in `dsr`.
@@ -782,7 +782,7 @@ rows that have key values appear in `dsr` will be removed.
 you can use the function `getformat` to see the format;
 by setting `mapformats` to a `Bool Vector` of length 2, you can specify whether to use formatted values
 for `dsl` and `dsr`, respectively; for example, passing a `[true, false]` means use formatted values for `dsl` and do not use formatted values for `dsr`.
-- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:sort`
+- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:hash`
 - `alg`: sorting algorithms used, is `HeapSort` (the Heap Sort algorithm) by default;
 it can also be `QuickSort` (the Quicksort algorithm).
 - `stable`: by default is `false`, means that the sorting results have not to be stable;
Perform a semi join of two `Datasets`: `dsl` and `dsr`, and change the left table `dsl` into a `Dataset`
containing rows where keys appear in `dsl` and `dsr`.
@@ -905,7 +905,7 @@ rows that have values in `dsl` while do not have matching values `on` keys in `d
 you can use the function `getformat` to see the format;
 by setting `mapformats` to a `Bool Vector` of length 2, you can specify whether to use formatted values
 for `dsl` and `dsr`, respectively; for example, passing a `[true, false]` means use formatted values for `dsl` and do not use formatted values for `dsr`.
-- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:sort`
+- `method` is either `:sort` or `:hash` for specifiying the method of match finding, default is `:hash`
 - `alg`: sorting algorithms used, is `HeapSort` (the Heap Sort algorithm) by default;
 it can also be `QuickSort` (the Quicksort algorithm).
 - `stable`: by default is `false`, means that the sorting results have not to be stable;