Skip to content

Commit d1656ec

Browse files
committed
new performance tips
1 parent 3199dcc commit d1656ec

File tree

6 files changed

+125
-99
lines changed

6 files changed

+125
-99
lines changed

README.md

Lines changed: 42 additions & 80 deletions
Original file line numberDiff line numberDiff line change
@@ -50,117 +50,79 @@ julia> using GraphTensorNetworks, Random, Graphs
5050
julia> graph = (Random.seed!(2); Graphs.smallgraph(:petersen))
5151
{10, 15} undirected simple Int64 graph
5252

53-
julia> problem = IndependentSet(graph; optimizer=TreeSA(sc_target=0, sc_weight=1.0, ntrials=10, βs=0.01:0.1:15.0, niters=20, rw_weight=0.2));
54-
┌ Warning: target space complexity not found, got: 4.0, with time complexity 7.965784284662087, read-right complexity 8.661778097771988.
55-
└ @ OMEinsumContractionOrders ~/.julia/dev/OMEinsumContractionOrders/src/treesa.jl:71
56-
time/space complexity is (7.965784284662086, 4.0)
53+
julia> problem = IndependentSet(graph);
5754
```
5855

59-
Here, the `problem` is a `IndependentSet` instance, it contains the tensor network contraction tree for the target graph.
60-
Here, we choose the `TreeSA` optimizer to optimize the tensor network contraciton tree, it is a local search based algorithm, check [arXiv: 2108.05665](https://arxiv.org/abs/2108.05665). You will see some warnings, do not panic, this is because we set `sc_target` (target space complex) to 1 for agressive optimization of space complexity. Type `?TreeSA` in a Julia REPL for more information about the key word arguments.
61-
Similarly, one can select tensor network structures for solving other problems like `MaximalIS`, `MaxCut`, `Matching`, `Coloring{K}`, `PaintShop` and `set_packing`.
56+
Here, the `problem` is a `IndependentSet` instance, it contains the tensor network contraction tree for the target graph (the `code` field).
6257

6358
#### 1. find MIS size, count MISs and count ISs
59+
* maximum independent set size
6460
```julia
65-
# maximum independent set size
66-
julia> solve(problem, "size max")
67-
0-dimensional Array{TropicalNumbers.TropicalF64, 0}:
61+
julia> solve(problem, SizeMax())[]
6862
4.0
63+
```
64+
Here, the `solve` function returns you a 0-dimensional array.
65+
For open graphs, this output tensor can have nonzero dimensionality. Each entry corresponds to a different boundary condition.
6966

70-
# all independent sets
71-
julia> solve(problem, "counting sum")
72-
0-dimensional Array{Float64, 0}:
67+
* all independent sets
68+
```julia
69+
julia> solve(problem, CountingAll())[]
7370
76.0
74-
75-
# counting maximum independent sets
76-
julia> solve(problem, "counting max")
77-
0-dimensional Array{TropicalNumbers.CountingTropicalF64, 0}:
78-
(4.0, 5.0)ₜ
79-
80-
# counting independent sets of max two sizes
81-
julia> solve(problem, "counting max2")
82-
0-dimensional Array{Max2Poly{Float64, Float64}, 0}:
83-
30.0*x^3 + 5.0*x^4
8471
```
8572

86-
Here, `solve` function returns you a 0-dimensional array.
87-
For open graphs, this output tensor can have higher dimensions. Each entry corresponds to a different boundary condition.
88-
You can speed up the Tropical number computation when computing "size max" on CPU by using the `TropicalGEMM`
73+
* counting maximum independent sets
74+
```julia
75+
julia> solve(problem, CountingMax())[]
76+
(4.0, 5.0)ₜ # first field is MIS size, second is its counting.
77+
```
8978

79+
* counting independent sets of max two sizes with truncated polynomial
9080
```julia
91-
julia> using TropicalGEMM
81+
julia> solve(problem, CountingMax(2))[]
82+
0-dimensional Array{Max2Poly{Float64, Float64}, 0}:
83+
30.0*x^3 + 5.0*x^4
9284
```
9385

9486
#### 2. compute the independence polynomial
9587

96-
```julia
97-
# using `Polynomial` type
98-
julia> solve(problem, "counting all")
99-
0-dimensional Array{Polynomial{Float64, :x}, 0}:
100-
Polynomial(1.0 + 10.0*x + 30.0*x^2 + 30.0*x^3 + 5.0*x^4)
101-
102-
# using the finitefield approach
103-
julia> solve(problem, "counting all (finitefield)")
104-
0-dimensional Array{Polynomial{BigInt, :x}, 0}:
105-
Polynomial(1 + 10*x + 30*x^2 + 30*x^3 + 5*x^4)
106-
107-
# using the fourier approach
108-
julia> solve(problem, "counting all (fft)", r=1.0)
109-
0-dimensional Array{Polynomial{ComplexF64, :x}, 0}:
110-
Polynomial(1.0000000000000029 + 2.664535259100376e-16im + (10.000000000000004 - 1.9512435398857492e-16im)x + (30.0 - 1.9622216671393801e-16im)x^2 + (30.0 + 1.1553104311877194e-15im)x^3 + (5.0 - 1.030417436395244e-15im)x^4)
111-
```
88+
The following code computes independence polynomial using the finite field algebra (default) approach.
11289

113-
The `finitefield` approach is the most recommended, because it has no round off error is can be upload to GPU. To upload the computing to GPU,
11490
```julia
115-
julia> using CUDA
116-
[ Info: OMEinsum loaded the CUDA module successfully
117-
118-
julia> solve(problem, "counting all (finitefield)", usecuda=true)
119-
0-dimensional Array{Polynomial{BigInt, :x}, 0}:
91+
julia> solve(problem, GraphPolynomial())[]
12092
Polynomial(1 + 10*x + 30*x^2 + 30*x^3 + 5*x^4)
12193
```
12294

123-
The `fft` approach is fast but with round off errors. Its imaginary part can be regarded as the precision,
124-
keyword argument `r` controls the round off errors in high/low IS size region.
95+
The program use `finitefield` method as the default approach, because it has no round off error is can be upload to GPU.
12596

12697
#### 3. find/enumerate solutions
98+
* find one of the best solutions,
12799
```julia
128-
# one of MISs
129-
julia> solve(problem, "config max")
130-
0-dimensional Array{CountingTropical{Float64, ConfigSampler{10, 1, 1}}, 0}:
131-
(4.0, ConfigSampler{10, 1, 1}(1010000011))ₜ
132-
133-
julia> solve(problem, "config max (bounded)")
134-
0-dimensional Array{CountingTropical{Float64, ConfigSampler{10, 1, 1}}, 0}:
100+
julia> solve(problem, SingleConfigMax())[]
135101
(4.0, ConfigSampler{10, 1, 1}(1010000011))ₜ
102+
```
136103

137-
# enumerate all MISs
138-
julia> solve(problem, "configs max") # not recommended
139-
0-dimensional Array{CountingTropical{Float64, ConfigEnumerator{10, 1, 1}}, 0}:
140-
(4.0, {1010000011, 0100100110, 1001001100, 0010111000, 0101010001})ₜ
141-
142-
julia> solve(problem, "configs max (bounded)")
104+
* enumerate all MISs
105+
```julia
106+
julia> cs = solve(problem, ConfigsMax())[]
143107
0-dimensional Array{CountingTropical{Int64, ConfigEnumerator{10, 1, 1}}, 0}:
144108
(4, {1010000011, 0100100110, 1001001100, 0010111000, 0101010001})ₜ
145-
146-
# enumerate all MIS and MIS-1 configurations
147-
julia> solve(problem, "configs max2")
148-
0-dimensional Array{Max2Poly{ConfigEnumerator{10, 1, 1}, Float64}, 0}:
149-
{0010101000, 0101000001, 0100100010, 0010100010, 0100000011, 0010000011, 1001001000, 1010001000, 1001000001, 1010000001, 1010000010, 1000000011, 0100100100, 0000101100, 0101000100, 0001001100, 0000100110, 0100000110, 1001000100, 1000001100, 1000000110, 0100110000, 0000111000, 0101010000, 0001011000, 0010110000, 0010011000, 0001010001, 0100010001, 0010010001}*x^3 + {1010000011, 0100100110, 1001001100, 0010111000, 0101010001}*x^4
150-
151-
# enumerate all IS configurations
152-
julia> solve(problem, "configs all")
153-
0-dimensional Array{Polynomial{ConfigEnumerator{10, 1, 1}, :x}, 0}:
154-
Polynomial({0000000000} + {0010000000, 0000100000, 0001000000, 0100000000, 0000001000, 0000000001, 0000000010, 1000000000, 0000000100, 0000010000}*x + {1000000010, 0010100000, 0010001000, 0100100000, 0000101000, 0101000000, 0001001000, 0001000001, 0100000001, 0010000001, 0000100010, 0100000010, 0010000010, 0000000011, 1001000000, 1000001000, 1010000000, 1000000001, 0000000110, 0000100100, 0001000100, 0100000100, 0000001100, 1000000100, 0010010000, 0000110000, 0001010000, 0100010000, 0000011000, 0000010001}*x^2 + {1010000010, 1000000011, 0010101000, 0101000001, 0100100010, 0010100010, 0100000011, 0010000011, 1001001000, 1010001000, 1001000001, 1010000001, 0000100110, 0100000110, 0100100100, 0000101100, 0101000100, 0001001100, 1001000100, 1000001100, 1000000110, 0010110000, 0010011000, 0100110000, 0000111000, 0101010000, 0001011000, 0001010001, 0100010001, 0010010001}*x^3 + {1010000011, 0100100110, 1001001100, 0010111000, 0101010001}*x^4)
155109
```
156-
157-
If you want to enumerate all MISs, we highly recommend using the bounded version to save the computational effort. One can also store the configurations on your disk by typing
110+
It will use the bounded version to save the computational effort. If you want to save/load your configurations, you can type
158111
```julia
159-
julia> cs = solve(problem, "configs max (bounded)")[1].c # the `c` field is a `ConfigEnumerator`
160-
{1010000011, 0100100110, 1001001100, 0010111000, 0101010001}
161-
162-
julia> save_configs("configs.dat", cs; format=:text) # `:text` or `:binary`
112+
julia> save_configs("configs.dat", cs.c; format=:text) # `:text` or `:binary`
163113

164114
julia> load_configs("configs.dat"; format=:text)
165115
{1010000011, 0100100110, 1001001100, 0010111000, 0101010001}
166116
```
117+
118+
* enumerate all configurations of size α(G) and α(G)-1
119+
```julia
120+
julia> solve(problem, ConfigsMax(2))[]
121+
{0010101000, 0101000001, 0100100010, 0010100010, 0100000011, 0010000011, 1001001000, 1010001000, 1001000001, 1010000001, 1010000010, 1000000011, 0100100100, 0000101100, 0101000100, 0001001100, 0000100110, 0100000110, 1001000100, 1000001100, 1000000110, 0100110000, 0000111000, 0101010000, 0001011000, 0010110000, 0010011000, 0001010001, 0100010001, 0010010001}*x^3 + {1010000011, 0100100110, 1001001100, 0010111000, 0101010001}*x^4
122+
```
123+
124+
* enumerate all independent sets
125+
```julia
126+
julia> solve(problem, ConfigsAll())[]
127+
{0000000000, 0000010000, 1000000000, 0001000000, 0001010000, 1001000000, 0010000000, 0010010000, 1010000000, 0000001000, 0000011000, 1000001000, 0001001000, 0001011000, 1001001000, 0010001000, 0010011000, 1010001000, 0000000010, 1000000010, 0010000010, 1010000010, 0100000000, 0100010000, 0101000000, 0101010000, 0100000010, 0000000100, 1000000100, 0001000100, 1001000100, 0000001100, 1000001100, 0001001100, 1001001100, 0000000110, 1000000110, 0100000100, 0101000100, 0100000110, 0000100000, 0000110000, 0010100000, 0010110000, 0000101000, 0000111000, 0010101000, 0010111000, 0000100010, 0010100010, 0100100000, 0100110000, 0100100010, 0000100100, 0000101100, 0000100110, 0100100100, 0100100110, 0000000001, 0000010001, 1000000001, 0001000001, 0001010001, 1001000001, 0010000001, 0010010001, 1010000001, 0000000011, 1000000011, 0010000011, 1010000011, 0100000001, 0100010001, 0101000001, 0101010001, 0100000011}
128+
```

docs/make.jl

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,7 @@ makedocs(;
3838
"Coloring problem" => "tutorials/Coloring.md",
3939
"Other problems" => "tutorials/Others.md",
4040
],
41+
"Performance Tips" => "performancetips.md",
4142
"References" => "ref.md",
4243
],
4344
doctest=false,

docs/src/performancetips.md

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
# Performance Tips
2+
3+
## Optimize tensor network contraction order
4+
```julia
5+
julia> using GraphTensorNetworks, Graphs, Random
6+
7+
julia> graph = random_regular_graph(120, 3)
8+
{120, 180} undirected simple Int64 graph
9+
10+
julia> problem = IndependentSet(graph; optimizer=TreeSA(sc_target=20, sc_weight=1.0, rw_weight=3.0, ntrials=10, βs=0.01:0.1:15.0, niters=20), simplifier=MergeGreedy());
11+
```
12+
13+
Key word argument `optimizer` decides the contraction order optimizer of the tensor network.
14+
Here, we choose the `TreeSA` optimizer to optimize the tensor network contraciton tree, it is a local search based algorithm.
15+
It is one of the state of the art tensor network contraction order optimizers, one may check [arXiv: 2108.05665](https://arxiv.org/abs/2108.05665) to learn more about the algorithm.
16+
Other optimizers include
17+
* [`GreedyMethod`](@ref) (default, fastest in searching speed but worse in contraction order)
18+
* [`TreeSA`](@ref)
19+
* [`KaHyParBipartite`](@ref)
20+
* [`SABipartite`](@ref)
21+
22+
One can type `?TreeSA` in a Julia REPL for more information about how to configure the hyper-parameters of `TreeSA` method.
23+
`simplifier` keyword argument is not so important, it is a preprocessing routine to improve the searching speed of the `optimizer`.
24+
25+
The returned instance `problem` contains a field `code` that specifies the tensor network contraction order. For an independence problem, its contraction time space complexity is ``2^{{\rm tw}(G)}``, where ``{\rm tw(G)}`` is the [tree-width](https://en.wikipedia.org/wiki/Treewidth) of ``G``.
26+
One can check the time, space and read-write complexity with the following function.
27+
28+
```julia
29+
julia> timespacereadwrite_complexity(problem)
30+
```
31+
32+
The return values are `log2` of the the number of iterations, the number elements in the max tensor and the number of read-write operations to tensor elements.
33+
34+
## GEMM for Tropical numbers
35+
You can speed up the Tropical number matrix multiplication when computing `SizeMax()` by using the Tropical GEMM routines implemented in package [`TropicalGEMM.jl`](https://github.com/TensorBFS/TropicalGEMM.jl/).
36+
37+
```julia
38+
julia> using BenchmarkTools
39+
40+
julia> @btime solve(problem, SizeMax())
41+
91.630 ms (19203 allocations: 23.72 MiB)
42+
0-dimensional Array{TropicalF64, 0}:
43+
53.0
44+
45+
julia> using TropicalGEMM
46+
47+
julia> @btime solve(problem, SizeMax())
48+
8.960 ms (18532 allocations: 17.01 MiB)
49+
0-dimensional Array{TropicalF64, 0}:
50+
53.0
51+
```
52+
53+
The `TropicalGEMM` pirates the `LinearAlgebra.mul!` interface, hence it takes effect upon using.
54+
The GEMM routine can speed up the computation on CPU for one order, with multi-threading, it can be even faster.
55+
Benchmark shows the performance of `TropicalGEMM` is close to the theoretical optimal value.
56+
57+
## Make use of GPUs
58+
To upload the computing to GPU, you just add need to use CUDA, and offer a new key word argument.
59+
```julia
60+
julia> using CUDA
61+
[ Info: OMEinsum loaded the CUDA module successfully
62+
63+
julia> solve(problem, SizeMax(), usecuda=true)
64+
0-dimensional CuArray{TropicalF64, 0, CUDA.Mem.DeviceBuffer}:
65+
53.0
66+
```
67+
68+
CUDA backended properties are
69+
* [`SizeMax`](@ref)
70+
* [`CoutingAll`](@ref)
71+
* [`CountingMax`](@ref)
72+
* [`GraphPolynomial`](@ref)
73+
* [`SingleConfigMax`](@ref)

examples/IndependentSet.jl

Lines changed: 5 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -36,22 +36,19 @@ show_graph(graph; locs=locations)
3636
# \end{matrix}\right)_{s_is_j}
3737
# ```
3838
# Let us contruct the problem instance with optimized tensor network contraction order as bellow.
39-
problem = IndependentSet(graph; optimizer=TreeSA(sc_weight=1.0, ntrials=10,
40-
βs=0.01:0.1:15.0, niters=20, rw_weight=0.2),
41-
simplifier=MergeGreedy());
39+
problem = IndependentSet(graph; optimizer=TreeSA());
4240

4341
# In the input arguments of [`IndependentSet`](@ref), the `optimizer` is for optimizing the contraction orders.
44-
# Here we use the local search based optimizer in [arXiv:2108.05665](https://arxiv.org/abs/2108.05665).
45-
# If no optimizer is specified, the default fast (in terms of the speed of searching contraction order)
46-
# but worst (in term of contraction complexity) [`GreedyMethod`](@ref) will be used.
47-
# `simplifier` is a preprocessing routine to speed up the `optimizer`.
42+
# Here we use the local search based optimizer `TreeSA`.
4843
# The returned instance `problem` contains a field `code` that specifies the tensor network contraction order.
49-
# Its contraction time space complexity is ``2^{{\rm tw}(G)}``, where ``{\rm tw(G)}`` is the [tree-width](https://en.wikipedia.org/wiki/Treewidth) of ``G``.
44+
# The optimal contraction time and space complexity of an independent set problem is ``2^{{\rm tw}(G)}``,
45+
# where ``{\rm tw(G)}`` is the [tree-width](https://en.wikipedia.org/wiki/Treewidth) of ``G``.
5046
# One can check the time, space and read-write complexity with the following function.
5147

5248
timespacereadwrite_complexity(problem)
5349

5450
# The return values are `log2` of the the number of iterations, the number elements in the max tensor and the number of read-write operations to tensor elements.
51+
# For more information about the performance, please check the [Performance Tips](@ref).
5552

5653

5754
# ## Solving properties

examples/PaintShop.jl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -101,4 +101,4 @@ show_graph(graph; locs=locations, texts=string.(sequence), edge_colors=[sequence
101101

102102
# Since we have different choices of initial color, the number of best solution is 4.
103103
# The following function will check the solution and return you the number of color switchs
104-
num_paint_shop_color_switch(sequence, painting2)
104+
num_paint_shop_color_switch(sequence, painting1)

src/configurations.jl

Lines changed: 3 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -10,23 +10,19 @@ function best_solutions(gp::GraphProblem; all=false, usecuda=false)
1010
if all && usecuda
1111
throw(ArgumentError("ConfigEnumerator can not be computed on GPU!"))
1212
end
13-
syms = symbols(gp)
14-
T = (all ? set_type : sampler_type)(CountingTropical{Int64}, length(syms), nflavor(gp))
15-
vertex_index = Dict([s=>i for (i, s) in enumerate(syms)])
1613
xst = generate_tensors(l->TropicalF64.(get_weights(gp, l)), gp)
1714
ymask = trues(fill(2, length(getiyv(gp.code)))...)
1815
if usecuda
1916
xst = CuArray.(xst)
2017
ymask = CuArray(ymask)
2118
end
2219
if all
23-
xs = generate_tensors(l->_onehotv.(Ref(T), vertex_index[l], flavors(gp), get_weights(gp, l)), gp)
20+
xs = generate_tensors(fx_solutions(gp, CountingTropical{Int64}, all), gp)
2421
return bounding_contract(AllConfigs{1}(), gp.code, xst, ymask, xs)
2522
else
2623
@assert ndims(ymask) == 0
2724
t, res = solution_ad(gp.code, xst, ymask)
28-
N = length(vertex_index)
29-
return fill(CountingTropical(asscalar(t).n, ConfigSampler(StaticBitVector(map(l->res[l], 1:N)))))
25+
return fill(CountingTropical(asscalar(t).n, ConfigSampler(StaticBitVector(map(l->res[l], 1:length(res))))))
3026
end
3127
end
3228

@@ -57,12 +53,9 @@ Finding optimal and suboptimal solutions.
5753
best2_solutions(gp::GraphProblem; all=true, usecuda=false) = solutions(gp, Max2Poly{Float64,Float64}; all=all, usecuda=usecuda)
5854

5955
function bestk_solutions(gp::GraphProblem, k::Int)
60-
syms = symbols(gp)
61-
vertex_index = Dict([s=>i for (i, s) in enumerate(syms)])
6256
xst = generate_tensors(l->TropicalF64.(get_weights(gp, l)), gp)
6357
ymask = trues(fill(2, length(getiyv(gp.code)))...)
64-
T = set_type(TruncatedPoly{k,Float64,Float64}, length(syms), nflavor(gp))
65-
xs = generate_tensors(l->_onehotv.(Ref(T), vertex_index[l], flavors(gp), get_weights(gp, l)), gp)
58+
xs = generate_tensors(fx_solutions(gp, TruncatedPoly{k,Float64,Float64}, true), gp)
6659
return bounding_contract(AllConfigs{k}(), gp.code, xst, ymask, xs)
6760
end
6861

0 commit comments

Comments
 (0)