# Tensor networks

We now introduce the core ideas of tensor networks, highlighting their
connections with the probabilistic graphical models (PGM) domain to align the terminology between them.

For our purposes, a **tensor** is equivalent to the concept of a factor
presented above, which we define more formally below.

## What is a tensor?
*Definition*: A tensor $T$ is defined as:
```math
T: \prod_{V \in \bm{V}} \mathcal{D}_{V} \rightarrow \texttt{number}.
```
Here, the function $T$ maps each possible instantiation of the random
variables in its scope $\bm{V}$ to a generic number type. In the context of tensor networks,
a minimum requirement is that the number type forms a commutative semiring.
To define a commutative semiring with the addition operation $\oplus$ and the multiplication operation $\odot$ on a set $S$, the following relations must hold for any three elements $a, b, c \in S$.
```math
\newcommand{\mymathbb}[1]{\mathbb{#1}}
\begin{align*}
(a \oplus b) \oplus c = a \oplus (b \oplus c) & \hspace{5em}\text{$\triangleright$ commutative monoid $\oplus$ with identity $\mymathbb{0}$}\\
a \oplus \mymathbb{0} = \mymathbb{0} \oplus a = a &\\
a \oplus b = b \oplus a &\\
&\\
(a \odot b) \odot c = a \odot (b \odot c) & \hspace{5em}\text{$\triangleright$ commutative monoid $\odot$ with identity $\mymathbb{1}$}\\
a \odot \mymathbb{1} = \mymathbb{1} \odot a = a &\\
a \odot b = b \odot a &\\
&\\
a \odot (b\oplus c) = a\odot b \oplus a\odot c & \hspace{5em}\text{$\triangleright$ left and right distributive}\\
(a\oplus b) \odot c = a\odot c \oplus b\odot c &\\
&\\
a \odot \mymathbb{0} = \mymathbb{0} \odot a = \mymathbb{0}
\end{align*}
```
Tensors are represented using multidimensional arrays of nonnegative numbers
with labeled dimensions. These labels correspond to the array's indices, which
in turn represent the set of random variables that the tensor is a function
of. Thus, in this context, the terms **label**, **index**, and
**variable** are synonymous and hence used interchangeably.
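
As a concrete illustration, consider the max-plus (tropical) semiring, where $\oplus = \max$ with identity $-\infty$ and $\odot = +$ with identity $0$. The snippet below is a minimal sketch in plain Julia (the operator definitions are ours, purely for illustration) that spot-checks the axioms on a few numbers:
```julia
# Minimal sketch: the max-plus ("tropical") semiring, where ⊕ is max with
# identity -Inf and ⊙ is + with identity 0.0. Not part of any package.
⊕(a, b) = max(a, b)   # semiring addition
⊙(a, b) = a + b       # semiring multiplication

a, b, c = 1.0, 2.5, -3.0
@assert (a ⊕ b) ⊕ c == a ⊕ (b ⊕ c)         # associativity of ⊕
@assert a ⊕ -Inf == a                       # -Inf is the ⊕ identity
@assert a ⊙ 0.0 == a                        # 0.0 is the ⊙ identity
@assert a ⊙ (b ⊕ c) == (a ⊙ b) ⊕ (a ⊙ c)   # distributivity
@assert a ⊙ -Inf == -Inf                    # the ⊕ identity absorbs under ⊙
```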

## What is a tensor network?
We now turn our attention to defining a **tensor network**.
A tensor network is a mathematical object that can be used to represent a multilinear map between tensors. It is widely used in condensed matter physics [^Orus2014][^Pfeifer2014] and quantum simulation [^Markov2008][^Pan2022]. It is also a powerful tool for solving combinatorial optimization problems [^Liu2023].
It is important to note that we use a generalized version of the conventional
notation, also known as the [einsum](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html) notation, which is widely used in high performance computing.
Packages that implement the conventional notation include
- [numpy](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html)
- [OMEinsum.jl](https://github.com/under-Peter/OMEinsum.jl)
- [PyTorch](https://pytorch.org/docs/stable/generated/torch.einsum.html)
- [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/einsum)

This approach allows us to represent a more extensive set of sum-product multilinear operations between tensors, meeting the requirements of the PGM field.

*Definition*[^Liu2023]: A tensor network is a multilinear map represented by the triple
$\mathcal{N} = (\Lambda, \mathcal{T}, \bm{\sigma}_0)$, where:
- $\Lambda$ is the set of variables present in the network
  $\mathcal{N}$.
- $\mathcal{T} = \{ T^{(k)}_{\bm{\sigma}_k} \}_{k=1}^{M}$ is the set of
  input tensors, where each tensor $T^{(k)}_{\bm{\sigma}_k}$ is identified
  by a superscript $(k)$ and has an associated scope $\bm{\sigma}_k$.
- $\bm{\sigma}_0$ specifies the scope of the output tensor.

More specifically, each tensor $T^{(k)}_{\bm{\sigma}_k} \in \mathcal{T}$ is
labeled by a string $\bm{\sigma}_k \in \Lambda^{r \left(T^{(k)} \right)}$, where
$r \left(T^{(k)} \right)$ is the rank of $T^{(k)}$. The multilinear map, or
the `contraction`, applied to this triple is defined as
```math
\texttt{contract}(\Lambda, \mathcal{T}, \bm{\sigma}_0) = \sum_{\bm{\sigma}_{\Lambda
\setminus [\bm{\sigma}_0]}} \prod_{k=1}^{M} T^{(k)}_{\bm{\sigma}_k},
```
where the summation extends over all instantiations of the variables that
are not part of the output tensor.
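
To make the formula concrete, here is a naive Julia sketch of the contraction; the helper `naive_contract` and its signature are hypothetical and for exposition only (real packages contract pairwise for efficiency):
```julia
# Naive illustration of the contraction formula: loop over every instantiation
# of the variables and accumulate the sum of products. Not a package API.
function naive_contract(ixs, tensors, iy, size_dict)
    labels = unique(vcat(ixs..., iy))                 # Λ, all variables
    out = zeros((size_dict[l] for l in iy)...)        # output tensor with scope σ₀
    for vals in Iterators.product((1:size_dict[l] for l in labels)...)
        env = Dict(zip(labels, vals))                 # one instantiation of Λ
        term = prod(T[(env[l] for l in ix)...] for (T, ix) in zip(tensors, ixs))
        out[(env[l] for l in iy)...] += term          # sums over Λ \ σ₀
    end
    return out
end

A, B = rand(2, 3), rand(3, 4)
naive_contract([['i','j'], ['j','k']], [A, B], ['i','k'],
               Dict('i' => 2, 'j' => 3, 'k' => 4)) ≈ A * B  # true
```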

As an example, matrix multiplication can be specified as a tensor network
contraction
```math
    (AB)_{ik} = \texttt{contract}\left(\{i,j,k\}, \{A_{ij}, B_{jk}\}, ik\right),
```
where matrices $A$ and $B$ are input tensors labeled by strings $ij, jk \in
\{i, j, k\}^2$. The output tensor is labeled by string $ik$. The
summation runs over indices $\Lambda \setminus [ik] = \{j\}$. The contraction
corresponds to
```math
    \texttt{contract}\left(\{i,j,k\}, \{A_{ij}, B_{jk}\}, ik\right) = \sum_j
    A_{ij}B_{jk}.
```
In programming languages, this is equivalent to the einsum notation `ij, jk -> ik`.
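
For instance, with the [OMEinsum](https://github.com/under-Peter/OMEinsum.jl) package listed above, this contraction can be written with the `ein` string macro (a minimal sketch; array sizes are chosen arbitrarily):
```julia
using OMEinsum

A, B = rand(2, 3), rand(3, 4)
# einsum notation "ij,jk->ik": sum over the shared index j
C = ein"ij,jk->ik"(A, B)
C ≈ A * B  # true
```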

Diagrammatically, a tensor network can be represented as an *open hypergraph*. In the tensor network diagram, a tensor is mapped to a vertex,
and a variable is mapped to a hyperedge. Tensors are connected by the same hyperedge
if and only if they share the corresponding variable. The diagrammatic
representation of matrix multiplication is shown below.
```@eval
using TikzPictures

tp = TikzPicture(
    L"""
    \matrix[row sep=0.8cm,column sep=0.8cm,ampersand replacement= \& ] {
    \node (1) {}; \&
    \node (a) [mytensor] {$A$}; \&
    \node (b) [mytensor] {$B$}; \&
    \node (2) {}; \&
    \\
    };
    \draw [myedge, color=c01] (1) edge node[below] {$i$} (a);
    \draw [myedge, color=c02] (a) edge node[below] {$j$} (b);
    \draw [myedge, color=c03] (b) edge node[below] {$k$} (2);
""", options="scale=3.8",
    preamble="\\input{" * joinpath(@__DIR__, "assets", "preambles", "the-tensor-network") * "}",
    )
save(SVG("the-tensor-network1"), tp)
```

```@raw html
<img src="the-tensor-network1.svg" style="margin-left: auto; margin-right: auto; display:block; width: 50%">
```

Here, we use different colors to denote different hyperedges. Hyperedges for
$i$ and $k$ are left open to denote variables in the output string
$\bm{\sigma}_0$. The reason why we use hyperedges rather than regular edges
will be made clear by the following star contraction example.
```math
    \texttt{contract}(\{i,j,k,l\}, \{A_{il}, B_{jl}, C_{kl}\}, ijk) = \sum_{l}A_{il}
    B_{jl} C_{kl}.
```
In programming languages, this is equivalent to the einsum notation `il, jl, kl -> ijk`.
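
Again as a minimal sketch with OMEinsum (sizes chosen arbitrarily), the shared index $l$ is summed over while $i$, $j$, $k$ remain open:
```julia
using OMEinsum

A, B, C = rand(2, 5), rand(3, 5), rand(4, 5)
# star contraction: l is summed over; i, j, k label the output
T = ein"il,jl,kl->ijk"(A, B, C)
size(T)  # (2, 3, 4)
```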

Among the variables, $l$ is shared by all three tensors, hence the diagram
cannot be represented as a simple graph. The hypergraph representation is shown
below.
```@eval
using TikzPictures

tp = TikzPicture(
    L"""
    \matrix[row sep=0.4cm,column sep=0.4cm,ampersand replacement= \& ] {
    \&
    \&
    \node[color=c01] (j) {$j$}; \&
    \&
    \&
    \\
    \&
    \&
    \node (b) [mytensor] {$B$}; \&
    \&
    \&
    \\
    \node[color=c03] (i) {$i$}; \&
    \node (a) [mytensor] {$A$}; \&
    \node[color=c02] (l) {$l$}; \&
    \node (c) [mytensor] {$C$}; \&
    \node[color=c04] (k) {$k$}; \&
    \\
    };
    \draw [myedge, color=c01] (j) edge (b);
    \draw [myedge, color=c02] (b) edge (l);
    \draw [myedge, color=c03] (i) edge (a);
    \draw [myedge, color=c02] (a) edge (l);
    \draw [myedge, color=c02] (l) edge (c);
    \draw [myedge, color=c04] (c) edge (k);
""", options="",
    preamble="\\input{" * joinpath(@__DIR__, "assets", "preambles", "the-tensor-network") * "}",
    )
save(SVG("the-tensor-network2"), tp)
```

```@raw html
<img src="the-tensor-network2.svg" style="margin-left: auto; margin-right: auto; display:block; width: 50%">
```

As a final comment, repeated indices in the same tensor are not forbidden in
the definition of a tensor network, hence self-loops are also allowed in a tensor
network diagram.

## Tensor network contraction orders
The performance of a tensor network contraction depends on the order in which
the tensors are contracted. The order of contraction is usually specified by a
binary tree, where the leaves are the input tensors and the internal nodes
represent intermediate tensors produced by pairwise contractions. The root of the tree is the output tensor.

Many algorithms have been proposed to find an optimal or near-optimal contraction order, including
- Greedy algorithms
- Breadth-first search and dynamic programming [^Pfeifer2014]
- Graph bipartitioning [^Gray2021]
- Local search [^Kalachev2021]

Some of them are already included in the [OMEinsum](https://github.com/under-Peter/OMEinsum.jl) package, as sketched below. Please check [Performance Tips](@ref) for more details.
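
As an illustration — a minimal sketch whose exact function names and optimizer options may differ across OMEinsum versions — a contraction order can be searched for as follows:
```julia
using OMEinsum

# A small chain of matrix products; the order is then optimized with the
# local-search optimizer TreeSA. `uniformsize` assigns the same dimension to
# every index; `contraction_complexity` is available in recent versions.
code = ein"ij,jk,kl,lm->im"
sizes = uniformsize(code, 100)
optcode = optimize_code(code, sizes, TreeSA())
contraction_complexity(optcode, sizes)  # estimated time/space cost of the order
```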

## References

[^Orus2014]:
    Orús R. A practical introduction to tensor networks: Matrix product states and projected entangled pair states[J]. Annals of Physics, 2014, 349: 117-158.

[^Markov2008]:
    Markov I L, Shi Y. Simulating quantum computation by contracting tensor networks[J]. SIAM Journal on Computing, 2008, 38(3): 963-981.

[^Pfeifer2014]:
    Pfeifer R N C, Haegeman J, Verstraete F. Faster identification of optimal contraction sequences for tensor networks[J]. Physical Review E, 2014, 90(3): 033315.

[^Gray2021]:
    Gray J, Kourtis S. Hyper-optimized tensor network contraction[J]. Quantum, 2021, 5: 410.

[^Kalachev2021]:
    Kalachev G, Panteleev P, Yung M H. Multi-tensor contraction for XEB verification of quantum circuits[J]. arXiv:2108.05665, 2021.

[^Pan2022]:
    Pan F, Chen K, Zhang P. Solving the sampling problem of the Sycamore quantum circuits[J]. Physical Review Letters, 2022, 129(9): 090502.

[^Liu2023]:
    Liu J G, Gao X, Cain M, et al. Computing solution space properties of combinatorial optimization problems via generic tensor networks[J]. SIAM Journal on Scientific Computing, 2023, 45(3): A1239-A1270.