# Tensor networks

We now introduce the core ideas of tensor networks, highlighting their
connections with probabilistic graphical models (PGM) to align the terminology
between them.

For our purposes, a tensor is equivalent to the concept of a factor as defined
in the PGM domain, which we detail more formally below.

## What is a tensor?

*Definition*: A tensor $T$ over a set of variables $\bm{V}$, called its scope, is a map
```math
T: \prod_{V \in \bm{V}} \mathcal{D}_{V} \rightarrow \texttt{number}.
```
Here, $\mathcal{D}_{V}$ denotes the domain (the set of possible values) of the
variable $V$; that is, the function $T$ maps each possible instantiation of the random
variables in its scope $\bm{V}$ to a generic number type. In the context of tensor networks,
a minimum requirement is that the number type is a commutative semiring.
To define a commutative semiring with the addition operation $\oplus$ and the multiplication operation $\odot$ on a set $S$, the following relations must hold for any three elements $a, b, c \in S$.
```math
\newcommand{\mymathbb}[1]{\mathbb{#1}}
\begin{align*}
(a \oplus b) \oplus c = a \oplus (b \oplus c) & \hspace{5em}\text{$\triangleright$ commutative monoid $\oplus$ with identity $\mymathbb{0}$}\\
a \oplus \mymathbb{0} = \mymathbb{0} \oplus a = a &\\
a \oplus b = b \oplus a &\\
&\\
(a \odot b) \odot c = a \odot (b \odot c) & \hspace{5em}\text{$\triangleright$ commutative monoid $\odot$ with identity $\mymathbb{1}$}\\
a \odot \mymathbb{1} = \mymathbb{1} \odot a = a &\\
a \odot b = b \odot a &\\
&\\
a \odot (b\oplus c) = a\odot b \oplus a\odot c & \hspace{5em}\text{$\triangleright$ left and right distributive}\\
(a\oplus b) \odot c = a\odot c \oplus b\odot c &\\
&\\
a \odot \mymathbb{0} = \mymathbb{0} \odot a = \mymathbb{0}
\end{align*}
```
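
For example, ordinary arithmetic $(+, \times)$ over the real numbers is a
commutative semiring, and so is the max-plus (tropical) semiring with
$\oplus = \max$ and $\odot = +$; alternative semirings like the latter are what
allow the same contraction machinery to answer optimization-type questions
[^Liu2023]. Below is a minimal Julia sketch that checks a few of the axioms
numerically for the max-plus semiring; the infix operators `⊕` and `⊙` are
defined here purely for illustration and are not part of any package.
```julia
# Max-plus (tropical) semiring: ⊕ = max, ⊙ = +, with -Inf as the additive
# identity and 0.0 as the multiplicative identity. Illustrative definitions only.
⊕(a, b) = max(a, b)   # semiring "addition"
⊙(a, b) = a + b       # semiring "multiplication"

a, b, c = 1.5, -2.0, 3.0
(a ⊕ b) ⊕ c == a ⊕ (b ⊕ c)            # true: ⊕ is associative
a ⊕ (-Inf) == a                        # true: -Inf is the identity of ⊕
a ⊙ 0.0 == a                           # true: 0.0 is the identity of ⊙
a ⊙ (b ⊕ c) == (a ⊙ b) ⊕ (a ⊙ c)      # true: ⊙ distributes over ⊕
a ⊙ (-Inf) == -Inf                     # true: the additive identity is absorbing
```
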
Tensors are represented using multidimensional arrays with labeled dimensions;
in the PGM setting, the entries are nonnegative numbers. These labels
correspond to the array's indices, which in turn represent the set of random
variables that the tensor is a function of. Thus, in this context, the terms
**label**, **index**, and **variable** are synonymous and hence used
interchangeably.
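
As a concrete illustration, a factor over two binary random variables can be
stored as a $2 \times 2$ array together with its labels. The pairing of labels
and array below is purely illustrative, not a package data structure.
```julia
# A tensor (factor) over two binary random variables A and B.
labels = (:A, :B)       # the scope: the variables this tensor is a function of
T = [0.9 0.1;           # T[a, b] is the value at the assignment (A = a, B = b)
     0.2 0.8]
T[1, 2]                 # the entry for (A = 1, B = 2); here 0.1
```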

## What is a tensor network?

We now turn our attention to defining a **tensor network**, a mathematical
object used to represent a multilinear map between tensors. This concept is
widely employed in fields like condensed matter physics
[^Orus2014][^Pfeifer2014], quantum simulation [^Markov2008][^Pan2022], and
even in solving combinatorial optimization problems [^Liu2023]. It's worth
noting that we use a generalized version of the conventional notation, most
commonly known through the
[einsum](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html)
function, which is widely used in high-performance computing. Packages that
implement this conventional notation include
- [numpy](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html)
- [OMEinsum.jl](https://github.com/under-Peter/OMEinsum.jl)
- [PyTorch](https://pytorch.org/docs/stable/generated/torch.einsum.html)
- [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/einsum)

This approach allows us to represent a broader range of sum-product
multilinear operations between tensors, thus meeting the requirements of the
PGM field.

*Definition*[^Liu2023]: A tensor network is a multilinear map represented by the triple
$\mathcal{N} = (\Lambda, \mathcal{T}, \bm{\sigma}_0)$, where:
- $\Lambda$ is the set of variables present in the network
  $\mathcal{N}$.
- $\mathcal{T} = \{ T^{(k)}_{\bm{\sigma}_k} \}_{k=1}^{M}$ is the set of
  input tensors, where each tensor $T^{(k)}_{\bm{\sigma}_k}$ is identified
  by a superscript $(k)$ and has an associated scope $\bm{\sigma}_k$.
- $\bm{\sigma}_0$ specifies the scope of the output tensor.

More specifically, each tensor $T^{(k)}_{\bm{\sigma}_k} \in \mathcal{T}$ is
labeled by a string $\bm{\sigma}_k \in \Lambda^{r \left(T^{(k)} \right)}$, where
$r \left(T^{(k)} \right)$ is the rank of $T^{(k)}$. The multilinear map, also
known as the **contraction**, applied to this triple is defined as
```math
\texttt{contract}(\Lambda, \mathcal{T}, \bm{\sigma}_0) = \sum_{\bm{\sigma}_{\Lambda
\setminus [\bm{\sigma}_0]}} \prod_{k=1}^{M} T^{(k)}_{\bm{\sigma}_k},
```
where $[\bm{\sigma}_0]$ denotes the set of variables appearing in the output
string $\bm{\sigma}_0$. Notably, the summation extends over all instantiations
of the variables that are not part of the output tensor.
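
To make the definition concrete, the following brute-force Julia sketch
evaluates `contract` exactly as written above. It is meant for clarity rather
than efficiency; the function name `naive_contract`, the use of `Symbol`
labels, and the assumption that every variable ranges over `1:d` are ours, not
part of any package.
```julia
# Brute-force evaluation of contract(Λ, 𝒯, σ0): sum over every joint
# assignment of the variables, multiplying the matching tensor entries.
function naive_contract(Λ, tensors, σ0; d=2)
    out = zeros(ntuple(_ -> d, length(σ0)))
    for assignment in Iterators.product(ntuple(_ -> 1:d, length(Λ))...)
        val = Dict(zip(Λ, assignment))           # current instantiation of all variables
        term = prod(T[getindex.(Ref(val), labels)...] for (labels, T) in tensors)
        out[getindex.(Ref(val), σ0)...] += term  # accumulate into the output entry
    end
    return out
end

# Matrix multiplication, expressed as a tensor network contraction:
A, B = rand(2, 2), rand(2, 2)
naive_contract([:i, :j, :k], [[:i, :j] => A, [:j, :k] => B], [:i, :k]) ≈ A * B  # true
```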

As an example, consider matrix multiplication, which can be specified as a
tensor network contraction:
```math
    (AB)_{ik} = \texttt{contract}\left(\{i,j,k\}, \{A_{ij}, B_{jk}\}, ik\right).
```
Here, matrices $A$ and $B$ are input tensors labeled by strings $ij, jk \in
\{i, j, k\}^2$. The output tensor is labeled by the string $ik$. The summation
runs over the indices $\Lambda \setminus [ik] = \{j\}$, so the contraction
corresponds to
```math
    \texttt{contract}\left(\{i,j,k\}, \{A_{ij}, B_{jk}\}, ik\right) = \sum_j
    A_{ij}B_{jk}.
```
In the einsum notation commonly used in various programming languages, this is
equivalent to `ij, jk -> ik`.
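
The same contraction can be evaluated with the einsum notation of
[OMEinsum.jl](https://github.com/under-Peter/OMEinsum.jl) listed above; the
short sketch below follows that package's `ein` string literal.
```julia
using OMEinsum

A, B = rand(3, 4), rand(4, 5)
ein"ij,jk->ik"(A, B) ≈ A * B   # true: the contraction reproduces matrix multiplication
```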

Diagrammatically, a tensor network can be represented as an *open hypergraph*.
In this diagram, a tensor maps to a vertex, and a variable maps to a
hyperedge. Tensors sharing the same variable are connected by the same
hyperedge for that variable. The diagrammatic representation of matrix
multiplication is:
```@eval
using TikzPictures

tp = TikzPicture(
    L"""
    \matrix[row sep=0.8cm,column sep=0.8cm,ampersand replacement= \& ] {
    \node (1) {}; \&
    \node (a) [mytensor] {$A$}; \&
    \node (b) [mytensor] {$B$}; \&
    \node (2) {}; \&
    \\
    };
    \draw [myedge, color=c01] (1) edge node[below] {$i$} (a);
    \draw [myedge, color=c02] (a) edge node[below] {$j$} (b);
    \draw [myedge, color=c03] (b) edge node[below] {$k$} (2);
    """,
    options="every node/.style={scale=2.0}",
    preamble="\\input{" * joinpath(@__DIR__, "assets", "preambles", "the-tensor-network") * "}",
)
save(SVG("the-tensor-network1"), tp)
```

```@raw html
<img src="the-tensor-network1.svg" style="margin-left: auto; margin-right: auto; display:block; width: 50%">
```

In this diagram, we use different colors to denote different hyperedges. The hyperedges for
$i$ and $k$ are left open to denote variables in the output string
$\bm{\sigma}_0$. The reason we use hyperedges rather than regular edges will
become clear in the following star contraction example:
```math
    \texttt{contract}(\{i,j,k,l\}, \{A_{il}, B_{jl}, C_{kl}\}, ijk) = \sum_{l}A_{il}
    B_{jl} C_{kl}.
```
The equivalent einsum notation employed by many programming languages is `il,
jl, kl -> ijk`.
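
As a quick sketch with OMEinsum.jl (the tensor sizes here are chosen
arbitrarily for illustration):
```julia
using OMEinsum

# Star contraction: the index l (of size 3 here) is shared by all three tensors.
A, B, C = rand(2, 3), rand(4, 3), rand(5, 3)
T = ein"il,jl,kl->ijk"(A, B, C)
size(T)                                                       # (2, 4, 5)
T[1, 2, 3] ≈ sum(A[1, l] * B[2, l] * C[3, l] for l in 1:3)    # true
```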

Since the variable $l$ is shared across all three tensors, a simple graph
can't capture the diagram's complexity. The more appropriate hypergraph
representation is shown below.
```@eval
using TikzPictures

tp = TikzPicture(
    L"""
    \matrix[row sep=0.4cm,column sep=0.4cm,ampersand replacement= \& ] {
    \&
    \&
    \node[color=c01] (j) {$j$}; \&
    \&
    \&
    \\
    \&
    \&
    \node (b) [mytensor] {$B$}; \&
    \&
    \&
    \\
    \node[color=c03] (i) {$i$}; \&
    \node (a) [mytensor] {$A$}; \&
    \node[color=c02] (l) {$l$}; \&
    \node (c) [mytensor] {$C$}; \&
    \node[color=c04] (k) {$k$}; \&
    \\
    };
    \draw [myedge, color=c01] (j) edge (b);
    \draw [myedge, color=c02] (b) edge (l);
    \draw [myedge, color=c03] (i) edge (a);
    \draw [myedge, color=c02] (a) edge (l);
    \draw [myedge, color=c02] (l) edge (c);
    \draw [myedge, color=c04] (c) edge (k);
    """,
    options="every node/.style={scale=2.0}",
    preamble="\\input{" * joinpath(@__DIR__, "assets", "preambles", "the-tensor-network") * "}",
)
save(SVG("the-tensor-network2"), tp)
```

```@raw html
<img src="the-tensor-network2.svg" style="margin-left: auto; margin-right: auto; display:block; width: 50%">
```

As a final note, our definition of a tensor network allows for repeated
indices within the same tensor, which translates to self-loops in the
corresponding diagrams.
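
For instance, a repeated index on a single matrix is a self-loop: in einsum
notation, `ii -> i` extracts the diagonal and `ii ->` sums it to give the
trace. The sketch below uses OMEinsum.jl and assumes, as is usual for
einsum-style interfaces, that a scalar result comes back as a zero-dimensional
array.
```julia
using OMEinsum, LinearAlgebra

A = rand(3, 3)
ein"ii->i"(A) ≈ diag(A)   # repeated index i on A: the diagonal
ein"ii->"(A)[] ≈ tr(A)    # summing the repeated index gives the trace
```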

## Tensor network contraction orders

The performance of a tensor network contraction depends strongly on the order
in which the tensors are contracted. The contraction order is usually
specified by a binary tree: the leaves are the input tensors, each internal
node is the intermediate tensor obtained by contracting its two children, and
the root is the output tensor. In nested einsum notation, such a tree can be
written with parentheses, as in the sketch below.
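
With OMEinsum.jl, for example, parentheses in a nested einsum string specify
the contraction tree; both trees below compute the same map, but their costs
differ (a small sketch, with sizes chosen to make the difference obvious):
```julia
using OMEinsum

A, B, C = rand(2, 100), rand(100, 2), rand(2, 100)
left_first  = ein"(ij,jk),kl->il"(A, B, C)   # tree: contract A and B first, then with C
right_first = ein"ij,(jk,kl)->il"(A, B, C)   # tree: contract B and C first, then with A
left_first ≈ right_first                      # true: same result, very different cost
```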

Numerous approaches have been proposed to determine efficient contraction
orderings, which include:
- Greedy algorithms
- Breadth-first search and dynamic programming [^Pfeifer2014]
- Graph bipartitioning [^Gray2021]
- Local search [^Kalachev2021]

Some of these have been implemented in the
[OMEinsum](https://github.com/under-Peter/OMEinsum.jl) package. Please check
[Performance Tips](@ref) for more details.
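
As a sketch of how this looks in practice, the snippet below asks OMEinsum.jl
to optimize the contraction order of a small network with the local-search
optimizer `TreeSA` (following the package documentation; `GreedyMethod()` can
be used instead, and `uniformsize` assigns the same dimension to every index):
```julia
using OMEinsum, LinearAlgebra

code = ein"ij,jk,kl,li->"                        # contract four matrices into a scalar
sizes = uniformsize(code, 10)                    # every index gets dimension 10
optcode = optimize_code(code, sizes, TreeSA())   # search for a good contraction tree

A, B, C, D = rand(10, 10), rand(10, 10), rand(10, 10), rand(10, 10)
optcode(A, B, C, D)[] ≈ tr(A * B * C * D)        # true: same result, better order
```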

## References

[^Orus2014]:
    Orús R. A practical introduction to tensor networks: Matrix product states and projected entangled pair states[J]. Annals of Physics, 2014, 349: 117-158.

[^Markov2008]:
    Markov I L, Shi Y. Simulating quantum computation by contracting tensor networks[J]. SIAM Journal on Computing, 2008, 38(3): 963-981.

[^Pfeifer2014]:
    Pfeifer R N C, Haegeman J, Verstraete F. Faster identification of optimal contraction sequences for tensor networks[J]. Physical Review E, 2014, 90(3): 033315.

[^Gray2021]:
    Gray J, Kourtis S. Hyper-optimized tensor network contraction[J]. Quantum, 2021, 5: 410.

[^Kalachev2021]:
    Kalachev G, Panteleev P, Yung M H. Multi-tensor contraction for XEB verification of quantum circuits[J]. arXiv:2108.05665, 2021.

[^Pan2022]:
    Pan F, Chen K, Zhang P. Solving the sampling problem of the Sycamore quantum circuits[J]. Physical Review Letters, 2022, 129(9): 090502.

[^Liu2023]:
    Liu J G, Gao X, Cain M, et al. Computing solution space properties of combinatorial optimization problems via generic tensor networks[J]. SIAM Journal on Scientific Computing, 2023, 45(3): A1239-A1270.