**Remark:** The cells above illustrate application of the `full_matrices=True` and `full_matrices=False` options.

Using `full_matrices=False` returns a reduced singular value decomposition. This option implements an optimal reduced rank approximation of a matrix, in the sense of minimizing the Frobenius norm of the discrepancy between the approximating matrix and the matrix being approximated.

Optimality in this sense is established in the celebrated Eckart–Young theorem. See <https://en.wikipedia.org/wiki/Low-rank_approximation>.
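For readers skimming past those cells, here is a minimal NumPy sketch of the same comparison; the matrix is an arbitrary illustration, and only the shapes of the returned factors matter.

```python
import numpy as np

X = np.random.rand(5, 2)   # a tall-skinny example: m = 5 rows, n = 2 columns

# Full SVD: U is m x m
U, S, Vt = np.linalg.svd(X, full_matrices=True)
print(U.shape, S.shape, Vt.shape)             # (5, 5) (2,) (2, 2)

# Reduced SVD: U keeps only min(m, n) columns
Uhat, Shat, Vhat_t = np.linalg.svd(X, full_matrices=False)
print(Uhat.shape, Shat.shape, Vhat_t.shape)   # (5, 2) (2,) (2, 2)

# Both versions reproduce X exactly
print(np.allclose(X, (U[:, :2] * S) @ Vt))
print(np.allclose(X, (Uhat * Shat) @ Vhat_t))
```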
When we study Dynamic Mode Decompositions below, it will be important for us to remember the following properties of full and reduced SVDs in such tall-skinny cases.
Let's do another exercise, but now we'll set $m = 2 < 5 = n$.
## Digression: Polar Decomposition

A singular value decomposition (SVD) is related to the **polar decomposition** of $X$

$$
X = SQ
$$

where

\begin{align*}
S & = U\Sigma U^T \cr
Q & = U V^T
\end{align*}

and $S$ is evidently a symmetric matrix and $Q$ is an orthogonal matrix.
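As a quick numerical check of these formulas (a sketch only; for simplicity the example matrix is square, and it is randomly generated):

```python
import numpy as np

X = np.random.rand(3, 3)

U, S, Vt = np.linalg.svd(X)       # full SVD of a square matrix
S_polar = U @ np.diag(S) @ U.T    # S = U Sigma U^T
Q_polar = U @ Vt                  # Q = U V^T

print(np.allclose(X, S_polar @ Q_polar))             # X = S Q
print(np.allclose(S_polar, S_polar.T))               # S is symmetric
print(np.allclose(Q_polar @ Q_polar.T, np.eye(3)))   # Q is orthogonal
```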
## Principal Components Analysis (PCA)

Let's begin with a case in which $n >> m$, so that we have many more observations $n$ than random variables $m$.

The matrix $X$ is **short and fat** in an $n >> m$ case as opposed to a **tall and skinny** case with $m >> n$ to be discussed later.

We regard $X$ as an $m \times n$ matrix of **data**:

$$
X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_n\end{bmatrix}
$$

where for $j = 1, \ldots, n$ the column vector $X_j = \begin{bmatrix}X_{1j}\\X_{2j}\\\vdots\\X_{mj}\end{bmatrix}$ is a vector of observations on variables $\begin{bmatrix}x_1\\x_2\\\vdots\\x_m\end{bmatrix}$.

In a **time series** setting, we would think of columns $j$ as indexing different __times__ at which random variables are observed, while rows index different random variables.

In a **cross section** setting, we would think of columns $j$ as indexing different __individuals__ for which random variables are observed, while rows index different **random variables**.

The number of positive singular values equals the rank of the matrix $X$.

Arrange the singular values in decreasing order.

Arrange the positive singular values on the main diagonal of the matrix $\Sigma$ and collect them in a vector $\sigma_R$.

Set all other entries of $\Sigma$ to zero.
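A small NumPy sketch of these bookkeeping steps, using an illustrative short-and-fat matrix of rank 2:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 8))  # 4 x 8, rank 2

sigma = np.linalg.svd(X, compute_uv=False)  # singular values, in decreasing order
sigma_R = sigma[sigma > 1e-10]              # keep only the positive ones

print(sigma)
print(len(sigma_R), np.linalg.matrix_rank(X))  # both equal 2, the rank of X
```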
## Relationship of PCA to SVD

To relate an SVD to a PCA (principal component analysis) of data set $X$, first construct the SVD of the data matrix $X$:

$$
X = U \Sigma V^T = \sigma_1 U_1 V_1^T + \sigma_2 U_2 V_2^T + \cdots + \sigma_r U_r V_r^T
$$ (eq:PCA1)

In equation {eq}`eq:PCA1`, each of the $m \times n$ matrices $U_{j}V_{j}^T$ is evidently of rank $1$.

Thus, we have

$$
X = \sigma_1 \begin{pmatrix}U_{11}V_{1}^T\\U_{21}V_{1}^T\\\vdots\\U_{m1}V_{1}^T\end{pmatrix} + \sigma_2\begin{pmatrix}U_{12}V_{2}^T\\U_{22}V_{2}^T\\\vdots\\U_{m2}V_{2}^T\end{pmatrix}+ \cdots + \sigma_r\begin{pmatrix}U_{1r}V_{r}^T\\U_{2r}V_{r}^T\\\vdots\\U_{mr}V_{r}^T\end{pmatrix}
$$ (eq:PCA2)
Here is how we would interpret the objects in the matrix equation {eq}`eq:PCA2` in a time series context (a small numerical sketch follows this list):

* $ V_{k}^T= \begin{bmatrix}V_{k1} & V_{k2} & \ldots & V_{kn}\end{bmatrix} \quad \textrm{for each} \ k=1, \ldots, n $ is a time series $\lbrace V_{kj} \rbrace_{j=1}^n$ for the $k$th **principal component**

* $U_k = \begin{bmatrix}U_{1k}\\U_{2k}\\\vdots\\U_{mk}\end{bmatrix}$ for each $k=1, \ldots, m$ is a vector of **loadings** of variables $X_i$ on the $k$th principal component, $i=1, \ldots, m$

* $\sigma_k $ for each $k=1, \ldots, r$ is the strength of the $k$th **principal component**
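Here is a small sketch, with a toy data matrix, that rebuilds $X$ from the rank-one terms in {eq}`eq:PCA2` and reads off the principal component time series, loadings, and strengths (the variable names are ours, chosen for illustration):

```python
import numpy as np

m, n = 3, 50                        # few variables, many observations
rng = np.random.default_rng(1)
X = rng.standard_normal((m, n))

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(sigma > 1e-10))      # rank of X

# X as a sum of rank-one matrices sigma_k U_k V_k^T
X_rebuilt = sum(sigma[k] * np.outer(U[:, k], Vt[k, :]) for k in range(r))
print(np.allclose(X, X_rebuilt))

k = 0
pc_series = Vt[k, :]     # V_k^T: the k-th principal component time series
loadings = U[:, k]       # loadings of the m variables on component k
strength = sigma[k]      # strength of component k
```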
## PCA with Eigenvalues and Eigenvectors
This is the case that we are interested in here.
If we use formula {eq}`eq:hatAversion0` to calculate $\hat A X$ we find that

$$
\hat A X = X'
$$

so that the regression equation **fits perfectly**, the usual outcome in an **underdetermined least-squares** model.
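The following sketch illustrates that claim with random data in which the number of variables exceeds the number of time-shifted observations (everything here is purely illustrative):

```python
import numpy as np

m, n_tilde = 10, 4                        # many variables, few observations
rng = np.random.default_rng(2)
X  = rng.standard_normal((m, n_tilde))    # columns X_1, ..., X_{n~}
Xp = rng.standard_normal((m, n_tilde))    # columns X_2, ..., X_{n~ + 1}

A_hat = Xp @ np.linalg.pinv(X)            # A-hat = X' X^+
print(np.allclose(A_hat @ X, Xp))         # True: the fit is exact
```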
Thus, we want to fit equation {eq}`eq:VARfirstorder` in a situation in which we have a number $n$ of observations that is small relative to the number $m$ of variables that appear in the vector $X_t$.
$$
\hat A = X' X^{+}
$$ (eq:hatAform)
where the (possibly huge) $ \tilde n \times m $ matrix $ X^{+} = (X^T X)^{-1} X^T$ is again a pseudo-inverse of $ X $.
For some situations that we are interested in, $X^T X$ can be close to singular, a situation that can make some numerical algorithms error-prone.
To confront that possibility, we'll use efficient algorithms for computing $\hat A$ in formula {eq}`eq:hatAversion0` and for constructing reduced-rank approximations of it.
The $ i $th row of $ \hat A $ is a $ 1 \times m $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.
An efficient way to compute the pseudo-inverse $X^+$ is to start with the (reduced) singular value decomposition

$$
X = U \Sigma V^T
$$ (eq:SVDDMD)
where $ U $ is $ m \times p $, $ \Sigma $ is a $ p \times p $ diagonal matrix, and $ V^T $ is a $ p \times \tilde n $ matrix.
Here $ p $ is the rank of $ X $, where necessarily $ p \leq \tilde n $ because we are in a situation in which $m >> \tilde n$.
Since we are in the $m >> \tilde n$ case, we can use the singular value decomposition {eq}`eq:SVDDMD` efficiently to construct the pseudo-inverse $X^+$.
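A minimal sketch of this construction, assuming the standard formula $X^{+} = V \Sigma^{-1} U^T$ implied by the reduced SVD {eq}`eq:SVDDMD`, and checked against NumPy's own pseudo-inverse:

```python
import numpy as np

m, n_tilde = 200, 5                       # tall-skinny X
rng = np.random.default_rng(3)
X = rng.standard_normal((m, n_tilde))

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)   # reduced SVD: U is m x p
X_pinv = Vt.T @ np.diag(1.0 / sigma) @ U.T              # X^+ = V Sigma^{-1} U^T

print(np.allclose(X_pinv, np.linalg.pinv(X)))    # matches NumPy's pinv
print(np.allclose(X_pinv @ X, np.eye(n_tilde)))  # X^+ X = I
```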
Next, we describe some alternative __reduced order__ representations of our first-order dynamic system.
## Representation 1
In constructing this representation and also whenever we use it, we use a **full** SVD of $X$.

We use the $p$ columns of $U$, and thus the $p$ rows of $U^T$, to define a $p \times 1$ vector $\tilde b_t$ as follows

$$
\tilde b_t = U^T X_t
$$ (eq:tildeXdef2)
(Here we use the notation $b$ to remind ourselves that we are creating a **b**asis vector.)
Since we are using a **full** SVD, $U U^T$ is an $m \times m$ identity matrix.

So it follows from equation {eq}`eq:tildeXdef2` that we can reconstruct $X_t$ from $\tilde b_t$ by using

$$
X_t = U \tilde b_t
$$ (eq:Xdecoder)
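A small numerical sketch of this encode/decode pair with a full SVD (the dimensions and data are illustrative):

```python
import numpy as np

m, n_tilde = 6, 3
rng = np.random.default_rng(4)
X = rng.standard_normal((m, n_tilde))

U, sigma, Vt = np.linalg.svd(X, full_matrices=True)  # full SVD: U is m x m
print(np.allclose(U @ U.T, np.eye(m)))               # U U^T = I_m

b_tilde = U.T @ X        # each column is b~_t = U^T X_t
X_rebuilt = U @ b_tilde  # decode: X_t = U b~_t
print(np.allclose(X, X_rebuilt))
```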
## Representation 2

This representation is the one originally proposed by {cite}`schmid2010`.
It can be regarded as an intermediate step to a related and perhaps more useful representation 3.
As with Representation 1, we continue to
* use all $p$ singular values of $X$
* use a **full** SVD and **not** a reduced SVD
We then compute an eigendecomposition of the matrix $\tilde A$ defined earlier in equation {eq}`eq:Atilde0`:

$$
\tilde A = W \Lambda W^{-1}
$$

where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is a $p \times p$ matrix whose columns are eigenvectors corresponding to rows (eigenvalues) in $\Lambda$.
Note that when $U U^T = I_{m \times m}$, as is true with a full SVD of $X$ (but **not** true with a reduced SVD)

$$
\hat A = U \tilde A U^T = U W \Lambda W^{-1} U^T
$$
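A hedged sketch of that chain of equalities; here $U$ is simply an arbitrary $m \times m$ orthogonal matrix standing in for the full-SVD $U$, and $\tilde A = U^T \hat A U$ is the assumed earlier definition:

```python
import numpy as np

m = 5
rng = np.random.default_rng(5)
A_hat = rng.standard_normal((m, m))
U, _, _ = np.linalg.svd(rng.standard_normal((m, m)))  # an m x m orthogonal U

A_tilde = U.T @ A_hat @ U                    # assumed definition of A-tilde
eigvals, W = np.linalg.eig(A_tilde)          # A-tilde = W Lambda W^{-1}
Lam = np.diag(eigvals)

recovered = U @ W @ Lam @ np.linalg.inv(W) @ U.T   # U W Lambda W^{-1} U^T
print(np.allclose(recovered, A_hat))               # equals A-hat because U U^T = I_m
```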
Instead of using equation {eq}`eq:Atilde0`, define $\tilde A$ as the following $r \times r$ counterpart

$$
\tilde A = \tilde U^T \hat A \tilde U
$$ (eq:Atilde10)
where in equation {eq}`eq:Atilde10` $\tilde U$ is now the $m \times r$ matrix consisting of the eigenvectors of $X X^T$ corresponding to the $r$ largest singular values of $X$.

The conclusions of the proposition remain true when we replace $U$ by $\tilde U$.
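A sketch of this reduced-rank construction with illustrative data; here $\tilde U$ collects the $r$ leading left singular vectors of $X$ (equivalently, leading eigenvectors of $X X^T$), with $r = 2$ chosen arbitrarily:

```python
import numpy as np

m, n_tilde, r = 20, 6, 2
rng = np.random.default_rng(6)
X  = rng.standard_normal((m, n_tilde))
Xp = rng.standard_normal((m, n_tilde))

A_hat = Xp @ np.linalg.pinv(X)            # the full m x m estimate

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
U_tilde = U[:, :r]                        # m x r: leading left singular vectors

A_tilde_r = U_tilde.T @ A_hat @ U_tilde   # the r x r counterpart in eq:Atilde10
print(A_tilde_r.shape)                    # (2, 2)
```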
Also see {cite}`DDSE_book` (p. 238).
$$
X_t = \Phi \check b_t
$$

But there is a better way to compute the $r \times 1$ vector $\check b_t$.
In particular, the following argument from {cite}`DDSE_book` (page 240) provides a computationally efficient way to compute $\check b_t$.