
Commit a612731

Tom's April 18 edits of svd lecture
1 parent 98a7190 commit a612731


lectures/svd_intro.md

Lines changed: 74 additions & 59 deletions
@@ -108,13 +108,11 @@ In what is called a **full** SVD, the shapes of $U$, $\Sigma$, and $V$ are $\le
 
 There is also an alternative shape convention called an **economy** or **reduced** SVD .
 
-Thus, note that because we assume that $A$ has rank $r$, there are only $r $ nonzero singular values, where $r=\textrm{rank}(A)\leq\min\left(m, n\right)$.
+Thus, note that because we assume that $X$ has rank $r$, there are only $r $ nonzero singular values, where $r=\textrm{rank}(X)\leq\min\left(m, n\right)$.
 
 A **reduced** SVD uses this fact to express $U$, $\Sigma$, and $V$ as matrices with shapes $\left(m, r\right)$, $\left(r, r\right)$, $\left(r, n\right)$.
 
-Sometimes, we will use a full SVD
-
-At other times, we'll use a reduced SVD in which $\Sigma$ is an $r \times r$ diagonal matrix.
+Sometimes, we will use a **full** SVD in which $U$, $\Sigma$, and $V$ have shapes $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$
 
 
 **Caveat:**
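
As a quick illustration of the two shape conventions discussed in this hunk, here is a minimal NumPy sketch; the matrix and its dimensions are made up for illustration and are not code from the lecture or the commit:

```python
import numpy as np

m, n = 5, 3                                # a tall matrix, so rank r <= min(m, n) = 3
X = np.random.randn(m, n)

# Full SVD: U is (m, m), VT is (n, n); svd returns the min(m, n) singular values
U, S, VT = np.linalg.svd(X, full_matrices=True)
print(U.shape, S.shape, VT.shape)          # (5, 5) (3,) (3, 3)

# Reduced (economy) SVD: U is (m, r) and VT is (r, n), with r = min(m, n) here
Ur, Sr, VTr = np.linalg.svd(X, full_matrices=False)
print(Ur.shape, Sr.shape, VTr.shape)       # (5, 3) (3,) (3, 3)
```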
@@ -652,6 +650,9 @@ $$
 X_{t+1} = A X_t + C \epsilon_{t+1}
 $$ (eq:VARfirstorder)
 
+where $\epsilon_{t+1}$ is the time $t+1$ instance of an i.i.d. $m \times 1$ random vector with mean vector
+zero and identity covariance matrix and
+
 
 where
 the $ m \times 1 $ vector $ X_t $ is
@@ -666,46 +667,46 @@ and where $ T $ again denotes complex transposition and $ X_{i,t} $ is an observ
 We want to fit equation {eq}`eq:VARfirstorder`.
 
 
-Our data is assembled in the form of an $ m \times n $ matrix $ \tilde X $
+Our data are organized in an $ m \times (n+1) $ matrix $ \tilde X $
 
 $$
-\tilde X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_n\end{bmatrix}
+\tilde X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_n \mid X_{n+1} \end{bmatrix}
 $$
 
-where for $ t = 1, \ldots, n $, the $ m \times 1 $ vector $ X_t $ is given by {eq}`eq:Xvector`.
+where for $ t = 1, \ldots, n +1 $, the $ m \times 1 $ vector $ X_t $ is given by {eq}`eq:Xvector`.
 
-We want to estimate system {eq}`eq:VARfirstorder` consisting of $ m $ least squares regressions of **everything** on one lagged value of **everything**.
+Thus, we want to estimate a system {eq}`eq:VARfirstorder` that consists of $ m $ least squares regressions of **everything** on one lagged value of **everything**.
 
 The $i$'th equation of {eq}`eq:VARfirstorder` is a regression of $X_{i,t+1}$ on the vector $X_t$.
 
 
 We proceed as follows.
 
 
-From $ \tilde X $, we form two matrices
+From $ \tilde X $, we form two $m \times n$ matrices
 
 $$
-X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_{n-1}\end{bmatrix}
+X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_{n}\end{bmatrix}
 $$
 
 and
 
 $$
-X' = \begin{bmatrix} X_2 \mid X_3 \mid \cdots \mid X_n\end{bmatrix}
+X' = \begin{bmatrix} X_2 \mid X_3 \mid \cdots \mid X_{n+1}\end{bmatrix}
 $$
 
 Here $ ' $ does not indicate matrix transposition but instead is part of the name of the matrix $ X' $.
 
 In forming $ X $ and $ X' $, we have in each case dropped a column from $ \tilde X $, the last column in the case of $ X $, and the first column in the case of $ X' $.
 
-Evidently, $ X $ and $ X' $ are both $ m \times \tilde n $ matrices where $ \tilde n = n - 1 $.
+Evidently, $ X $ and $ X' $ are both $ m \times n $ matrices.
 
-We denote the rank of $ X $ as $ p \leq \min(m, \tilde n) $.
+We denote the rank of $ X $ as $ p \leq \min(m, n) $.
 
 Two possible cases are
 
-* $ \tilde n > > m$, so that we have many more time series observations $\tilde n$ than variables $m$
-* $m > > \tilde n$, so that we have many more variables $m $ than time series observations $\tilde n$
+* $ n > > m$, so that we have many more time series observations $n$ than variables $m$
+* $m > > n$, so that we have many more variables $m $ than time series observations $n$
 
 At a general level that includes both of these special cases, a common formula describes the least squares estimator $\hat A$ of $A$ for both cases, but important details differ.
 
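
A minimal NumPy sketch of the data layout described in this hunk: simulate the first-order VAR {eq}`eq:VARfirstorder`, stack the observations in $\tilde X$, and split $\tilde X$ into $X$ and $X'$. The transition matrix, volatility matrix, and Gaussian shocks are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 100                                # m variables, n + 1 observations
A = 0.9 * np.eye(m)                          # illustrative transition matrix
C = 0.1 * np.eye(m)                          # illustrative volatility matrix

X_tilde = np.empty((m, n + 1))               # \tilde X = [X_1 | ... | X_{n+1}]
X_tilde[:, 0] = rng.standard_normal(m)
for t in range(n):
    eps = rng.standard_normal(m)             # i.i.d., mean zero, identity covariance
    X_tilde[:, t + 1] = A @ X_tilde[:, t] + C @ eps

X = X_tilde[:, :-1]                          # [X_1 | ... | X_n]      (drop last column)
Xprime = X_tilde[:, 1:]                      # [X_2 | ... | X_{n+1}]  (drop first column)
print(X.shape, Xprime.shape)                 # both (3, 100), i.e. m x n
```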
@@ -719,7 +720,7 @@ where $X^+$ is the pseudo-inverse of $X$.
 
 Formulas for the pseudo-inverse differ for our two cases.
 
-When $ \tilde n > > m$, so that we have many more time series observations $\tilde n$ than variables $m$ and when
+When $ n > > m$, so that we have many more time series observations $n$ than variables $m$ and when
 $X$ has linearly independent **rows**, $X X^T$ has an inverse and the pseudo-inverse $X^+$ is
 
 $$
@@ -743,14 +744,14 @@ This least-squares formula widely used in econometrics.
 
 **Tall-Skinny Case:**
 
-When $m > > \tilde n$, so that we have many more variables $m $ than time series observations $\tilde n$ and when $X$ has linearly independent **columns**,
+When $m > > n$, so that we have many more variables $m $ than time series observations $n$ and when $X$ has linearly independent **columns**,
 $X^T X$ has an inverse and the pseudo-inverse $X^+$ is
 
 $$
 X^+ = (X^T X)^{-1} X^T
 $$
 
-Here $X^+$ is a **left-inverse** that verifies $X^+ X = I_{\tilde n \times \tilde n}$.
+Here $X^+$ is a **left-inverse** that verifies $X^+ X = I_{n \times n}$.
 
 In this case, our formula {eq}`eq:commonA` for a least-squares estimator of $A$ becomes
 
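
For reference, the short-fat display truncated in the previous hunk is the standard right-inverse $X^+ = X^T (X X^T)^{-1}$, while this hunk shows the tall-skinny left-inverse $X^+ = (X^T X)^{-1} X^T$. A small NumPy check of both formulas on made-up full-rank matrices (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Short-and-fat case: n >> m, rows of X linearly independent
X_fat = rng.standard_normal((3, 100))
X_fat_plus = X_fat.T @ np.linalg.inv(X_fat @ X_fat.T)       # right-inverse form
print(np.allclose(X_fat @ X_fat_plus, np.eye(3)))           # X X^+ = I_m
print(np.allclose(X_fat_plus, np.linalg.pinv(X_fat)))       # agrees with NumPy's pinv

# Tall-and-skinny case: m >> n, columns of X linearly independent
X_tall = rng.standard_normal((100, 3))
X_tall_plus = np.linalg.inv(X_tall.T @ X_tall) @ X_tall.T   # left-inverse form
print(np.allclose(X_tall_plus @ X_tall, np.eye(3)))         # X^+ X = I_n
print(np.allclose(X_tall_plus, np.linalg.pinv(X_tall)))     # agrees with NumPy's pinv
```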
@@ -773,21 +774,24 @@ Thus, we want to fit equation {eq}`eq:VARfirstorder` in a situation in which we
 variables that appear in the vector $X_t$.
 
 
-To reiterate and provide more detail about how we can efficiently calculate the pseudo-inverse $X^+$, as our estimator $\hat A$ of $A$ we form an $m \times m$ matrix that solves the least-squares best-fit problem
+To reiterate and offer an idea about how we can efficiently calculate the pseudo-inverse $X^+$, as our estimator $\hat A$ of $A$ we form an $m \times m$ matrix that solves the least-squares best-fit problem
 
 $$
 \hat A = \textrm{argmin}_{\check A} || X' - \check A X ||_F
 $$ (eq:ALSeqn)
 
 where $|| \cdot ||_F$ denotes the Frobenius norm of a matrix.
 
-The solution of the problem on the right side of equation {eq}`eq:ALSeqn` is
+The minimizer of the right side of equation {eq}`eq:ALSeqn` is
 
 $$
 \hat A = X' X^{+}
 $$ (eq:hatAform)
 
-where the (possibly huge) $ \tilde n \times m $ matrix $ X^{+} = (X^T X)^{-1} X^T$ is again a pseudo-inverse of $ X $.
+where the (possibly huge) $ n \times m $ matrix $ X^{+} = (X^T X)^{-1} X^T$ is again a pseudo-inverse of $ X $.
+
+
+The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.
 
 For some situations that we are interested in, $X^T X $ can be close to singular, a situation that can make some numerical algorithms be error-prone.
 
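
A small sketch of the estimator $\hat A = X' X^{+}$ in equation {eq}`eq:hatAform`, applied to simulated data; the true transition matrix, shock scale, and sample size are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 500
A_true = 0.5 * np.eye(m) + 0.1 * rng.standard_normal((m, m))   # made-up stable VAR

X_tilde = np.empty((m, n + 1))
X_tilde[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_tilde[:, t + 1] = A_true @ X_tilde[:, t] + 0.01 * rng.standard_normal(m)

X, Xprime = X_tilde[:, :-1], X_tilde[:, 1:]
A_hat = Xprime @ np.linalg.pinv(X)            # \hat A = X' X^+
print(np.max(np.abs(A_hat - A_true)))         # small estimation error
```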
@@ -796,20 +800,16 @@ To confront that possibility, we'll use efficient algorithms for computing and
 
 The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.
 
-An efficient way to compute the pseudo-inverse $X^+$ is to start with the (reduced) singular value decomposition
+An efficient way to compute the pseudo-inverse $X^+$ is to start with a singular value decomposition
 
 
 
 $$
 X = U \Sigma V^T
 $$ (eq:SVDDMD)
 
-where $ U $ is $ m \times p $, $ \Sigma $ is a $ p \times p $ diagonal matrix, and $ V^T $ is a $ p \times \tilde n $ matrix.
-
-Here $ p $ is the rank of $ X $, where necessarily $ p \leq \tilde n $ because we are in a situation in which $m > > \tilde n$.
 
-
-Since we are in the $m > > \tilde n$ case, we can use the singular value decomposition {eq}`eq:SVDDMD` efficiently to construct the pseudo-inverse $X^+$
+We can use the singular value decomposition {eq}`eq:SVDDMD` efficiently to construct the pseudo-inverse $X^+$
 by recognizing the following string of equalities.
 
 $$
@@ -822,6 +822,10 @@ X^{+} & = (X^T X)^{-1} X^T \\
 \end{aligned}
 $$ (eq:efficientpseudoinverse)
 
+
+(Since we are in the $m > > n$ case in which $V^T V = I$ in a reduced SVD, we can use the preceding
+string of equalities for a reduced SVD as well as for a full SVD.)
+
 
 Thus, we shall construct a pseudo-inverse $ X^+ $ of $ X $ by using
 a singular value decomposition of $X$ in equation {eq}`eq:SVDDMD` to compute
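
A sketch of the SVD route to the pseudo-inverse in {eq}`eq:efficientpseudoinverse`, using a reduced SVD of a made-up tall-skinny $X$ with full column rank (the data matrices are placeholders, not lecture data):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 8                                       # m >> n, full column rank a.s.
X = rng.standard_normal((m, n))
Xprime = rng.standard_normal((m, n))               # placeholder for the shifted data matrix

U, S, VT = np.linalg.svd(X, full_matrices=False)   # reduced SVD: U is m x p
X_plus = VT.T @ np.diag(1.0 / S) @ U.T             # X^+ = V Sigma^{-1} U^T
print(np.allclose(X_plus, np.linalg.pinv(X)))      # matches the direct pseudo-inverse

A_hat = Xprime @ X_plus                            # \hat A = X' V Sigma^{-1} U^T
```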
@@ -840,22 +844,22 @@ $$
 \hat A = X' V \Sigma^{-1} U^T
 $$
 
-In addition to doing that, we’ll eventually use **dynamic mode decomposition** to compute a rank $ r $ approximation to $ A $,
+In addition to doing that, we’ll eventually use **dynamic mode decomposition** to compute a rank $ r $ approximation to $ \hat A $,
 where $ r < p $.
 
 **Remark:** We described and illustrated a **reduced** singular value decomposition above, and compared it with a **full** singular value decomposition.
 In our Python code, we'll typically use a reduced SVD.
 
 
-Next, we describe some alternative __reduced order__ representations of our first-order linear dynamic system.
+Next, we describe alternative representations of our first-order linear dynamic system.
 
 +++
 
 ## Representation 1
 
-In constructing this representation and also whenever we use it, we use a **full** SVD of $X$.
+In this representation, we shall use a **full** SVD of $X$.
 
-We use the $p$ columns of $U$, and thus the $p$ rows of $U^T$, to define a $p \times 1$ vector $\tilde b_t$ as follows
+We use the $m$ columns of $U$, and thus the $m$ rows of $U^T$, to define an $m \times 1$ vector $\tilde b_t$ as follows
 
 $$
@@ -876,13 +880,13 @@ So it follows from equation {eq}`eq:tildeXdef2` that we can reconstruct $X_t$ f
 
 
 
-* Equation {eq}`eq:tildeXdef2` serves as an **encoder** that summarizes the $m \times 1$ vector $X_t$ by a $p \times 1$ vector $\tilde b_t$
+* Equation {eq}`eq:tildeXdef2` serves as an **encoder** that rotates the $m \times 1$ vector $X_t$ to become an $m \times 1$ vector $\tilde b_t$
 
-* Equation {eq}`eq:Xdecoder` serves as a **decoder** that recovers the $m \times 1$ vector $X_t$ from the $p \times 1$ vector $\tilde b_t$
+* Equation {eq}`eq:Xdecoder` serves as a **decoder** that recovers the $m \times 1$ vector $X_t$ by rotating the $m \times 1$ vector $\tilde b_t$
 
 
 
-Define the transition matrix for a reduced $p \times 1$ state $\tilde b_t$ as
+Define a transition matrix for a rotated $m \times 1$ state $\tilde b_t$ by
 
 $$
 \tilde A = U^T \hat A U
@@ -894,13 +898,14 @@ $$
 \hat A = U \tilde A U^T
 $$
 
-Dynamics of the reduced $p \times 1$ state $\tilde b_t$ are governed by
+Dynamics of the rotated $m \times 1$ state $\tilde b_t$ are governed by
 
 $$
 \tilde b_{t+1} = \tilde A \tilde b_t
 $$
 
-To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders to both sides of this
+To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders
+(i.e., rotators) to both sides of this
 equation and deduce
 
 $$
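
A sketch of Representation 1 with a full SVD: encode $X_1$, iterate the rotated state with $\tilde A = U^T \hat A U$, and decode. With $U$ square and orthogonal this reproduces $\hat A^t X_1$. The data matrices are made-up placeholders, not lecture data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 50
X = rng.standard_normal((m, n))
Xprime = rng.standard_normal((m, n))
A_hat = Xprime @ np.linalg.pinv(X)

U, S, VT = np.linalg.svd(X, full_matrices=True)   # full SVD: U is m x m, U U^T = U^T U = I

b = U.T @ X[:, 0]                                 # encoder: \tilde b_1 = U^T X_1
A_tilde = U.T @ A_hat @ U                         # rotated transition matrix
for _ in range(5):                                # \tilde b_{t+1} = \tilde A \tilde b_t
    b = A_tilde @ b
X_bar = U @ b                                     # decoder: forecast of X_6 given X_1
print(np.allclose(X_bar, np.linalg.matrix_power(A_hat, 5) @ X[:, 0]))
```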
@@ -914,43 +919,45 @@ where we use $\overline X_t$ to denote a forecast.
 ## Representation 2
 
 
-This representation is the one originally proposed by {cite}`schmid2010`.
+This representation is related to one originally proposed by {cite}`schmid2010`.
 
 It can be regarded as an intermediate step to a related and perhaps more useful representation 3.
 
 
 As with Representation 1, we continue to
 
-
-* use all $p$ singular values of $X$
 * use a **full** SVD and **not** a reduced SVD
 
 
 
-As we observed and illustrated earlier in this lecture, under these two requirements,
+As we observed and illustrated earlier in this lecture, for a full SVD
 $U U^T$ and $U^T U$ are both identity matrices; but under a reduced SVD of $X$, $U^T U$ is not an identity matrix.
 
-As we shall see, these requirements will be too confining for what we ultimately want to do; these are situations in which $U^T U$ is **not** an identity matrix because we want to use a reduced SVD of $X$.
+As we shall see, a full SVD is too confining for what we ultimately want to do, namely, situations in which $U^T U$ is **not** an identity matrix because we use a reduced SVD of $X$.
 
 But for now, let's proceed under the assumption that both of the preceding two requirements are satisfied.
 
 
 
-Form an eigendecomposition of the $p \times p$ matrix $\tilde A$ defined in equation {eq}`eq:Atilde0`:
+Form an eigendecomposition of the $m \times m$ matrix $\tilde A = U^T \check A U$ defined in equation {eq}`eq:Atilde0`:
 
 $$
 \tilde A = W \Lambda W^{-1}
 $$ (eq:tildeAeigen)
 
-where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is a $p \times p$
+where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is an $m \times m$
 matrix whose columns are eigenvectors corresponding to rows (eigenvalues) in
 $\Lambda$.
 
-Note that when $U U^T = I_{m \times m}$, as is true with a full SVD of X (but **not** true with a reduced SVD)
+Note that when $U U^T = I_{m \times m}$, as is true with a full SVD of $X$ (but as is **not** true with a reduced SVD)
 
 $$
 \hat A = U \tilde A U^T = U W \Lambda W^{-1} U^T
-$$
+$$ (eq:eqeigAhat)
+
+Evidently, according to equation {eq}`eq:eqeigAhat`, the diagonal matrix $\Lambda$ contains eigenvalues of
+$\hat A$ and corresponding eigenvectors of $\hat A$ are columns of the matrix $UW$.
+
 
 Thus, the systematic (i.e., not random) parts of the $X_t$ dynamics captured by our first-order vector autoregressions are described by
 
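
A numerical check of the new equation {eq}`eq:eqeigAhat`: with a full SVD, the eigenvalues of $\tilde A$ are eigenvalues of $\hat A$, and the columns of $UW$ are corresponding eigenvectors. The data matrices are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 50
X = rng.standard_normal((m, n))
Xprime = rng.standard_normal((m, n))
A_hat = Xprime @ np.linalg.pinv(X)

U, S, VT = np.linalg.svd(X, full_matrices=True)   # full SVD, so U U^T = I_m
A_tilde = U.T @ A_hat @ U
Lam, W = np.linalg.eig(A_tilde)                   # \tilde A = W \Lambda W^{-1}

Phi_s = U @ W                                     # candidate eigenvectors of \hat A
print(np.allclose(A_hat @ Phi_s, Phi_s * Lam))    # \hat A (UW) = (UW) \Lambda
```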
@@ -982,15 +989,15 @@ $$
 X_t = U W \hat b_t
 $$
 
-We can use this representation to constructor a predictor $\overline X_{t+1}$ of $X_{t+1}$ conditional on $X_1$ via:
+We can use this representation to construct a predictor $\overline X_{t+1}$ of $X_{t+1}$ conditional on $X_1$ via:
 
 $$
 \overline X_{t+1} = U W \Lambda^t W^{-1} U^T X_1
 $$ (eq:DSSEbookrepr)
 
 
 In effect,
-{cite}`schmid2010` defined an $m \times p$ matrix $\Phi_s$ as
+{cite}`schmid2010` defined an $m \times m$ matrix $\Phi_s$ as
 
 $$
 \Phi_s = UW
@@ -1005,37 +1012,45 @@ $$ (eq:schmidrep)
 Components of the basis vector $ \hat b_t = W^{-1} U^T X_t \equiv \Phi_s^+$ are often called DMD **modes**, or sometimes also
 DMD **projected modes**.
 
-An alternative definition of DMD notes is motivate by the following observation.
 
-A peculiar feature of representation {eq}`eq:schmidrep` is that while the diagonal components of $\Lambda$ are square roots of singular
-values of $\check A$, the columns of $\Phi_s$ are **not** eigenvectors corresponding to eigenvalues of $\check A$.
 
-This feature led Tu et al. {cite}`tu_Rowley` to suggest an alternative representation that replaces $\Phi_s$ with another
-$m \times p$ matrix whose columns are eigenvectors of $\check A$.
 
-We turn to that representation next.
+We turn next to an alternative representation suggested by Tu et al. {cite}`tu_Rowley`.
 
 
 
 
 ## Representation 3
 
 
-As we did with representation 2, it is useful to construct an eigencomposition of the $p \times p$ transition matrix $\tilde A$
+As we did with representation 2, it is useful to construct an eigendecomposition of the $m \times m$ transition matrix $\tilde A$
 according to equation {eq}`eq:tildeAeigen`.
 
 
-Now where $ 1 \leq r \leq p$, construct an $m \times r$ matrix
+Departing from the procedures used to construct Representations 1 and 2, each of which deployed a **full** SVD, we now use a **reduced** SVD.
+
+As above, we let $p \leq \textrm{min}(m,n)$ be the rank of $X$ and consider a **reduced** SVD
+
+$$
+X = U \Sigma V^T
+$$
+
+where now $U$ is $m \times p$ and $\Sigma$ is $ p \times p$ and $V^T$ is $p \times n$.
+
+
+
+
+Construct an $m \times p$ matrix
 
 $$
-\Phi = X' V \Sigma^{-1} W
+\Phi = X' V \Sigma^{-1} W
 $$ (eq:Phiformula)
 
 
 
 Tu et al. {cite}`tu_Rowley` established the following
 
-**Proposition** The $r$ columns of $\Phi$ are eigenvectors of $\check A$ that correspond to the largest $r$ eigenvalues of $A$.
+**Proposition** The $p$ columns of $\Phi$ are eigenvectors of $\check A$.
 
 **Proof:** From formula {eq}`eq:Phiformula` we have
 
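
A numerical check of the Proposition in this hunk, using a reduced SVD of a made-up tall-skinny $X$: the columns of $\Phi = X' V \Sigma^{-1} W$ are eigenvectors of $\hat A = X' V \Sigma^{-1} U^T$, with eigenvalues on the diagonal of $\Lambda$. The data matrices are placeholders, not lecture data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 6                                        # m >> n
X = rng.standard_normal((m, n))
Xprime = rng.standard_normal((m, n))

U, S, VT = np.linalg.svd(X, full_matrices=False)    # reduced SVD: U is m x p
A_hat = Xprime @ VT.T @ np.diag(1.0 / S) @ U.T      # \hat A = X' V Sigma^{-1} U^T

A_tilde = U.T @ A_hat @ U
Lam, W = np.linalg.eig(A_tilde)                     # \tilde A = W \Lambda W^{-1}

Phi = Xprime @ VT.T @ np.diag(1.0 / S) @ W          # \Phi = X' V Sigma^{-1} W
print(np.allclose(A_hat @ Phi, Phi * Lam))          # columns of \Phi are eigenvectors of \hat A
```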
@@ -1074,7 +1089,7 @@ We also have the following
 {eq}`eq:Atilde0`, define it as the following $r \times r$ counterpart
 
 $$
-\tilde A = \tilde U^T \hat A U
+\tilde A = \tilde U^T \hat A \tilde U
 $$ (eq:Atilde10)
 
 where in equation {eq}`eq:Atilde10` $\tilde U$ is now the $m \times r$ matrix consisting of the eigenvectors of $X X^T$ corresponding to the $r$
@@ -1203,7 +1218,7 @@ the $r < p$ largest singular values.
 
 In that case, we simply replace $\Sigma$ with the appropriate $r \times r$ matrix of singular values,
 $U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values,
-and $V$ with the $\tilde n \times r$ matrix whose columns correspond to the $r$ largest singular values.
+and $V$ with the $n \times r$ matrix whose columns correspond to the $r$ largest singular values.
 
 Counterparts of all of the salient formulas above then apply.
 
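
A minimal sketch of the truncation described here, keeping only the $r$ largest singular values of a made-up $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 20, 10, 3
X = rng.standard_normal((m, n))

U, S, VT = np.linalg.svd(X, full_matrices=False)   # singular values come sorted, largest first
U_r = U[:, :r]                                     # m x r
Sigma_r = np.diag(S[:r])                           # r x r
V_r = VT[:r, :].T                                  # n x r

X_r = U_r @ Sigma_r @ V_r.T                        # best rank-r approximation of X
print(np.linalg.matrix_rank(X_r))                  # r
```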
