lectures/svd_intro.md
Lines changed: 74 additions & 59 deletions
@@ -108,13 +108,11 @@ In what is called a **full** SVD, the shapes of $U$, $\Sigma$, and $V$ are $\le
There is also an alternative shape convention called an **economy** or **reduced** SVD.

-Thus, note that because we assume that $A$ has rank $r$, there are only $r$ nonzero singular values, where $r=\textrm{rank}(A)\leq\min\left(m, n\right)$.
+Thus, note that because we assume that $X$ has rank $r$, there are only $r$ nonzero singular values, where $r=\textrm{rank}(X)\leq\min\left(m, n\right)$.

A **reduced** SVD uses this fact to express $U$, $\Sigma$, and $V$ as matrices with shapes $\left(m, r\right)$, $\left(r, r\right)$, $\left(r, n\right)$.

-Sometimes, we will use a full SVD
-At other times, we'll use a reduced SVD in which $\Sigma$ is an $r \times r$ diagonal matrix.
+Sometimes, we will use a **full** SVD in which $U$, $\Sigma$, and $V$ have shapes $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$
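To make these shape conventions concrete, here is a minimal NumPy sketch (an editorial illustration, not part of the lecture's own code); the dimensions are arbitrary placeholders.

```python
# Minimal sketch: shapes returned by a full SVD versus a reduced ("economy") SVD.
import numpy as np

m, n = 5, 2                                            # arbitrary example dimensions
X = np.random.randn(m, n)

U, S, Vt = np.linalg.svd(X, full_matrices=True)        # full SVD
Uh, Sh, Vht = np.linalg.svd(X, full_matrices=False)    # reduced SVD

print(U.shape, S.shape, Vt.shape)     # (5, 5) (2,) (2, 2)
print(Uh.shape, Sh.shape, Vht.shape)  # (5, 2) (2,) (2, 2)
```

Note that NumPy returns the singular values as a vector of length $\min(m, n)$ in both cases.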
**Caveat:**
@@ -652,6 +650,9 @@ $$
X_{t+1} = A X_t + C \epsilon_{t+1}
$$ (eq:VARfirstorder)

+where $\epsilon_{t+1}$ is the time $t+1$ instance of an i.i.d. $m \times 1$ random vector with mean vector
+zero and identity covariance matrix and

where
the $ m \times 1 $ vector $ X_t $ is
@@ -666,46 +667,46 @@ and where $ T $ again denotes complex transposition and $ X_{i,t} $ is an observ
We want to fit equation {eq}`eq:VARfirstorder`.

-Our data is assembled in the form of an $ m \times n $ matrix $ \tilde X $
+Our data are organized in an $ m \times (n+1) $ matrix $ \tilde X $

-where for $ t = 1, \ldots, n $, the $ m \times 1 $ vector $ X_t $ is given by {eq}`eq:Xvector`.
+where for $ t = 1, \ldots, n+1 $, the $ m \times 1 $ vector $ X_t $ is given by {eq}`eq:Xvector`.

-We want to estimate system {eq}`eq:VARfirstorder` consisting of $ m $ least squares regressions of **everything** on one lagged value of **everything**.
+Thus, we want to estimate a system {eq}`eq:VARfirstorder` that consists of $ m $ least squares regressions of **everything** on one lagged value of **everything**.

The $i$'th equation of {eq}`eq:VARfirstorder` is a regression of $X_{i,t+1}$ on the vector $X_t$.

We proceed as follows.

-From $ \tilde X $, we form two matrices
+From $ \tilde X $, we form two $m \times n$ matrices

$$
-X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_{n-1}\end{bmatrix}
+X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_{n}\end{bmatrix}

Here $ ' $ does not indicate matrix transposition but instead is part of the name of the matrix $ X' $.

In forming $ X $ and $ X' $, we have in each case dropped a column from $ \tilde X $, the last column in the case of $ X $, and the first column in the case of $ X' $.

-Evidently, $ X $ and $ X' $ are both $ m \times \tilde n $ matrices where $ \tilde n = n - 1 $.
+Evidently, $ X $ and $ X' $ are both $ m \times n $ matrices.

-We denote the rank of $ X $ as $ p \leq \min(m, \tilde n) $.
+We denote the rank of $ X $ as $ p \leq \min(m, n) $.

Two possible cases are

-* $ \tilde n > > m$, so that we have many more time series observations $\tilde n$ than variables $m$
-* $m > > \tilde n$, so that we have many more variables $m$ than time series observations $\tilde n$
+* $ n > > m$, so that we have many more time series observations $n$ than variables $m$
+* $m > > n$, so that we have many more variables $m$ than time series observations $n$

At a general level that includes both of these special cases, a common formula describes the least squares estimator $\hat A$ of $A$ for both cases, but important details differ.
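As a concrete illustration of this bookkeeping, the following sketch (an editorial example with a made-up $3 \times 3$ system, not the lecture's own code) simulates {eq}`eq:VARfirstorder`, stacks the observations into $\tilde X$, forms $X$ and $X'$ by dropping a column at each end, and computes $\hat A = X' X^{+}$ with a generic pseudo-inverse.

```python
# Sketch: simulate a small VAR(1), build X (lags) and X' (leads), estimate A.
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 200                        # m variables, n + 1 time observations
A = np.array([[0.8, 0.1, 0.0],
              [0.0, 0.7, 0.2],
              [0.1, 0.0, 0.6]])
C = 0.1 * np.eye(m)

X_tilde = np.empty((m, n + 1))
X_tilde[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_tilde[:, t + 1] = A @ X_tilde[:, t] + C @ rng.standard_normal(m)

X  = X_tilde[:, :-1]                 # columns X_1, ..., X_n
Xp = X_tilde[:, 1:]                  # columns X_2, ..., X_{n+1}; this is the matrix "X'"

A_hat = Xp @ np.linalg.pinv(X)       # least-squares estimator  A_hat = X' X^+
print(np.round(A_hat, 2))            # close to A for n this large
```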
@@ -719,7 +720,7 @@ where $X^+$ is the pseudo-inverse of $X$.
Formulas for the pseudo-inverse differ for our two cases.

-When $ \tilde n > > m$, so that we have many more time series observations $\tilde n$ than variables $m$ and when
+When $ n > > m$, so that we have many more time series observations $n$ than variables $m$ and when
$X$ has linearly independent **rows**, $X X^T$ has an inverse and the pseudo-inverse $X^+$ is

$$
@@ -743,14 +744,14 @@ This least-squares formula widely used in econometrics.
**Tall-Skinny Case:**

-When $m > > \tilde n$, so that we have many more variables $m$ than time series observations $\tilde n$ and when $X$ has linearly independent **columns**,
+When $m > > n$, so that we have many more variables $m$ than time series observations $n$ and when $X$ has linearly independent **columns**,
$X^T X$ has an inverse and the pseudo-inverse $X^+$ is

$$
X^+ = (X^T X)^{-1} X^T
$$

-Here $X^+$ is a **left-inverse** that verifies $X^+ X = I_{\tilde n \times \tilde n}$.
+Here $X^+$ is a **left-inverse** that verifies $X^+ X = I_{n \times n}$.
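A quick numerical check of the tall-skinny case (an editorial sketch, not the lecture's code): for a tall matrix with linearly independent columns, NumPy's Moore-Penrose pseudo-inverse agrees with the formula $(X^T X)^{-1} X^T$ and acts as a left-inverse.

```python
# Sketch: verify the left-inverse property when m >> n and columns are independent.
import numpy as np

m, n = 100, 4
X = np.random.randn(m, n)                  # columns are independent with probability one

X_plus = np.linalg.pinv(X)
left_formula = np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(X_plus, left_formula))   # True
print(np.allclose(X_plus @ X, np.eye(n)))  # X^+ X = I_n
```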
In this case, our formula {eq}`eq:commonA` for a least-squares estimator of $A$ becomes
@@ -773,21 +774,24 @@ Thus, we want to fit equation {eq}`eq:VARfirstorder` in a situation in which we
variables that appear in the vector $X_t$.

-To reiterate and provide more detail about how we can efficiently calculate the pseudo-inverse $X^+$, as our estimator $\hat A$ of $A$ we form an $m \times m$ matrix that solves the least-squares best-fit problem
+To reiterate and offer an idea about how we can efficiently calculate the pseudo-inverse $X^+$, as our estimator $\hat A$ of $A$ we form an $m \times m$ matrix that solves the least-squares best-fit problem

$$
\hat A = \textrm{argmin}_{\check A} || X' - \check A X ||_F
$$ (eq:ALSeqn)

where $|| \cdot ||_F$ denotes the Frobenius norm of a matrix.

-The solution of the problem on the right side of equation {eq}`eq:ALSeqn` is
+The minimizer of the right side of equation {eq}`eq:ALSeqn` is

$$
\hat A = X' X^{+}
$$ (eq:hatAform)

-where the (possibly huge) $ \tilde n \times m $ matrix $ X^{+} = (X^T X)^{-1} X^T$ is again a pseudo-inverse of $ X $.
+where the (possibly huge) $ n \times m $ matrix $ X^{+} = (X^T X)^{-1} X^T$ is again a pseudo-inverse of $ X $.

+The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.

For some situations that we are interested in, $X^T X $ can be close to singular, a situation that can make some numerical algorithms error-prone.
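The sketch below (an editorial example with a deliberately collinear data matrix) illustrates the worry: when the columns of $X$ are nearly linearly dependent, $X^T X$ is very badly conditioned, whereas an SVD-based pseudo-inverse such as `np.linalg.pinv` simply discards the negligible singular values.

```python
# Sketch: near-collinearity makes X^T X ill-conditioned; an SVD-based
# pseudo-inverse handles this by truncating tiny singular values.
import numpy as np

rng = np.random.default_rng(1)
m = 50
x1 = rng.standard_normal(m)
X = np.column_stack([x1, x1 + 1e-8 * rng.standard_normal(m)])  # two nearly identical columns

print(np.linalg.cond(X.T @ X))   # enormous condition number
X_plus = np.linalg.pinv(X)       # computed from an SVD with a singular-value cutoff
print(X_plus.shape)              # (2, 50)
```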
@@ -796,20 +800,16 @@ To confront that possibility, we'll use efficient algorithms for computing and
The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.

-An efficient way to compute the pseudo-inverse $X^+$ is to start with the (reduced) singular value decomposition
+An efficient way to compute the pseudo-inverse $X^+$ is to start with a singular value decomposition

$$
X = U \Sigma V^T
$$ (eq:SVDDMD)

-where $ U $ is $ m \times p $, $ \Sigma $ is a $ p \times p $ diagonal matrix, and $ V^T $ is a $ p \times \tilde n $ matrix.
-Here $ p $ is the rank of $ X $, where necessarily $ p \leq \tilde n $ because we are in a situation in which $m > > \tilde n$.
-Since we are in the $m > > \tilde n$ case, we can use the singular value decomposition {eq}`eq:SVDDMD` efficiently to construct the pseudo-inverse $X^+$
+We can use the singular value decomposition {eq}`eq:SVDDMD` efficiently to construct the pseudo-inverse $X^+$
by recognizing the following string of equalities.

+(Since we are in the $m > > n$ case in which $V^T V = I$ in a reduced SVD, we can use the preceding
+string of equalities for a reduced SVD as well as for a full SVD.)

Thus, we shall construct a pseudo-inverse $ X^+ $ of $ X $ by using
a singular value decomposition of $X$ in equation {eq}`eq:SVDDMD` to compute
@@ -840,22 +844,22 @@ $$
\hat A = X' V \Sigma^{-1} U^T
$$
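In code, the SVD route to $\hat A$ looks as follows (an editorial sketch assuming data matrices `X` and `Xp` for $X$ and $X'$ are already in hand); it reproduces what `np.linalg.pinv` would deliver.

```python
# Sketch: A_hat = X' V Sigma^{-1} U^T computed from a reduced SVD of X.
import numpy as np

def least_squares_A(X, Xp):
    U, S, Vt = np.linalg.svd(X, full_matrices=False)   # reduced SVD: X = U diag(S) V^T
    return Xp @ Vt.T @ np.diag(1.0 / S) @ U.T          # X' V Sigma^{-1} U^T

# Placeholder data just to exercise the function
rng = np.random.default_rng(2)
X, Xp = rng.standard_normal((3, 10)), rng.standard_normal((3, 10))
print(np.allclose(least_squares_A(X, Xp), Xp @ np.linalg.pinv(X)))   # True
```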
-In addition to doing that, we'll eventually use **dynamic mode decomposition** to compute a rank $ r $ approximation to $ A $,
+In addition to doing that, we'll eventually use **dynamic mode decomposition** to compute a rank $ r $ approximation to $ \hat A $,
where $ r < p $.

**Remark:** We described and illustrated a **reduced** singular value decomposition above, and compared it with a **full** singular value decomposition.
In our Python code, we'll typically use a reduced SVD.

-Next, we describe some alternative __reduced order__ representations of our first-order linear dynamic system.
+Next, we describe alternative representations of our first-order linear dynamic system.

+++

## Representation 1

-In constructing this representation and also whenever we use it, we use a **full** SVD of $X$.
+In this representation, we shall use a **full** SVD of $X$.

-We use the $p$ columns of $U$, and thus the $p$ rows of $U^T$, to define a $p \times 1$ vector $\tilde b_t$ as follows
+We use the $m$ columns of $U$, and thus the $m$ rows of $U^T$, to define an $m \times 1$ vector $\tilde b_t$ as follows

$$
@@ -876,13 +880,13 @@ So it follows from equation {eq}`eq:tildeXdef2` that we can reconstruct $X_t$ f
-* Equation {eq}`eq:tildeXdef2` serves as an **encoder** that summarizes the $m \times 1$ vector $X_t$ by a $p \times 1$ vector $\tilde b_t$
+* Equation {eq}`eq:tildeXdef2` serves as an **encoder** that rotates the $m \times 1$ vector $X_t$ to become an $m \times 1$ vector $\tilde b_t$

-* Equation {eq}`eq:Xdecoder` serves as a **decoder** that recovers the $m \times 1$ vector $X_t$ from the $p \times 1$ vector $\tilde b_t$
+* Equation {eq}`eq:Xdecoder` serves as a **decoder** that recovers the $m \times 1$ vector $X_t$ by rotating the $m \times 1$ vector $\tilde b_t$

-Define the transition matrix for a reduced $p \times 1$ state $\tilde b_t$ as
+Define a transition matrix for a rotated $m \times 1$ state $\tilde b_t$ by

$$
\tilde A = U^T \hat A U
@@ -894,13 +898,14 @@ $$
\hat A = U \tilde A U^T
$$

-Dynamics of the reduced $p \times 1$ state $\tilde b_t$ are governed by
+Dynamics of the rotated $m \times 1$ state $\tilde b_t$ are governed by

$$
\tilde b_{t+1} = \tilde A \tilde b_t
$$

-To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders to both sides of this
+To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders
+(i.e., rotators) to both sides of this
equation and deduce

$$
@@ -914,43 +919,45 @@ where we use $\overline X_t$ to denote a forecast.
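Before turning to Representation 2, here is an editorial sketch of Representation 1 (with hypothetical placeholder data and an estimate `A_hat`): encode with $U^T$, propagate $\tilde b_t$ with $\tilde A = U^T \hat A U$, and decode forecasts with $U$.

```python
# Sketch of Representation 1: rotate the state with a full SVD of X,
# propagate the rotated state, then decode the forecast.
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 10
X, Xp = rng.standard_normal((m, n)), rng.standard_normal((m, n))   # placeholder data
A_hat = Xp @ np.linalg.pinv(X)

U, _, _ = np.linalg.svd(X, full_matrices=True)   # full SVD, U is m x m and orthogonal
A_tilde = U.T @ A_hat @ U                        # transition matrix for b_t = U^T X_t

X1, t = X[:, 0], 5
b1 = U.T @ X1                                             # encoder
forecast = U @ np.linalg.matrix_power(A_tilde, t) @ b1    # decoder applied after t steps
print(np.allclose(forecast, np.linalg.matrix_power(A_hat, t) @ X1))  # True
```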
## Representation 2

-This representation is the one originally proposed by {cite}`schmid2010`.
+This representation is related to one originally proposed by {cite}`schmid2010`.

It can be regarded as an intermediate step to a related and perhaps more useful representation 3.

As with Representation 1, we continue to

-* use all $p$ singular values of $X$
* use a **full** SVD and **not** a reduced SVD

-As we observed and illustrated earlier in this lecture, under these two requirements,
+As we observed and illustrated earlier in this lecture, for a full SVD
$U U^T$ and $U^T U$ are both identity matrices; but under a reduced SVD of $X$, $U U^T$ is not an identity matrix.

-As we shall see, these requirements will be too confining for what we ultimately want to do; these are situations in which $U^T U$ is **not** an identity matrix because we want to use a reduced SVD of $X$.
+As we shall see, a full SVD is too confining for what we ultimately want to do, namely, situations in which $U U^T$ is **not** an identity matrix because we use a reduced SVD of $X$.

But for now, let's proceed under the assumption that both of the preceding two requirements are satisfied.

-Form an eigendecomposition of the $p \times p$ matrix $\tilde A$ defined in equation {eq}`eq:Atilde0`:
+Form an eigendecomposition of the $m \times m$ matrix $\tilde A = U^T \hat A U$ defined in equation {eq}`eq:Atilde0`:

$$
\tilde A = W \Lambda W^{-1}
$$ (eq:tildeAeigen)

-where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is a $p \times p$
+where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is an $m \times m$
matrix whose columns are eigenvectors corresponding to rows (eigenvalues) in
$\Lambda$.

-Note that when $U U^T = I_{m \times m}$, as is true with a full SVD of X (but **not** true with a reduced SVD)
+Note that when $U U^T = I_{m \times m}$, as is true with a full SVD of $X$ (but as is **not** true with a reduced SVD)

$$
\hat A = U \tilde A U^T = U W \Lambda W^{-1} U^T
-$$
+$$ (eq:eqeigAhat)

+Evidently, according to equation {eq}`eq:eqeigAhat`, the diagonal matrix $\Lambda$ contains eigenvalues of
+$\hat A$ and corresponding eigenvectors of $\hat A$ are columns of the matrix $UW$.

Thus, the systematic (i.e., not random) parts of the $X_t$ dynamics captured by our first-order vector autoregressions are described by
@@ -982,15 +989,15 @@ $$
X_t = U W \hat b_t
$$

-We can use this representation to constructor a predictor $\overline X_{t+1}$ of $X_{t+1}$ conditional on $X_1$ via:
+We can use this representation to construct a predictor $\overline X_{t+1}$ of $X_{t+1}$ conditional on $X_1$ via:

$$
\overline X_{t+1} = U W \Lambda^t W^{-1} U^T X_1
$$ (eq:DSSEbookrepr)
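An editorial sketch of Representation 2 (again with placeholder data): eigendecompose $\tilde A = U^T \hat A U$ and verify that the forecast formula {eq}`eq:DSSEbookrepr` matches iterating $\hat A$ directly.

```python
# Sketch of Representation 2: forecast via X_bar_{t+1} = U W Lambda^t W^{-1} U^T X_1.
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 10
X, Xp = rng.standard_normal((m, n)), rng.standard_normal((m, n))   # placeholder data
A_hat = Xp @ np.linalg.pinv(X)

U, _, _ = np.linalg.svd(X, full_matrices=True)   # full SVD, so U U^T = I
A_tilde = U.T @ A_hat @ U
Lam, W = np.linalg.eig(A_tilde)                  # A_tilde = W diag(Lam) W^{-1}

X1, t = X[:, 0], 5
forecast = U @ W @ np.diag(Lam**t) @ np.linalg.inv(W) @ U.T @ X1
print(np.allclose(forecast, np.linalg.matrix_power(A_hat, t) @ X1))  # True (up to rounding)
```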
In effect,
-{cite}`schmid2010` defined an $m \times p$ matrix $\Phi_s$ as
+{cite}`schmid2010` defined an $m \times m$ matrix $\Phi_s$ as

$$
\Phi_s = UW
@@ -1005,37 +1012,45 @@ $$ (eq:schmidrep)
Components of the basis vector $ \hat b_t = W^{-1} U^T X_t \equiv \Phi_s^+ X_t $ are often called DMD **modes**, or sometimes also
DMD **projected modes**.

-An alternative definition of DMD notes is motivate by the following observation.

-A peculiar feature of representation {eq}`eq:schmidrep` is that while the diagonal components of $\Lambda$ are square roots of singular
-values of $\check A$, the columns of $\Phi_s$ are **not** eigenvectors corresponding to eigenvalues of $\check A$.

-This feature led Tu et al. {cite}`tu_Rowley` to suggest an alternative representation that replaces $\Phi_s$ with another
-$m \times p$ matrix whose columns are eigenvectors of $\check A$.

-We turn to that representation next.
+We turn next to an alternative representation suggested by Tu et al. {cite}`tu_Rowley`.

## Representation 3

-As we did with representation 2, it is useful to construct an eigencomposition of the $p \times p$ transition matrix $\tilde A$
+As we did with representation 2, it is useful to construct an eigendecomposition of the $m \times m$ transition matrix $\tilde A$
according to equation {eq}`eq:tildeAeigen`.

-Now where $ 1 \leq r \leq p$, construct an $m \times r$ matrix
+Departing from the procedures used to construct Representations 1 and 2, each of which deployed a **full** SVD, we now use a **reduced** SVD.
+
+As above, we let $p \leq \textrm{min}(m,n)$ be the rank of $X$ and consider a **reduced** SVD
+
+$$
+X = U \Sigma V^T
+$$
+
+where now $U$ is $m \times p$ and $\Sigma$ is $p \times p$ and $V^T$ is $p \times n$.
+
+Construct an $m \times p$ matrix

$$
-\Phi = X' V \Sigma^{-1} W
+\Phi = X' V \Sigma^{-1} W
$$ (eq:Phiformula)
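An editorial sketch of formula {eq}`eq:Phiformula` (with placeholder data): build $\Phi = X' V \Sigma^{-1} W$ from a reduced SVD and check numerically that its columns are eigenvectors of $\hat A = X' V \Sigma^{-1} U^T$, which is what the proposition below asserts.

```python
# Sketch: DMD modes Phi = X' V Sigma^{-1} W, checked as eigenvectors of A_hat.
import numpy as np

rng = np.random.default_rng(5)
m, n = 6, 4                                         # a tall example, m > n
X, Xp = rng.standard_normal((m, n)), rng.standard_normal((m, n))

U, S, Vt = np.linalg.svd(X, full_matrices=False)    # reduced SVD, U is m x p
A_tilde = U.T @ Xp @ Vt.T @ np.diag(1.0 / S)        # = U^T A_hat U
Lam, W = np.linalg.eig(A_tilde)

Phi   = Xp @ Vt.T @ np.diag(1.0 / S) @ W            # candidate eigenvectors of A_hat
A_hat = Xp @ Vt.T @ np.diag(1.0 / S) @ U.T

print(np.allclose(A_hat @ Phi, Phi @ np.diag(Lam)))  # each column: A_hat phi = lambda phi
```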
Tu et al. {cite}`tu_Rowley` established the following
-**Proposition** The $r$ columns of $\Phi$ are eigenvectors of $\check A$ that correspond to the largest $r$ eigenvalues of $A$.
+**Proposition** The $p$ columns of $\Phi$ are eigenvectors of $\check A$.
**Proof:** From formula {eq}`eq:Phiformula` we have
@@ -1074,7 +1089,7 @@ We also have the following
{eq}`eq:Atilde0`, define it as the following $r \times r$ counterpart

$$
-\tilde A = \tilde U^T \hat A U
+\tilde A = \tilde U^T \hat A \tilde U
$$ (eq:Atilde10)

where in equation {eq}`eq:Atilde10` $\tilde U$ is now the $m \times r$ matrix consisting of the eigenvectors of $X X^T$ corresponding to the $r$
@@ -1203,7 +1218,7 @@ the $r < p$ largest singular values.
In that case, we simply replace $\Sigma$ with the appropriate $r \times r$ matrix of singular values,
$U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values,
-and $V$ with the $\tilde n \times r$ matrix whose columns correspond to the $r$ largest singular values.
+and $V$ with the $n \times r$ matrix whose columns correspond to the $r$ largest singular values.

Counterparts of all of the salient formulas above then apply.
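To close, an editorial sketch of the rank-$r$ truncation just described (placeholder data, arbitrary $r$): keep only the $r$ largest singular values and the matching columns of $U$ and $V$, then reuse the same formulas.

```python
# Sketch: truncate the SVD to the r largest singular values before applying DMD formulas.
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 8, 5, 2
X, Xp = rng.standard_normal((m, n)), rng.standard_normal((m, n))

U, S, Vt = np.linalg.svd(X, full_matrices=False)     # singular values come sorted, largest first
Ur, Sr, Vtr = U[:, :r], S[:r], Vt[:r, :]             # truncated factors

A_tilde_r = Ur.T @ Xp @ Vtr.T @ np.diag(1.0 / Sr)    # r x r counterpart of A_tilde
Lam, W = np.linalg.eig(A_tilde_r)
Phi_r = Xp @ Vtr.T @ np.diag(1.0 / Sr) @ W           # m x r matrix of (approximate) DMD modes
print(Phi_r.shape)                                   # (8, 2)
```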