lectures/svd_intro.md
+89 −5 lines changed (89 additions, 5 deletions)
@@ -595,10 +595,12 @@ $$ (eq:Xvector)
and where $ T $ again denotes complex transposition and $ X_{i,t} $ is an observation on variable $ i $ at time $ t $.

We want to fit equation {eq}`eq:VARfirstorder`.

Our data is assembled in the form of an $ m \times n $ matrix $ \tilde X $
@@ -609,6 +611,8 @@ where for $ t = 1, \ldots, n $, the $ m \times 1 $ vector $ X_t $ is given by {
We want to estimate system {eq}`eq:VARfirstorder` consisting of $ m $ least squares regressions of **everything** on one lagged value of **everything**.

We proceed as follows.
@@ -630,9 +634,73 @@ In forming $ X $ and $ X' $, we have in each case dropped a column from $ \tild
Evidently, $ X $ and $ X' $ are both $ m \times \tilde n $ matrices where $ \tilde n = n - 1 $.
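
As a concrete illustration of this bookkeeping, here is a minimal NumPy sketch (our own; the simulated data and the names `X_tilde`, `X`, `X_prime` are not from the lecture) that builds $X$ and $X'$ by dropping the last and the first column of $\tilde X$, respectively.

```python
import numpy as np

# A simulated m x n data matrix standing in for X_tilde
m, n = 6, 10
rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((m, n))

# Drop the last column to form X and the first column to form X',
# so that column t of X_prime is the observation one period after column t of X
X = X_tilde[:, :-1]
X_prime = X_tilde[:, 1:]

n_tilde = n - 1
assert X.shape == (m, n_tilde) and X_prime.shape == (m, n_tilde)
```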

We denote the rank of $ X $ as $ p \leq \min(m, \tilde n) $.

Two possible cases are when

* $ \tilde n >> m $, so that we have many more time series observations $\tilde n$ than variables $m$
* $ m >> \tilde n $, so that we have many more variables $m$ than time series observations $\tilde n$

At a general level that includes both of these special cases, a common formula describes the least squares estimator $\hat A$ of $A$, but important details differ across the two cases.

The common formula is

$$ \hat A = X' X^+ $$

where $X^+$ is the pseudo-inverse of $X$.

Formulas for the pseudo-inverse differ for our two cases.

When $ \tilde n >> m $, so that we have many more time series observations $\tilde n$ than variables $m$, and when $X$ has linearly independent **rows**, $X X^T$ has an inverse and the pseudo-inverse $X^+$ is

$$
X^+ = X^T (X X^T)^{-1}
$$

Here $X^+$ is a **right-inverse** that verifies $ X X^+ = I_{m \times m}$.

In this case, our formula for the least-squares estimator of $A$ becomes

$$
\hat A = X' X^T (X X^T)^{-1}
$$

This formula is widely used in economics to estimate vector autoregressions.

The right side is proportional to the empirical cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse of the empirical second moment matrix of $X_t$, the least-squares formula widely used in econometrics.
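
As a sanity check on this first case, the following sketch (ours, with simulated data rather than the lecture's) verifies that the explicit right-inverse formula agrees with `np.linalg.pinv` when $\tilde n \gg m$ and uses it to form $\hat A = X' X^T (X X^T)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_tilde = 5, 200                           # many more observations than variables
X = rng.standard_normal((m, n_tilde))
X_prime = rng.standard_normal((m, n_tilde))   # stand-in for the shifted data matrix

# With linearly independent rows, X X^T is invertible and the
# pseudo-inverse equals the right-inverse X^T (X X^T)^{-1}
X_pinv = X.T @ np.linalg.inv(X @ X.T)

assert np.allclose(X_pinv, np.linalg.pinv(X))
assert np.allclose(X @ X_pinv, np.eye(m))     # right-inverse property

A_hat = X_prime @ X_pinv                      # least-squares estimate of A, an m x m matrix
```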

When $ m >> \tilde n $, so that we have many more variables $m$ than time series observations $\tilde n$, and when $X$ has linearly independent **columns**, $X^T X$ has an inverse and the pseudo-inverse $X^+$ is

$$
X^+ = (X^T X)^{-1} X^T
$$

Here $X^+$ is a **left-inverse** that verifies $X^+ X = I_{\tilde n \times \tilde n}$.

In this case, our formula for a least-squares estimator of $A$ becomes

$$
\hat A = X' (X^T X)^{-1} X^T
$$ (eq:hatAversion0)

This is the case that we are interested in here.

Thus, we want to fit equation {eq}`eq:VARfirstorder` in a situation in which we have a number $n$ of observations that is small relative to the number $m$ of variables that appear in the vector $X_t$.

We'll use efficient algorithms for computing $\hat A$ in formula {eq}`eq:hatAversion0` and for constructing reduced rank approximations of it.
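
A parallel sketch for the second case, again with simulated data of our own: when $m \gg \tilde n$ and $X$ has full column rank, the pseudo-inverse is the left-inverse $(X^T X)^{-1} X^T$, which gives the estimator in {eq}`eq:hatAversion0`.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n_tilde = 100, 8                           # many more variables than observations
X = rng.standard_normal((m, n_tilde))
X_prime = rng.standard_normal((m, n_tilde))   # stand-in for the shifted data matrix

# With linearly independent columns, X^T X is invertible and the
# pseudo-inverse equals the left-inverse (X^T X)^{-1} X^T
X_pinv = np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(X_pinv, np.linalg.pinv(X))
assert np.allclose(X_pinv @ X, np.eye(n_tilde))   # left-inverse property

A_hat = X_prime @ X_pinv                      # the m x m estimator of A in this case
```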

To reiterate and supply more detail about how we can efficiently calculate the pseudo-inverse $X^+$, as our estimator $\hat A$ of $A$ we form an $m \times m$ matrix that solves the least-squares best-fit problem

$$
\hat A = \textrm{argmin}_{\check A} || X' - \check A X ||_F
@@ -706,6 +774,8 @@ $$
$$
X_t = U \tilde b_t
$$ (eq:Xdecoder)

(Here we use $b$ to remind us that we are creating a **basis** vector.)
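
A small numerical illustration of this basis construction, under the assumption (consistent with the surrounding text, but ours rather than the lecture's code) that $\tilde b_t = U^T X_t$ and that $U$ comes from a full-rank SVD of the data so that $U U^T = I$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_tilde = 5, 12
X = rng.standard_normal((m, n_tilde))

# SVD of the data matrix; with m <= n_tilde and full row rank, U is m x m orthogonal
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(U @ U.T, np.eye(m))

b_tilde = U.T @ X            # column t holds the basis vector b_tilde_t = U^T X_t
X_rebuilt = U @ b_tilde      # decoding with U recovers X_t exactly in this case
assert np.allclose(X_rebuilt, X)
```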
Since $U U^T$ is an $m \times m$ identity matrix, it follows from equation {eq}`eq:tildeXdef2` that we can reconstruct $X_t$ from $\tilde b_t$ by using
@@ -801,6 +871,20 @@ $$
## Using Fewer Modes

The preceding formulas assume that we have retained all $p$ modes associated with the positive singular values of $X$.

We can easily adapt all of the formulas to describe a situation in which we instead retain only the $r < p$ largest singular values.

In that case, we simply replace $\Sigma$ with the appropriate $r \times r$ matrix of singular values, $U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values, and $V$ with the $\tilde n \times r$ matrix whose columns correspond to the $r$ largest singular values.

Counterparts of all of the salient formulas above then apply.
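
As an illustration of this truncation (a sketch of our own, not the lecture's code), one can keep only the $r$ largest singular values and the matching columns of $U$ and $V$:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n_tilde, r = 8, 6, 3
X = rng.standard_normal((m, n_tilde))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # singular values sorted largest first

U_r = U[:, :r]             # m x r: columns for the r largest singular values
Sigma_r = np.diag(s[:r])   # r x r matrix of the r largest singular values
V_r = Vt[:r, :].T          # n_tilde x r: columns for the r largest singular values

X_r = U_r @ Sigma_r @ V_r.T    # best rank-r approximation of X
assert np.linalg.matrix_rank(X_r) == r
```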