
Commit 8ee128e

Tom's March 4 edits of two lectures
1 parent 88a2bb7 commit 8ee128e

2 files changed: +74 / -34 lines


lectures/samuelson.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ kernelspec:
 </div>
 ```

-# Application: The Samuelson Multiplier-Accelerator
+# Samuelson Multiplier-Accelerator

 ```{contents} Contents
 :depth: 2

lectures/svd_intro.md

Lines changed: 73 additions & 33 deletions
@@ -40,20 +40,22 @@ This lecture describes the singular value decomposition and two of its uses:

 * dynamic mode decomposition (DMD)

-Each of these can be thought of as data-reduction methods that are designed to capture principal patterns in data by projecting data onto a limited set of factors.
+Each of these can be thought of as data-reduction methods that are designed to capture salient patterns in data by projecting data onto a limited set of factors.

 ## The Setup

 Let $X$ be an $m \times n$ matrix of rank $r$.

+Necessarily, $r \leq \min(m,n)$.
+
 In this lecture, we'll think of $X$ as a matrix of **data**.

 * each column is an **individual** -- a time period or person, depending on the application

 * each row is a **random variable** measuring an attribute of a time period or a person, depending on the application


-We'll be interested in two distinct cases
+We'll be interested in two cases

 * A **short and fat** case in which $m << n$, so that there are many more columns than rows.

@@ -64,7 +66,7 @@ We'll apply a **singular value decomposition** of $X$ in both situations.

 In the first case in which there are many more observations $n$ than random variables $m$, we learn about the joint distribution of the random variables by taking averages across observations of functions of the observations.

-Here we'll look for **patterns** by using a **singular value decomosition** to do a **principal components analysis** (PCA).
+Here we'll look for **patterns** by using a **singular value decomposition** to do a **principal components analysis** (PCA).

 In the second case in which there are many more random variables $m$ than observations $n$, we'll proceed in a different way.

@@ -91,7 +93,7 @@ where

 * $V$ is an $n \times n$ matrix whose columns are eigenvectors of $X^T X$

-* $\Sigma$ is an $m \times r$ matrix in which the first $r$ places on its main diagonal are positive numbers $\sigma_1, \sigma_2, \ldots, \sigma_r$ called **singular values**; remaining entries of $\Sigma$ are all zero
+* $\Sigma$ is an $m \times n$ matrix in which the first $r$ places on its main diagonal are positive numbers $\sigma_1, \sigma_2, \ldots, \sigma_r$ called **singular values**; remaining entries of $\Sigma$ are all zero

 * The $r$ singular values are square roots of the eigenvalues of the $m \times m$ matrix $X X^T$ and the $n \times n$ matrix $X^T X$

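The eigenvalue facts stated in this hunk are easy to check numerically. A minimal numpy sketch (the matrix and its size are illustrative, not from the lecture):

```python
import numpy as np

# check: the r singular values of X are square roots of the nonzero
# eigenvalues of both the m x m matrix X X^T and the n x n matrix X^T X
X = np.random.rand(3, 8)                               # illustrative 3 x 8 data matrix
sigma = np.linalg.svd(X, compute_uv=False)             # singular values, descending
eig_XXT = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]   # eigenvalues of X X^T
eig_XTX = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]   # eigenvalues of X^T X

print(np.allclose(sigma**2, eig_XXT))                  # True
print(np.allclose(sigma**2, eig_XTX[:3]))              # True; remaining eigenvalues are ~0
```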
@@ -104,13 +106,15 @@ The shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)

 Below, we shall assume these shapes.

-However, though we chose not to, there is an alternative shape convention that we could have used.
+The above description corresponds to a standard shape convention often called a **full** SVD.
+
+There is an alternative shape convention called **economy** or **reduced** SVD that we could have used, and will sometimes use below.

 Thus, note that because we assume that $X$ has rank $r$, there are only $r$ nonzero singular values, where $r=\textrm{rank}(X)\leq\min\left(m, n\right)$.

 Therefore, we could also write $U$, $\Sigma$, and $V$ as matrices with shapes $\left(m, r\right)$, $\left(r, r\right)$, $\left(r, n\right)$.

-Sometimes, we will choose the former one to be consistent with what is adopted by `numpy`.
+Sometimes, we will choose the former convention.

 At other times, we'll use the latter convention in which $\Sigma$ is an $r \times r$ diagonal matrix.

@@ -133,7 +137,7 @@ where $S$ is evidently a symmetric matrix and $Q$ is an orthogonal matrix.

 Let's begin with a case in which $n >> m$, so that we have many more observations $n$ than random variables $m$.

-The data matrix $X$ is **short and fat** in an $n >> m$ case as opposed to a **tall and skinny** case with $m > > n $ to be discussed later in this lecture.
+The data matrix $X$ is **short and fat** in an $n >> m$ case as opposed to a **tall and skinny** case with $m >> n$ to be discussed later.

 We regard $X$ as an $m \times n$ matrix of **data**:

@@ -194,10 +198,22 @@ is a vector of loadings of variables $X_i$ on the $k$th principle component, $i

 ## Reduced Versus Full SVD

+In a **full** SVD
+
+* $U$ is $m \times m$
+* $\Sigma$ is $m \times n$
+* $V$ is $n \times n$
+
+In a **reduced** SVD
+
+* $U$ is $m \times r$
+* $\Sigma$ is $r \times r$
+* $V$ is $n \times r$
+
 You can read about reduced and full SVD here
 <https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html>

-Let's do a small experiment to see the difference
+Let's do a couple of small experiments to see the difference

 ```{code-cell} ipython3
 import numpy as np
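To make the two shape conventions concrete before the experiments below, here is a minimal sketch of the shapes `numpy` returns (sizes are illustrative; note that `numpy` hands back $V^T$ rather than $V$, and its reduced factorization uses $\min(m,n)$ in place of the rank $r$):

```python
import numpy as np

m, n = 3, 7
X = np.random.rand(m, n)        # almost surely of full rank, so r = min(m, n) = 3

U, S, Vt = np.linalg.svd(X, full_matrices=True)            # full SVD
Uhat, Shat, Vhatt = np.linalg.svd(X, full_matrices=False)  # reduced (economy) SVD

print(U.shape, S.shape, Vt.shape)            # (3, 3) (3,) (7, 7)
print(Uhat.shape, Shat.shape, Vhatt.shape)   # (3, 3) (3,) (3, 7)
```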
@@ -222,10 +238,28 @@ an optimal reduced rank approximation of a matrix, in the sense of minimizing t
 norm of the discrepancy between the approximating matrix and the matrix being approximated.
 Optimality in this sense is established in the celebrated Eckart–Young theorem. See <https://en.wikipedia.org/wiki/Low-rank_approximation>.

+Let's do another experiment now.
+
+```{code-cell} ipython3
+import numpy as np
+X = np.random.rand(2,5)
+U, S, V = np.linalg.svd(X,full_matrices=True) # full SVD
+Uhat, Shat, Vhat = np.linalg.svd(X,full_matrices=False) # economy SVD
+print('U, S, V =', U, S, V)
+```
+
+```{code-cell} ipython3
+print('Uhat, Shat, Vhat =', Uhat, Shat, Vhat)
+```
+
+```{code-cell} ipython3
+rr = np.linalg.matrix_rank(X)
+rr
+```

 ## PCA with Eigenvalues and Eigenvectors

-We now turn to using the eigen decomposition of a sample covariance matrix to do PCA.
+We now use an eigen decomposition of a sample covariance matrix to do PCA.

 Let $X_{m \times n}$ be our $m \times n$ data matrix.

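The Eckart–Young remark in this hunk can be illustrated with a small sketch: truncating the SVD after $k$ singular values gives a rank-$k$ approximation whose Frobenius error equals the square root of the discarded $\sigma_j^2$, and it beats an arbitrary rank-$k$ candidate (the sizes and the comparison matrix are illustrative):

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(5, 8)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]     # rank-k truncation of the SVD

err_trunc = np.linalg.norm(X - X_k)             # Frobenius norm of the discrepancy
err_other = np.linalg.norm(X - np.random.rand(5, k) @ np.random.rand(k, 8))

print(err_trunc, np.sqrt(np.sum(S[k:]**2)))     # the same number twice
print(err_trunc < err_other)                    # expected: True
```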
@@ -311,9 +345,7 @@ provided that we set

 Since there are several possible ways of computing $P$ and $U$ for a given data matrix $X$, depending on algorithms used, we might have sign differences or different orders between eigenvectors.

-We want a way that leads to the same $U$ and $P$.
-
-In the following, we accomplish this by
+We resolve such ambiguities about $U$ and $P$ by

 1. sorting eigenvalues and singular values in descending order
 2. imposing positive diagonals on $P$ and $U$ and adjusting signs in $V^T$ accordingly
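A minimal sketch of the normalization just described, with illustrative names; it assumes `U` has one column per entry of `S` (for example, the economy SVD factors or a full eigenvector matrix), and that any `Vt` passed in has matching rows so the product is left unchanged:

```python
import numpy as np

def normalize(U, S, Vt=None):
    # 1. sort eigenvalues / singular values in descending order
    order = np.argsort(S)[::-1]
    U, S = U[:, order], S[order]
    if Vt is not None:
        Vt = Vt[order, :]
    # 2. impose a positive diagonal on U and flip the matching rows of V^T
    signs = np.sign(np.diag(U))
    signs[signs == 0] = 1.0
    U = U * signs
    if Vt is not None:
        Vt = signs[:, None] * Vt
    return U, S, Vt
```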
@@ -528,7 +560,7 @@ def compare_pca_svd(da):

 ## Dynamic Mode Decomposition (DMD)

-We now turn to the case in which $m >>n $ so that there are many more random variables $m$ than observations $n$.
+We now turn to the case in which $m >> n$, so that an $m \times n$ data matrix $\tilde X$ contains many more random variables $m$ than observations $n$.

 This is the **tall and skinny** case associated with **Dynamic Mode Decomposition**.

@@ -545,7 +577,7 @@ where for $t = 1, \ldots, n$, the $m \times 1 $ vector $X_t$ is

 $$ X_t = \begin{bmatrix} X_{1,t} & X_{2,t} & \cdots & X_{m,t} \end{bmatrix}^T $$

-where $T$ denotes transposition and $X_{i,t}$ is an observation on variable $i$ at time $t$.
+where $T$ again denotes complex transposition and $X_{i,t}$ is an observation on variable $i$ at time $t$.

 From $\tilde X$, form two matrices

@@ -561,10 +593,12 @@ $$

 Here $'$ does not denote matrix transposition but instead is part of the name of the matrix $X'$.

-In forming $ X$ and $X'$, we have in each case dropped a column from $\tilde X$.
+In forming $X$ and $X'$, we have in each case dropped a column from $\tilde X$: the last column for $X$ and the first column for $X'$.

 Evidently, $X$ and $X'$ are both $m \times \tilde n$ matrices where $\tilde n = n - 1$.

+We now let the rank of $X$ be $p \leq \min(m, \tilde n) = \tilde n$.
+
 We start with a system consisting of $m$ least squares regressions of **everything** on one lagged value of **everything**:

 $$
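A minimal sketch of this construction with an illustrative random data matrix: the lag matrix $X$ drops the last column of $\tilde X$ and the lead matrix $X'$ drops the first:

```python
import numpy as np

m, n = 6, 4                       # tall and skinny: more variables than observations
X_tilde = np.random.rand(m, n)    # illustrative stand-in for the data matrix

X = X_tilde[:, :-1]               # columns X_1, ..., X_{n-1}
X_prime = X_tilde[:, 1:]          # columns X_2, ..., X_n

print(X.shape, X_prime.shape)     # both m x (n - 1)
```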
@@ -577,10 +611,12 @@ $$
 A = X' X^{+}
 $$

-and where the (huge) $m \times m $ matrix $X^{+}$ is the Moore-Penrose generalized inverse of $X$.
+and where the (possibly huge) $\tilde n \times m$ matrix $X^{+}$ is the Moore-Penrose generalized inverse of $X$.
+
+The $i$th row of $A$ is a $1 \times m$ vector of regression coefficients of $X_{i,t+1}$ on $X_{j,t}, j = 1, \ldots, m$.


-Think about the singular value decomposition
+Think about the (reduced) singular value decomposition

 $$
 X = U \Sigma V^T
@@ -591,8 +627,8 @@ where $U$ is $m \times p$, $\Sigma$ is a $p \times p$ diagonal matrix, and $ V^
 Here $p$ is the rank of $X$, where necessarily $p \leq \tilde n$.


-We could compute the generalized inverse $X^+$ by using
-as
+We could construct the generalized inverse $X^+$ of $X$ by using
+a singular value decomposition $X = U \Sigma V^T$ to compute

 $$
 X^{+} = V \Sigma^{-1} U^T
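A quick numerical sketch of this formula (sizes illustrative, $X$ of full column rank): build $X^{+}$ from the reduced SVD and compare it with `numpy`'s built-in pseudoinverse:

```python
import numpy as np

m, n_tilde = 20, 5
X = np.random.rand(m, n_tilde)                     # almost surely full column rank

U, S, Vt = np.linalg.svd(X, full_matrices=False)   # U: m x p, S: (p,), Vt: p x n_tilde
X_plus = Vt.T @ np.diag(1 / S) @ U.T               # X^+ = V Sigma^{-1} U^T

print(np.allclose(X_plus, np.linalg.pinv(X)))      # True
print(np.allclose(X_plus @ X, np.eye(n_tilde)))    # True: X^+ is a left inverse here
```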
@@ -604,12 +640,12 @@ The idea behind **dynamic mode decomposition** is to construct an approximation

 * sidesteps computing the generalized inverse $X^{+}$

-* constructs an $m \times r$ matrix $\Phi$ that captures effects on all $m$ variables of $r < < p$ dynamic modes that are associated with the $r$ largest singular values
+* constructs an $m \times r$ matrix $\Phi$ that captures effects on all $m$ variables of $r << p$ **modes** that are associated with the $r$ largest singular values


 * uses $\Phi$ and powers of $r$ singular values to forecast *future* $X_t$'s

-The magic of **dynamic mode decomposition** is that we accomplish this without ever computing the regression coefficients $A = X' X^{+}$.
+The beauty of **dynamic mode decomposition** is that we accomplish this without ever computing the regression coefficients $A = X' X^{+}$.

 To construct a DMD, we deploy the following steps:

@@ -624,7 +660,7 @@ To construct a DMD, we deploy the following steps:

 But we won't do that.

-We'll first compute the $r$ largest singular values of $X$.
+We'll compute the $r$ largest singular values of $X$.

 We'll form matrices $\tilde V, \tilde U$ corresponding to those $r$ singular values.

@@ -645,12 +681,14 @@ To construct a DMD, we deploy the following steps:
 \tilde X_{t+1} = \tilde A \tilde X_t
 $$

-where an approximation to (i.e., a projection of) the original $m \times 1$ vector $X_t$ can be acquired from inverting
+where an approximation $\check X_t$ to (i.e., a projection of) the original $m \times 1$ vector $X_t$ can be acquired from

 $$
-X_t = \tilde U \tilde X_t
+\check X_t = \tilde U \tilde X_t
 $$

+We'll provide a formula for $\tilde X_t$ soon.
+
 From equation {eq}`eq:tildeA_1` and {eq}`eq:bigAformula` it follows that

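A sketch of the reduced-order objects used in these steps, under illustrative sizes and names. It follows the standard DMD recipe in which $\tilde A = \tilde U^T X' \tilde V \tilde \Sigma^{-1}$ and the projected data are $\tilde X_t = \tilde U^T X_t$; since the lecture's equation `eq:tildeA_1` is not reproduced in this excerpt, treat those formulas here as assumptions:

```python
import numpy as np

m, n, r = 50, 10, 3
data = np.random.rand(m, n)                        # stand-in for the data matrix
X, X_prime = data[:, :-1], data[:, 1:]

U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_tilde = U[:, :r]                                 # m x r
S_tilde = S[:r]                                    # r largest singular values
V_tilde = Vt[:r, :].T                              # (n - 1) x r

A_tilde = U_tilde.T @ X_prime @ V_tilde @ np.diag(1 / S_tilde)   # r x r

X_proj = U_tilde.T @ X      # columns are the projected vectors (the lecture's tilde X_t)
X_check = U_tilde @ X_proj  # columns approximate the original X_t
```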
@@ -683,12 +721,9 @@ To construct a DMD, we deploy the following steps:
 We can construct an $r \times m$ matrix generalized inverse $\Phi^{+}$ of $\Phi$.


-* We interrupt the flow with a digression at this point
-
+* It will be helpful below to notice that from formula {eq}`eq:Phiformula`, we have

-* notice that from formula {eq}`eq:Phiformula`, we have
-
-$$
+$$
 \begin{aligned}
 A \Phi & = (X' \tilde V \tilde \Sigma^{-1} \tilde U^T) (X' \tilde V \tilde \Sigma^{-1} W) \cr
 & = X' \tilde V \Sigma^{-1} \tilde A W \cr
@@ -702,17 +737,16 @@ To construct a DMD, we deploy the following steps:



-* Define an initial vector $b$ of dominant modes by
+* Define an $r \times 1$ initial vector $b$ of dominant modes by

 $$
 b= \Phi^{+} X_1
 $$ (eq:bphieqn)

-where evidently $b$ is an $r \times 1$ vector.

 (Since it involves smaller matrices, formula {eq}`eq:beqnsmall` below is a computationally more efficient way to compute $b$)

-* Then define _projected data_ $\hat X_1$ by
+* Then define _projected data_ $\tilde X_1$ by

 $$
 \tilde X_1 = \Phi b
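A sketch of the objects defined in this step, repeating the same illustrative setup so it runs on its own. It assumes the lecture's mode formula $\Phi = X' \tilde V \tilde \Sigma^{-1} W$, which the $A \Phi$ calculation in the previous hunk relies on, with $(\Lambda, W)$ the eigenvalues and eigenvectors of $\tilde A$:

```python
import numpy as np

m, n, r = 50, 10, 3
data = np.random.rand(m, n)
X, X_prime = data[:, :-1], data[:, 1:]

U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_tilde, S_tilde, V_tilde = U[:, :r], S[:r], Vt[:r, :].T
A_tilde = U_tilde.T @ X_prime @ V_tilde @ np.diag(1 / S_tilde)

Lam, W = np.linalg.eig(A_tilde)                      # eigenvalues may be complex
Phi = X_prime @ V_tilde @ np.diag(1 / S_tilde) @ W   # m x r matrix of modes
Phi_plus = np.linalg.pinv(Phi)                       # r x m generalized inverse of Phi

b = Phi_plus @ X[:, 0]                               # b = Phi^+ X_1, an r x 1 vector
X1_hat = Phi @ b                                     # the projected data Phi b
```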
@@ -777,6 +811,12 @@ $$
 \check X_{t+j} = \Phi \Lambda^j \Phi^{+} X_t
 $$

+or
+
+$$
+\check X_{t+j} = \Phi \Lambda^j (W \Lambda)^{-1} \tilde X_t
+$$
+

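To close the loop on the forecasting formula above, a minimal sketch with a hypothetical helper name; `Phi` and `Lam` are as in the previous sketch and `X_t` is a current $m \times 1$ snapshot:

```python
import numpy as np

def dmd_forecast(Phi, Lam, X_t, j):
    # X_check_{t+j} = Phi Lambda^j Phi^+ X_t; with real data the result can carry
    # a tiny imaginary part, so callers often keep only the real part
    return Phi @ np.diag(Lam**j) @ np.linalg.pinv(Phi) @ X_t
```

For example, with the objects from the previous sketch, `dmd_forecast(Phi, Lam, X[:, -1], 5).real` projects five periods past the last column used in the regressions.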