Commit 8da0125
Merge pull request #202 from QuantEcon/improve-mc-eigen
Improve Markov Chain and Eigenvalues Lectures
2 parents 5b6a6ff + 59ab02f

4 files changed: +59 -72 lines changed


lectures/_toc.yml

Lines changed: 1 addition & 1 deletion
@@ -35,6 +35,7 @@ parts:
   - file: olg
   - file: markov_chains_I
   - file: markov_chains_II
+  - file: eigen_II
   - file: commod_price
 - caption: Optimization
   numbered: true
@@ -46,7 +47,6 @@ parts:
 - caption: Modeling in Higher Dimensions
   numbered: true
   chapters:
-  - file: eigen_II
   - file: input_output
   - file: lake_model
   - file: asset_pricing

lectures/eigen_II.md

Lines changed: 19 additions & 41 deletions
@@ -11,8 +11,6 @@ kernelspec:
   name: python3
 ---
 
-
-
 # Spectral Theory
 
 ```{index} single: Spectral Theory
@@ -63,11 +61,11 @@ We denote this as $A \geq 0$.
 (irreducible)=
 ### Irreducible matrices
 
-We have (informally) introduced irreducible matrices in the [Markov chain lecture](markov_chains_II.md).
+We introduced irreducible matrices in the [Markov chain lecture](mc_irreducible).
 
-Here we will introduce this concept formally.
+Here we generalize this concept:
 
-$A$ is called **irreducible** if for *each* $(i,j)$ there is an integer $k \geq 0$ such that $a^{k}_{ij} > 0$.
+An $n \times n$ matrix $A$ is called irreducible if, for each $i,j$ with $1 \leq i, j \leq n$, there exists a $k \geq 0$ such that $a^{k}_{ij} > 0$.
 
 A matrix $A$ that is not irreducible is called reducible.
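(Editor's aside, not part of this commit: the new definition can be checked numerically. A minimal NumPy sketch, using the standard fact that a nonnegative $n \times n$ matrix $A$ is irreducible exactly when every entry of $\sum_{k=0}^{n-1} A^k$ is positive; the test matrix is illustrative.)

```python
import numpy as np

def is_irreducible(A):
    """Check irreducibility of a nonnegative square matrix A.

    A is irreducible iff every entry of I + A + ... + A^(n-1) is positive,
    i.e. every state j can be reached from every state i in at most n-1 steps.
    """
    n = A.shape[0]
    S = sum(np.linalg.matrix_power(A, k) for k in range(n))
    return bool(np.all(S > 0))

A = np.array([[0, 1],
              [1, 0]])    # illustrative: each state reaches the other, so irreducible
print(is_irreducible(A))  # True
```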
 
@@ -84,29 +82,25 @@ Here are some examples to illustrate this further.
 
 Let $A$ be a square nonnegative matrix and let $A^k$ be the $k^{th}$ power of $A$.
 
-A matrix is considered **primitive** if there exists a $k \in \mathbb{N}$ such that $A^k$ is everywhere positive.
+A matrix is called **primitive** if there exists a $k \in \mathbb{N}$ such that $A^k$ is everywhere positive.
 
 It means that $A$ is called primitive if there is an integer $k \geq 0$ such that $a^{k}_{ij} > 0$ for *all* $(i,j)$.
 
 We can see that if a matrix is primitive, then it implies the matrix is irreducible.
 
-This is because if there exists an $A^k$ such that $a^{k}_{ij} > 0$ for all $(i,j)$, then it guarantees the same property for ${k+1}^th, {k+2}^th ... {k+n}^th$ iterations.
-
-In other words, a primitive matrix is both irreducible and aperiodic as aperiodicity requires a state to be visited with a guarantee of returning to itself after a certain amount of iterations.
-
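(Another editorial aside, not from the commit: primitivity can also be tested numerically. A minimal sketch relying on Wielandt's bound, which says a primitive $n \times n$ matrix already has $A^{(n-1)^2 + 1}$ everywhere positive; both test matrices are illustrative.)

```python
import numpy as np

def is_primitive(A):
    """Check primitivity of a nonnegative square matrix A.

    By Wielandt's bound, if A is primitive then A**((n-1)**2 + 1) is already
    everywhere positive, so inspecting that single power is a valid test.
    """
    n = A.shape[0]
    k = (n - 1)**2 + 1
    return bool(np.all(np.linalg.matrix_power(A, k) > 0))

A = np.array([[0, 1],
              [1, 0]])    # irreducible but periodic: its powers are never all positive
B = np.array([[0, 1],
              [1, 1]])    # B @ B is everywhere positive, so B is primitive
print(is_primitive(A), is_primitive(B))  # False True
```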
 ### Left eigenvectors
 
-We have previously discussed right (ordinary) eigenvectors $Av = \lambda v$.
+We previously discussed right (ordinary) eigenvectors $Av = \lambda v$.
 
 Here we introduce left eigenvectors.
 
 Left eigenvectors will play important roles in what follows, including that of stochastic steady states for dynamic models under a Markov assumption.
 
 We will talk more about this later, but for now, let's define left eigenvectors.
 
-A vector $w$ is called a left eigenvector of $A$ if $w$ is an eigenvector of $A^T$.
+A vector $w$ is called a left eigenvector of $A$ if $w$ is an eigenvector of $A^\top$.
 
-In other words, if $w$ is a left eigenvector of matrix A, then $A^T w = \lambda w$, where $\lambda$ is the eigenvalue associated with the left eigenvector $v$.
+In other words, if $w$ is a left eigenvector of matrix $A$, then $A^\top w = \lambda w$, where $\lambda$ is the eigenvalue associated with the left eigenvector $v$.
 
 This hints at how to compute left eigenvectors
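(A small sketch of that computation, not part of the diff; the matrix is illustrative. Since a left eigenvector of $A$ is just a right eigenvector of $A^\top$, one call to `eig` on the transpose is enough.)

```python
import numpy as np
from numpy.linalg import eig

A = np.array([[3, 2],
              [1, 4]])

λ, v = eig(A)      # right eigenvectors: A v = λ v
λ_T, w = eig(A.T)  # left eigenvectors: Aᵀ w = λ w, i.e. wᵀ A = λ wᵀ

print(λ)    # eigenvalues of A ...
print(λ_T)  # ... match those of Aᵀ (possibly in a different order)
print(w)    # columns of w are left eigenvectors of A
```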
 
@@ -147,17 +141,17 @@ print(w)
 
 Note that the eigenvalues for both left and right eigenvectors are the same, but the eigenvectors themselves are different.
 
-We can then take transpose to obtain $A^T w = \lambda w$ and obtain $w^T A= \lambda w^T$.
+We can then take transpose to obtain $A^\top w = \lambda w$ and obtain $w^\top A= \lambda w^\top$.
 
 This is a more common expression and where the name left eigenvectors originates.
 
 (perron-frobe)=
 ### The Perron-Frobenius Theorem
 
-For a nonnegative matrix $A$ the behavior of $A^k$ as $k \to \infty$ is controlled by the eigenvalue with the largest
+For a square nonnegative matrix $A$, the behavior of $A^k$ as $k \to \infty$ is controlled by the eigenvalue with the largest
 absolute value, often called the **dominant eigenvalue**.
 
-For a matrix nonnegative square matrix $A$, the Perron-Frobenius Theorem characterizes certain
+For any such matrix $A$, the Perron-Frobenius Theorem characterizes certain
 properties of the dominant eigenvalue and its corresponding eigenvector.
 
 ```{prf:Theorem} Perron-Frobenius Theorem
@@ -178,9 +172,7 @@ If $A$ is primitive then,
 
 6. the inequality $|\lambda| \leq r(A)$ is **strict** for all eigenvalues $\lambda$ of $A$ distinct from $r(A)$, and
 7. with $v$ and $w$ normalized so that the inner product of $w$ and $v = 1$, we have
-$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$.
-\
-the matrix $v w^{\top}$ is called the **Perron projection** of $A$.
+$ r(A)^{-m} A^m$ converges to $v w^{\top}$ when $m \rightarrow \infty$. The matrix $v w^{\top}$ is called the **Perron projection** of $A$
 ```
 
 (This is a relatively simple version of the theorem --- for more details see
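(Claim 7 is easy to confirm numerically. A minimal editorial sketch, not part of this commit; the matrix, the power $m$, and the normalization step are illustrative.)

```python
import numpy as np
from numpy.linalg import eig, matrix_power

A = np.array([[0.4, 0.6],
              [0.8, 0.2]])          # everywhere positive, hence primitive

# dominant eigenvalue r(A) with right eigenvector v
eigvals, V = eig(A)
i = np.argmax(np.abs(eigvals))
r, v = eigvals[i].real, V[:, i].real

# left eigenvector w for the same eigenvalue, normalized so that <w, v> = 1
eigvals_T, W = eig(A.T)
j = np.argmax(np.abs(eigvals_T))
w = W[:, j].real
w = w / (w @ v)

m = 50
print(matrix_power(A, m) / r**m)    # ≈ the Perron projection v wᵀ
print(np.outer(v, w))
```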
@@ -194,7 +186,7 @@ Now let's consider examples for each case.
 
 #### Example 1: irreducible matrix
 
-Consider the following irreducible matrix A:
+Consider the following irreducible matrix $A$:
 
 ```{code-cell} ipython3
 A = np.array([[0, 1, 0],
@@ -208,7 +200,7 @@ We can compute the dominant eigenvalue and the corresponding eigenvector
 eig(A)
 ```
 
-Now we can go through our checklist to verify the claims of the Perron-Frobenius Theorem for the irreducible matrix A:
+Now we can go through our checklist to verify the claims of the Perron-Frobenius Theorem for the irreducible matrix $A$:
 
 1. The dominant eigenvalue is real-valued and non-negative.
 2. All other eigenvalues have absolute values less than or equal to the dominant eigenvalue.
@@ -218,7 +210,7 @@ Now we can go through our checklist to verify the claims of the Perron-Frobenius
 
 #### Example 2: primitive matrix
 
-Consider the following primitive matrix B:
+Consider the following primitive matrix $B$:
 
 ```{code-cell} ipython3
 B = np.array([[0, 1, 1],
@@ -228,27 +220,13 @@ B = np.array([[0, 1, 1],
 np.linalg.matrix_power(B, 2)
 ```
 
-We can compute the dominant eigenvalue and the corresponding eigenvector using the power iteration method as discussed {ref}`earlier<eig1_ex1>`:
-
-```{code-cell} ipython3
-num_iters = 20
-b = np.random.rand(B.shape[1])
-
-for i in range(num_iters):
-    b = B @ b
-    b = b / np.linalg.norm(b)
-
-dominant_eigenvalue = np.dot(B @ b, b) / np.dot(b, b)
-np.round(dominant_eigenvalue, 2)
-```
+We compute the dominant eigenvalue and the corresponding eigenvector
 
 ```{code-cell} ipython3
 eig(B)
 ```
 
-
-
-Now let's verify the claims of the Perron-Frobenius Theorem for the primitive matrix B:
+Now let's verify the claims of the Perron-Frobenius Theorem for the primitive matrix $B$:
 
 1. The dominant eigenvalue is real-valued and non-negative.
 2. All other eigenvalues have absolute values strictly less than the dominant eigenvalue.
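(The first two checklist items are quick to confirm in code. An editorial sketch, not from the commit, using an illustrative primitive matrix rather than the lecture's $B$.)

```python
import numpy as np
from numpy.linalg import eigvals

B = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])   # illustrative primitive matrix (B @ B is everywhere positive)

λ = eigvals(B)
r = max(λ, key=abs)          # dominant eigenvalue
print(np.isclose(r.imag, 0) and r.real >= 0)                     # item 1: real and non-negative
print(all(abs(x) < abs(r) for x in λ if not np.isclose(x, r)))   # item 2: strictly smaller in modulus
```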
@@ -307,8 +285,8 @@ A1 = np.array([[1, 2],
                [1, 4]])
 
 A2 = np.array([[0, 1, 1],
-              [1, 0, 1],
-              [1, 1, 0]])
+               [1, 0, 1],
+               [1, 1, 0]])
 
 A3 = np.array([[0.971, 0.029, 0.1, 1],
                [0.145, 0.778, 0.077, 0.59],
@@ -353,7 +331,7 @@ In fact we have already seen the theorem in action before in {ref}`the markov ch
 
 We are now prepared to bridge the languages spoken in the two lectures.
 
-A primitive matrix is both irreducible (or strongly connected in the language of graph) and aperiodic.
+A primitive matrix is both irreducible (or strongly connected in the language of {ref}`graph theory<strongly_connected>` and aperiodic.
 
 So Perron-Frobenius Theorem explains why both Imam and Temple matrix and Hamilton matrix converge to a stationary distribution, which is the Perron projection of the two matrices
lectures/markov_chains_II.md

Lines changed: 38 additions & 29 deletions
@@ -61,6 +61,7 @@ from matplotlib import cm
 import matplotlib as mpl
 ```
 
+(mc_irreducible)=
 ## Irreducibility
 
 
@@ -141,8 +142,6 @@ mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))
 mc.is_irreducible
 ```
 
-
-
 It might be clear to you already that irreducibility is going to be important
 in terms of long-run outcomes.
 
@@ -246,34 +245,44 @@ Therefore, we can see the sample path averages for each state (the fraction of
 time spent in each state) converges to the stationary distribution regardless of
 the starting state
 
+Let's denote the fraction of time spent in state $x$ at time $t$ in our sample path as $\hat p_t(x)$ where
+
+$$
+\hat p_t(x) := \frac{1}{t} \sum_{t = 1}^t \mathbf{1}\{X_t = x\}
+$$
+
+
+Here we compare $\hat p_t(x)$ with the stationary distribution $\psi^* (x)$ for different starting points $x_0$.
+
 ```{code-cell} ipython3
 P = np.array([[0.971, 0.029, 0.000],
               [0.145, 0.778, 0.077],
               [0.000, 0.508, 0.492]])
 ts_length = 10_000
 mc = qe.MarkovChain(P)
 n = len(P)
-fig, axes = plt.subplots(nrows=1, ncols=n)
+fig, axes = plt.subplots(nrows=1, ncols=n, figsize=(15, 6))
 ψ_star = mc.stationary_distributions[0]
 plt.subplots_adjust(wspace=0.35)
 
 for i in range(n):
-    axes[i].grid()
-    axes[i].axhline(ψ_star[i], linestyle='dashed', lw=2, color = 'black',
+    axes[i].axhline(ψ_star[i], linestyle='dashed', lw=2, color='black',
                     label = fr'$\psi^*({i})$')
     axes[i].set_xlabel('t')
-    axes[i].set_ylabel(f'fraction of time spent at {i}')
+    axes[i].set_ylabel(fr'$\hat p_t({i})$')
 
     # Compute the fraction of time spent, starting from different x_0s
     for x0, col in ((0, 'blue'), (1, 'green'), (2, 'red')):
         # Generate time series that starts at different x0
         X = mc.simulate(ts_length, init=x0)
-        X_bar = (X == i).cumsum() / (1 + np.arange(ts_length, dtype=float))
-        axes[i].plot(X_bar, color=col, label=f'$x_0 = \, {x0} $')
+        p_hat = (X == i).cumsum() / (1 + np.arange(ts_length, dtype=float))
+        axes[i].plot(p_hat, color=col, label=f'$x_0 = \, {x0} $')
     axes[i].legend()
 plt.show()
 ```
 
+Note the convergence to the stationary distribution regardless of the starting point $x_0$.
+
 ### Example 3
 
 Let's look at one more example with six states {ref}`discussed before <mc_eg3>`.
@@ -295,8 +304,7 @@
 The {ref}`graph <mc_eg3>` for the chain shows all states are reachable,
 indicating that this chain is irreducible.
 
-Similar to previous examples, the sample path averages for each state converge
-to the stationary distribution.
+Here we visualize the difference between $\hat p_t(x)$ and the stationary distribution $\psi^* (x)$ for each state $x$
 
 ```{code-cell} ipython3
 P = [[0.86, 0.11, 0.03, 0.00, 0.00, 0.00],
@@ -313,20 +321,23 @@ fig, ax = plt.subplots(figsize=(9, 6))
 X = mc.simulate(ts_length)
 # Center the plot at 0
 ax.set_ylim(-0.25, 0.25)
-ax.axhline(0, linestyle='dashed', lw=2, color = 'black', alpha=0.4)
+ax.axhline(0, linestyle='dashed', lw=2, color='black', alpha=0.4)
 
 
 for x0 in range(6):
     # Calculate the fraction of time for each state
-    X_bar = (X == x0).cumsum() / (1 + np.arange(ts_length, dtype=float))
-    ax.plot(X_bar - ψ_star[x0], label=f'$X = {x0+1} $')
+    p_hat = (X == x0).cumsum() / (1 + np.arange(ts_length, dtype=float))
+    ax.plot(p_hat - ψ_star[x0], label=f'$x = {x0+1} $')
     ax.set_xlabel('t')
-    ax.set_ylabel(r'fraction of time spent in a state $- \psi^* (x)$')
+    ax.set_ylabel(r'$\hat p_t(x) - \psi^* (x)$')
 
 ax.legend()
 plt.show()
 ```
 
+Similar to previous examples, the sample path averages for each state converge
+to the stationary distribution.
+
 ### Example 4
 
 Let's look at another example with two states: 0 and 1.
@@ -364,19 +375,18 @@ fig, axes = plt.subplots(nrows=1, ncols=n)
 ψ_star = mc.stationary_distributions[0]
 
 for i in range(n):
-    axes[i].grid()
     axes[i].set_ylim(0.45, 0.55)
-    axes[i].axhline(ψ_star[i], linestyle='dashed', lw=2, color = 'black',
+    axes[i].axhline(ψ_star[i], linestyle='dashed', lw=2, color='black',
                     label = fr'$\psi^*({i})$')
     axes[i].set_xlabel('t')
-    axes[i].set_ylabel(f'fraction of time spent at {i}')
+    axes[i].set_ylabel(fr'$\hat p_t({i})$')
 
     # Compute the fraction of time spent, for each x
     for x0 in range(n):
         # Generate time series starting at different x_0
         X = mc.simulate(ts_length, init=x0)
-        X_bar = (X == i).cumsum() / (1 + np.arange(ts_length, dtype=float))
-        axes[i].plot(X_bar, label=f'$x_0 = \, {x0} $')
+        p_hat = (X == i).cumsum() / (1 + np.arange(ts_length, dtype=float))
+        axes[i].plot(p_hat, label=f'$x_0 = \, {x0} $')
 
     axes[i].legend()
 plt.show()
@@ -388,8 +398,6 @@ The proportion of time spent in a state can converge to the stationary distribut
 
 However, the distribution at each state does not.
 
-
-
 ### Expectations of geometric sums
 
 Sometimes we want to compute the mathematical expectation of a geometric sum, such as
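(For a finite chain this kind of expectation reduces to a linear solve. A minimal editorial sketch, not from the commit, with an illustrative stochastic matrix $P$, payoff vector $h$ and discount factor $\beta$: the vector of conditional expectations $\mathbb{E}\left[\sum_{j \geq 0} \beta^j h(X_{t+j}) \mid X_t = x\right]$ solves $(I - \beta P) y = h$.)

```python
import numpy as np

P = np.array([[0.9, 0.1],      # illustrative stochastic matrix
              [0.4, 0.6]])
h = np.array([1.0, 2.0])       # illustrative payoff at each state
β = 0.95                       # discount factor in (0, 1)

# E[Σ_j β^j h(X_{t+j}) | X_t = x] for each starting state x
y = np.linalg.solve(np.eye(2) - β * P, h)
print(y)
```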
@@ -505,14 +513,14 @@ mc = qe.MarkovChain(P)
 fig, ax = plt.subplots(figsize=(9, 6))
 X = mc.simulate(ts_length)
 ax.set_ylim(-0.25, 0.25)
-ax.axhline(0, linestyle='dashed', lw=2, color = 'black', alpha=0.4)
+ax.axhline(0, linestyle='dashed', lw=2, color='black', alpha=0.4)
 
 for x0 in range(8):
     # Calculate the fraction of time for each worker
-    X_bar = (X == x0).cumsum() / (1 + np.arange(ts_length, dtype=float))
-    ax.plot(X_bar - ψ_star[x0], label=f'$X = {x0+1} $')
+    p_hat = (X == x0).cumsum() / (1 + np.arange(ts_length, dtype=float))
+    ax.plot(p_hat - ψ_star[x0], label=f'$x = {x0+1} $')
     ax.set_xlabel('t')
-    ax.set_ylabel(r'fraction of time spent in a state $- \psi^* (x)$')
+    ax.set_ylabel(r'$\hat p_t(x) - \psi^* (x)$')
 
 ax.legend()
 plt.show()
@@ -584,8 +592,7 @@ mc = qe.MarkovChain(P)
 
 fig, ax = plt.subplots(figsize=(9, 6))
 ax.set_ylim(-0.25, 0.25)
-ax.grid()
-ax.hlines(0, 0, ts_length, lw=2, alpha=0.6) # Horizonal line at zero
+ax.axhline(0, linestyle='dashed', lw=2, color='black', alpha=0.4)
 
 for x0, col in ((0, 'blue'), (1, 'green')):
     # Generate time series for worker that starts at x0
@@ -594,10 +601,12 @@ for x0, col in ((0, 'blue'), (1, 'green')):
     X_bar = (X == 0).cumsum() / (1 + np.arange(ts_length, dtype=float))
     # Plot
     ax.fill_between(range(ts_length), np.zeros(ts_length), X_bar - p, color=col, alpha=0.1)
-    ax.plot(X_bar - p, color=col, label=f'$X_0 = \, {x0} $')
+    ax.plot(X_bar - p, color=col, label=f'$x_0 = \, {x0} $')
     # Overlay in black--make lines clearer
     ax.plot(X_bar - p, 'k-', alpha=0.6)
-
+    ax.set_xlabel('t')
+    ax.set_ylabel(r'$\bar X_m - \psi^* (x)$')
+
 ax.legend(loc='upper right')
 plt.show()
 ```

lectures/networks.md

Lines changed: 1 addition & 1 deletion
@@ -329,7 +329,7 @@ For example,
 ```{code-cell} ipython3
 G_p.in_degree('p')
 ```
-
+(strongly_connected)=
 ### Communication
 
 Next, we study communication and connectedness, which have important
