
Commit 00b0760

FIX: Adjustments for main branch to compile (#237)
* FIX: Add arviz to environment
* FIX: headers in bayes_nonconj
* FIX: equation references
* FIX: headers in prob_matrix
* FIX: add install for packages outside env
* FIX: Adjust prob_matrix for PDF
* fix equations in ar1_turningpoint
* FIX: header in ar1_bayes
* FIX: $ duplicate
* FIX: math
* fix math
* Adjust ar1_turningpts
* Remove offending TeX for pdflatex builds
1 parent a6b1491 commit 00b0760

6 files changed: +130, -213 lines changed

environment.yml

Lines changed: 1 addition & 0 deletions

@@ -14,6 +14,7 @@ dependencies:
   - ghp-import==1.1.0
   - sphinxcontrib-youtube==1.1.0
   - sphinx-togglebutton==0.3.1
+  - arviz==0.12.1
   # Sandpit Requirements
   - quantecon
   - array-to-latex
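
To confirm the new pin resolves after rebuilding the environment, a quick check along these lines works (a sketch; the expected version string comes from the line added above):

```python
# Sanity check after rebuilding the conda environment: the pinned arviz
# should import and report the version added in this commit.
import arviz

assert arviz.__version__ == "0.12.1", arviz.__version__
```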

lectures/ar1_bayes.md

Lines changed: 3 additions & 5 deletions

@@ -11,9 +11,7 @@ kernelspec:
   name: python3
 ---

-## Posterior Distributions for AR(1) Parameters
-
-
+# Posterior Distributions for AR(1) Parameters

 We'll begin with some Python imports.

@@ -174,7 +172,7 @@ Now we shall use Bayes' law to construct a posterior distribution, conditioning

 First we'll use **pymc4**.

-### `PyMC` Implementation
+## `PyMC` Implementation

 For a normal distribution in `pymc`,
 $var = 1/\tau = \sigma^{2}$.
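
The $var = 1/\tau = \sigma^{2}$ convention mentioned in this hunk can be illustrated with a minimal sketch (made-up values; both lines parameterize the same distribution):

```python
import pymc as pm

with pm.Model():
    # pymc's Normal accepts either sigma or the precision tau = 1/sigma**2,
    # so these two variables share the same N(0, 4) prior.
    x_by_sigma = pm.Normal("x_by_sigma", mu=0.0, sigma=2.0)
    x_by_tau = pm.Normal("x_by_tau", mu=0.0, tau=0.25)
```
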
@@ -286,7 +284,7 @@ We'll return to this issue after we use `numpyro` to compute posteriors under our

 We'll now repeat the calculations using `numpyro`.

-### `Numpyro` Implementation
+## `Numpyro` Implementation

 ```{code-cell} ipython3

lectures/ar1_turningpts.md

Lines changed: 39 additions & 50 deletions

@@ -13,6 +13,12 @@ kernelspec:

 # Forecasting an AR(1) process

+```{code-cell} ipython3
+:tags: [hide-output]
+
+!pip install arviz pymc
+```
+
 This lecture describes methods for forecasting statistics that are functions of future values of a univariate autoregressive process.

 The methods are designed to take into account two possible sources of uncertainty about these statistics:
@@ -25,7 +31,7 @@ We consider two sorts of statistics:

 - prospective values $y_{t+j}$ of a random process $\{y_t\}$ that is governed by the AR(1) process

-- sample path properties that are defined as non-linear functions of future values $\{y_{t+j}\}_{j\geq 1}$ at time $t$.
+- sample path properties that are defined as non-linear functions of future values $\{y_{t+j}\}_{j \geq 1}$ at time $t$.

 **Sample path properties** are things like "time to next turning point" or "time to next recession"

@@ -54,54 +60,48 @@ logger = logging.getLogger('pymc')
 logger.setLevel(logging.CRITICAL)
 ```

-## A Univariate First-Order Autoregressive Process
-
+## A Univariate First-Order Autoregressive Process

 Consider the univariate AR(1) model:

 $$
 y_{t+1} = \rho y_t + \sigma \epsilon_{t+1}, \quad t \geq 0
-$$ (eq1)
+$$ (ar1-tp-eq1)

 where the scalars $\rho$ and $\sigma$ satisfy $|\rho| < 1$ and $\sigma > 0$;
 $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$.

-The initial condition $y_{0}$ is a known number.
+The initial condition $y_{0}$ is a known number.

-Equation {eq}`eq1` implies that for $t \geq 0$, the conditional density of $y_{t+1}$ is
+Equation {eq}`ar1-tp-eq1` implies that for $t \geq 0$, the conditional density of $y_{t+1}$ is

 $$
 f(y_{t+1} | y_{t}; \rho, \sigma) \sim {\mathcal N}(\rho y_{t}, \sigma^2) \
-$$ (eq2)
-
-
-Further, equation {eq}`eq1` also implies that for $t \geq 0$, the conditional density of $y_{t+j}$ for $j \geq 1$ is
+$$ (ar1-tp-eq2)

+Further, equation {eq}`ar1-tp-eq1` also implies that for $t \geq 0$, the conditional density of $y_{t+j}$ for $j \geq 1$ is

 $$
 f(y_{t+j} | y_{t}; \rho, \sigma) \sim {\mathcal N}\left(\rho^j y_{t}, \sigma^2 \frac{1 - \rho^{2j}}{1 - \rho^2} \right)
-$$ (eq3)
+$$ (ar1-tp-eq3)

-
-The predictive distribution {eq}`eq3` that assumes that the parameters $\rho, \sigma$ are known, which we express
+The predictive distribution {eq}`ar1-tp-eq3` assumes that the parameters $\rho, \sigma$ are known, which we express
 by conditioning on them.

 We also want to compute a predictive distribution that does not condition on $\rho,\sigma$ but instead takes account of our uncertainty about them.

-We form this predictive distribution by integrating {eq}`eq3` with respect to a joint posterior distribution $\pi_t(\rho,\sigma | y^t )$
+We form this predictive distribution by integrating {eq}`ar1-tp-eq3` with respect to a joint posterior distribution $\pi_t(\rho,\sigma | y^t)$
 that conditions on an observed history $y^t = \{y_s\}_{s=0}^t$:

 $$
 f(y_{t+j} | y^t) = \int f(y_{t+j} | y_{t}; \rho, \sigma) \pi_t(\rho,\sigma | y^t ) d \rho d \sigma
-$$ (eq4)
-
-
+$$ (ar1-tp-eq4)

-Predictive distribution {eq}`eq3` assumes that parameters $(\rho,\sigma)$ are known.
+Predictive distribution {eq}`ar1-tp-eq3` assumes that parameters $(\rho,\sigma)$ are known.

-Predictive distribution {eq}`eq4` assumes that parameters $(\rho,\sigma)$ are uncertain, but have known probability distribution $\pi_t(\rho,\sigma | y^t )$.
+Predictive distribution {eq}`ar1-tp-eq4` assumes that parameters $(\rho,\sigma)$ are uncertain, but have known probability distribution $\pi_t(\rho,\sigma | y^t )$.

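As a quick check on the relabeled equations, the sketch below (not part of the lecture; parameter values are made up) evaluates the conditional moments in {eq}`ar1-tp-eq3`:

```python
import numpy as np
from scipy.stats import norm

# Assumed values, for illustration only
rho, sigma = 0.9, 1.0
y_t, j = 2.0, 8

# Conditional mean and variance of y_{t+j} given y_t, from (ar1-tp-eq3)
mean_j = rho**j * y_t
var_j = sigma**2 * (1 - rho**(2 * j)) / (1 - rho**2)

# For j = 1 the variance collapses to sigma**2, matching (ar1-tp-eq2)
assert np.isclose(sigma**2 * (1 - rho**2) / (1 - rho**2), sigma**2)

# A .9 coverage interval for y_{t+j}, of the kind plotted later in the lecture
lo, hi = norm.ppf([0.05, 0.95], loc=mean_j, scale=np.sqrt(var_j))
print(f"mean={mean_j:.3f}, var={var_j:.3f}, 90% interval=({lo:.3f}, {hi:.3f})")
```
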
-We also want to compute some predictive distributions of "sample path statistics" that might include, for example
+We also want to compute some predictive distributions of "sample path statistics" that might include, for example

 - the time until the next "recession",
 - the minimum value of $Y$ over the next 8 periods,
@@ -115,20 +115,16 @@ To accomplish that for situations in which we are uncertain about parameter values
 - for each draw $n=0,1,...,N$, simulate a "future path" of length $T_1$ with parameters $\left(\rho_n,\sigma_n\right)$ and compute our three "sample path statistics";
 - finally, plot the desired statistics from the $N$ samples as an empirical distribution.
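
A minimal sketch of the loop these bullets describe, assuming `post_rho` and `post_sigma` hold the $N$ posterior draws (both names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_future_path(rho, sigma, y_t, T1, rng):
    """Simulate T1 future values of the AR(1) process starting from y_t."""
    path = np.empty(T1)
    y = y_t
    for s in range(T1):
        y = rho * y + sigma * rng.standard_normal()
        path[s] = y
    return path

# One future path per posterior draw (rho_n, sigma_n), e.g.:
# paths = [simulate_future_path(r, s, y_t, T1, rng)
#          for r, s in zip(post_rho, post_sigma)]
```
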
-
-
 ## Implementation

 First, we'll simulate a sample path from which to launch our forecasts.

 In addition to plotting the sample path, under the assumption that the true parameter values are known,
 we'll plot $.9$ and $.95$ coverage intervals using conditional distribution
-{eq}`eq3` described above.
+{eq}`ar1-tp-eq3` described above.

 We'll also plot a bunch of samples of sequences of future values and watch where they fall relative to the coverage interval.

-
-
 ```{code-cell} ipython3
 def AR1_simulate(rho, sigma, y0, T):
@@ -198,30 +194,30 @@ Wecker {cite}`wecker1979predicting` proposed using simulation techniques to characterize

 He called these functions "path properties" to contrast them with properties of single data points.

-He studied two special prospective path properties of a given series $\{y_t\}$.
+He studied two special prospective path properties of a given series $\{y_t\}$.

-The first was **time until the next turning point**
+The first was **time until the next turning point**

-* he defined a **"turning point"** to be the date of the second of two successive declines in $y$.
+* he defined a **"turning point"** to be the date of the second of two successive declines in $y$.

-To examine this statistic, let $Z$ be an indicator process
+To examine this statistic, let $Z$ be an indicator process

-$$
+<!-- $$
 Z_t(Y(\omega)) := \left\{
 \begin{array} {c}
 \ 1 & \text{if } Y_t(\omega)< Y_{t-1}(\omega)< Y_{t-2}(\omega) \geq Y_{t-3}(\omega) \\
 0 & \text{otherwise}
 \end{array} \right.
-$$
+$$ -->

 Then the random variable **time until the next turning point** is defined as the following **stopping time** with respect to $Z$:

 $$
 W_t(\omega):= \inf \{ k\geq 1 \mid Z_{t+k}(\omega) = 1\}
 $$

-Wecker {cite}`wecker1979predicting` also studied **the minimum value of $Y$ over the next 8 quarters**
-which can be defined as the random variable
+Wecker {cite}`wecker1979predicting` also studied **the minimum value of $Y$ over the next 8 quarters**
+which can be defined as the random variable

 $$
 M_t(\omega) := \min \{ Y_{t+1}(\omega); Y_{t+2}(\omega); \dots; Y_{t+8}(\omega)\}
@@ -230,36 +226,34 @@
 It is interesting to study yet another possible concept of a **turning point**.

 Thus, let
-
+<!--
 $$
 T_t(Y(\omega)) := \left\{
 \begin{array}{c}
 \ 1 & \text{if } Y_{t-2}(\omega)> Y_{t-1}(\omega) > Y_{t}(\omega) \ \text{and } \ Y_{t}(\omega) < Y_{t+1}(\omega) < Y_{t+2}(\omega) \\
-\ -1 & \text{if } Y_{t-2}(\omega)< Y_{t-1}(\omega) < Y_{t}(\omega) \ \text{and } \ Y_{t}(\omega) > Y_{t+1}(\omega) > Y_{t+2}(\omega) \\
+\ -1 & \text{if } Y_{t-2}(\omega)< Y_{t-1}(\omega) < Y_{t}(\omega) \ \text{and } \ Y_{t}(\omega) > Y_{t+1}(\omega) > Y_{t+2}(\omega) \\
 0 & \text{otherwise}
 \end{array} \right.
-$$
+$$ -->

 Define a **positive turning point today or tomorrow** statistic as

-$$
+<!-- $$
 P_t(\omega) := \left\{
 \begin{array}{c}
 \ 1 & \text{if } T_t(\omega)=1 \ \text{or} \ T_{t+1}(\omega)=1 \\
 0 & \text{otherwise}
 \end{array} \right.
-$$ -->
+$$ -->

 This is designed to express the event

-- ``after one or two decrease(s), $Y$ will grow for two consecutive quarters''
-
+- ``after one or two decrease(s), $Y$ will grow for two consecutive quarters''

 Following {cite}`wecker1979predicting`, we can use simulations to calculate probabilities of $P_t$ and $N_t$ for each period $t$.

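Since the displays above are commented out for the pdflatex build, here is one possible reading of $W_t$ and $M_t$ in code (a sketch only, with the indicator $Z$ folded into the scan):

```python
import numpy as np

def time_to_next_turning_point(path):
    """W_t: the smallest k >= 1 with Z_{t+k} = 1, where Z_{t+k} = 1 flags
    the second of two successive declines:
    Y_{t+k} < Y_{t+k-1} < Y_{t+k-2} >= Y_{t+k-3}.
    Assumes path[0] = y_t, so only k >= 3 is detectable without pre-t history."""
    for k in range(3, len(path)):
        if path[k] < path[k - 1] < path[k - 2] >= path[k - 3]:
            return k
    return np.inf  # no turning point within this simulated path

def min_over_next_8(future_path):
    """M_t: the minimum of Y over the next 8 simulated quarters."""
    return np.min(future_path[:8])
```
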
 ## A Wecker-Like Algorithm

-
 The procedure consists of the following steps:

 * index a sample path by $\omega_i$
@@ -270,11 +264,9 @@ $$
 Y(\omega_i) = \left\{ Y_{t+1}(\omega_i), Y_{t+2}(\omega_i), \dots, Y_{t+N}(\omega_i)\right\}_{i=1}^I
 $$

-* for each path $\omega_i$, compute the associated value of $W_t(\omega_i), W_{t+1}(\omega_i), \dots$
+* for each path $\omega_i$, compute the associated value of $W_t(\omega_i), W_{t+1}(\omega_i), \dots$

-* consider the sets $
-\{W_t(\omega_i)\}^{T}_{i=1}, \ \{W_{t+1}(\omega_i)\}^{T}_{i=1}, \ \dots, \ \{W_{t+N}(\omega_i)\}^{T}_{i=1}
-$ as samples from the predictive distributions $f(W_{t+1} \mid \mathcal y_t, \dots)$, $f(W_{t+2} \mid y_t, y_{t-1}, \dots)$, $\dots$, $f(W_{t+N} \mid y_t, y_{t-1}, \dots)$.
+* consider the sets $\{W_t(\omega_i)\}^{T}_{i=1}, \ \{W_{t+1}(\omega_i)\}^{T}_{i=1}, \ \dots, \ \{W_{t+N}(\omega_i)\}^{T}_{i=1}$ as samples from the predictive distributions $f(W_{t+1} \mid y_t, \dots)$, $f(W_{t+2} \mid y_t, y_{t-1}, \dots)$, $\dots$, $f(W_{t+N} \mid y_t, y_{t-1}, \dots)$.
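
Using the hypothetical helpers sketched earlier, the sets in the final bullet can be collected like this:

```python
# Draws {W_t(omega_i)}_{i=1}^I from the predictive distribution of the
# time to the next turning point, holding (rho, sigma) at known values.
I, T1 = 1000, 60
W_samples = np.array([
    time_to_next_turning_point(
        simulate_future_path(rho, sigma, y_t, T1, rng))
    for _ in range(I)
])

# Empirical frequencies approximate f(W_t | y_t, rho, sigma)
values, counts = np.unique(W_samples, return_counts=True)
```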

 ## Using Simulations to Approximate a Posterior Distribution
@@ -283,7 +275,6 @@ The next code cells use `pymc` to compute the time $t$ posterior distribution of

 Note that in defining the likelihood function, we choose to condition on the initial value $y_0$.

-
 ```{code-cell} ipython3
 def draw_from_posterior(sample):
     """
@@ -328,7 +319,6 @@ The graphs on the left portray posterior marginal distributions.

 ## Calculating Sample Path Statistics

-
 Our next step is to prepare Python code to compute our sample path statistics.

 ```{code-cell} ipython3
@@ -455,9 +445,9 @@ plt.show()
 ## Extended Wecker Method

 Now we apply our "extended" Wecker method based on predictive densities of $y$ defined by
-{eq}`eq4` that acknowledge posterior uncertainty in the parameters $\rho, \sigma$.
+{eq}`ar1-tp-eq4` that acknowledge posterior uncertainty in the parameters $\rho, \sigma$.

-To approximate the intergration on the right side of {eq}`eq4`, we repeately draw parameters from the joint posterior distribution each time we simulate a sequence of future values from model {eq}`eq1`.
+To approximate the integration on the right side of {eq}`ar1-tp-eq4`, we repeatedly draw parameters from the joint posterior distribution each time we simulate a sequence of future values from model {eq}`ar1-tp-eq1`.

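Relative to the earlier sketch, the extension changes one thing: draw fresh $(\rho, \sigma)$ from the posterior for every simulated path (same hypothetical names):

```python
# Resample (rho_n, sigma_n) per path so the W draws also reflect posterior
# parameter uncertainty, approximating the integral in (ar1-tp-eq4).
idx = rng.integers(len(post_rho), size=I)
W_samples_extended = np.array([
    time_to_next_turning_point(
        simulate_future_path(post_rho[i], post_sigma[i], y_t, T1, rng))
    for i in idx
])
```
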
 ```{code-cell} ipython3
 def plot_extended_Wecker(post_samples, initial_path, N, ax):
@@ -525,4 +515,3 @@ plot_extended_Wecker(post_samples, initial_path, 1000, ax)
 plt.legend()
 plt.show()
 ```
-

lectures/bayes_nonconj.md

Lines changed: 2 additions & 4 deletions

@@ -888,7 +888,7 @@ SVI_num_steps = 5000
 true_theta = 0.8
 ```

-#### Beta Prior and Posteriors
+### Beta Prior and Posteriors

 Let's compare outcomes when we use a Beta prior.

@@ -944,7 +944,7 @@ Here the MCMC approximation looks good.

 But the VI approximation doesn't look so good.

-* even though we use the beta distribution as our guide, the VI approximated posterior distributions do not closely resemble the posteriors that we had just computed analytically.
+* even though we use the beta distribution as our guide, the VI approximated posterior distributions do not closely resemble the posteriors that we had just computed analytically.

 (Here, our initial parameter for Beta guide is (0.5, 0.5).)

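For context, a minimal `numpyro` SVI setup of the kind this hunk discusses, with a Beta guide initialized at (0.5, 0.5); the model, data, and step count here are made up for illustration:

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO

def model(data):
    theta = numpyro.sample("theta", dist.Beta(1.0, 1.0))
    with numpyro.plate("N", data.shape[0]):
        numpyro.sample("obs", dist.Bernoulli(theta), obs=data)

def beta_guide(data):
    # Variational parameters, initialized at (0.5, 0.5) as in the lecture
    alpha = numpyro.param("alpha", 0.5, constraint=dist.constraints.positive)
    beta = numpyro.param("beta", 0.5, constraint=dist.constraints.positive)
    numpyro.sample("theta", dist.Beta(alpha, beta))

data = jnp.array([1.0, 0.0, 1.0, 1.0, 1.0])  # made-up coin flips
svi = SVI(model, beta_guide, numpyro.optim.Adam(0.05), Trace_ELBO())
result = svi.run(random.PRNGKey(0), 5000, data)
print(result.params)  # fitted (alpha, beta) of the Beta guide
```
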
@@ -960,8 +960,6 @@ BayesianInferencePlot(true_theta, num_list, BETA_numpyro).SVI_plot(guide_dist='b
 ```

-
-
 ## Non-conjugate Prior Distributions

 Having assured ourselves that our MCMC and VI methods can work well when we have conjugate prior and so can also compute analytically, we
