## 📘 Random Vectors (Multivariate Random Variables)

So far we have treated multiple r.v.s $X_1, X_2, \dots, X_d$ as separate objects. However, it is often more convenient to consider them as a single algebraic unit.

A **random vector** (or a **multivariate random variable**) is one such unit. It is a $d$-dimensional vector of random variables $X_1, X_2, \dots, X_d$, where each r.v. is defined on the same sample space $S$ with the same probability function $p(s)$ (which acts on outcomes $s \in S$). Since each r.v. $X_i$ maps from $S$ to a real number, the random vector is a mapping from the sample space $S$ to a real vector in $\mathbb{R}^d$:

$$
X: S \rightarrow \mathbb{R}^d \quad \text{where} \quad X(s) = \begin{bmatrix} X_1(s) \\ X_2(s) \\ \vdots \\ X_d(s) \end{bmatrix}
$$

Each experimental run produces a realization of $X$, i.e., a vector of realized values $x = (x_1, x_2, \dots, x_d)^T \in \mathbb{R}^d$.

> Note that in general, being arranged as a vector tells us nothing about the relationship between the r.v.s $X_1, \dots, X_d$ (beyond that they share a sample space $S$ and a probability measure $p$). In particular, the r.v.s $X_i$ might be mutually independent, conditionally independent, or anything in between.

---

The **joint distribution** $p_X: \mathbb{R}^d \rightarrow [0, 1]$ is a probability function which assigns a probability value to every possible realization $x \in \mathbb{R}^d$. We denote the **joint distribution** of the multivariate r.v. $X$ as $p_X(x)$ (or $\mathbb{P}(X_1, \dots, X_d)$), and its **CDF** as:

$$
F_X(x) = \mathbb{P}(X_1 \leq x_1, \dots, X_d \leq x_d) = \mathbb{P}(X \leq x)
$$

where $x = (x_1, \dots, x_d)^T$ is a realization of multivariate r.v. $X = (X_1, \dots, X_d)^T$.

---

- **For a discrete multivariate r.v.** $X$: $\text{pmf: } p_X(x) = \mathbb{P}(X_1 = x_1, \dots, X_d = x_d)$
- **For a continuous multivariate r.v.** $X$: $\text{pdf: } f_X(x) = f_{X_1, \dots, X_d}(x_1, \dots, x_d) = \frac{\partial^d}{\partial x_1 \partial x_2 \dots \partial x_d} \left[ F_X(x_1, \dots, x_d) \right]$


## ➔ Operations on Random Vectors

Similar to how we can algebraically manipulate random variables, we can manipulate a random vector $X = (X_1, X_2, \dots, X_d)^T$ using the principles of linear algebra, as though it were a regular vector.

> The following applies to both discrete & continuous random vectors.

---

### ★ Expected Value of a Random Vector

If $X = (X_1, \dots, X_d)^T, \; X: S \rightarrow \mathbb{R}^d$ is a random vector, then:

$$
\mathbb{E}(X) =
\begin{bmatrix}
\mathbb{E}(X_1) \\
\vdots \\
\mathbb{E}(X_d)
\end{bmatrix}
= \mu_X \in \mathbb{R}^d
$$

$\mu_X$ is the **expected value of the random vector $X$**.

Note that $\mathbb{E}(X)$ is a vector of real numbers (scalars), not a random vector, i.e., it no longer has a distribution associated with it.
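
As a quick numerical sketch (a minimal, illustrative example with a made-up 3-dimensional random vector, not anything specific from these notes), the componentwise expectation $\mu_X$ can be estimated by averaging many realizations of $X$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-dimensional random vector X = (X1, X2, X3)^T:
# X1 ~ Bernoulli(0.3), X2 ~ Normal(2, 1), X3 ~ Exponential(mean 5).
# These distributional choices are purely illustrative.
n = 100_000
samples = np.column_stack([
    rng.binomial(1, 0.3, size=n),    # realizations of X1
    rng.normal(2.0, 1.0, size=n),    # realizations of X2
    rng.exponential(5.0, size=n),    # realizations of X3
])

# mu_X = E(X) is the vector of componentwise expectations; estimate it by
# averaging the realized vectors (rows).
mu_hat = samples.mean(axis=0)
print(mu_hat)  # approximately [0.3, 2.0, 5.0]
```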

If $X, Y, Z$ are $d$-dimensional random vectors, then:

$$
\mathbb{E}(X + Y + Z) = \mathbb{E}(X) + \mathbb{E}(Y) + \mathbb{E}(Z)
$$

---

### ➔ Affine Transform of a Random Vector

We can produce a new random vector $Z: S \rightarrow \mathbb{R}^k$ by passing a vector $X = (X_1, \dots, X_d)^T$ through an **affine transform** (meaning we multiply it by a (fixed) matrix $A \in \mathbb{R}^{k \times d}$ and add a (fixed) vector $b \in \mathbb{R}^k$):

$$
Z = AX + b
$$

$$
\underbrace{Z}_{k \times 1} =
\underbrace{A}_{k \times d}
\cdot
\underbrace{X}_{d \times 1}
+
\underbrace{b}_{k \times 1}
$$

- $Z$: new random vector
- $A$, $b$: fixed matrix and vector


### ➔ Expected Value of an Affine Transform

Since $Z$ is also a random vector, we can continue to manipulate it algebraically.

We can also use the **linearity of expectation** to establish the expected value of $Z$:

$$
\mathbb{E}(Z) = \mathbb{E}(AX + b) = A \cdot \mathbb{E}(X) + b
$$

$$
\underbrace{\mathbb{E}(Z)}_{k \times 1} =
\underbrace{A}_{k \times d}
\cdot
\underbrace{\mathbb{E}(X)}_{d \times 1}
+
\underbrace{b}_{k \times 1}
$$

> Here $A$ and $b$ are fixed (non-random) constants, so they can be pulled out of the expectation.
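
Below is a minimal Monte Carlo sketch of $\mathbb{E}(AX + b) = A \, \mathbb{E}(X) + b$; the distribution of $X$ and the constants $A$, $b$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Realizations of a hypothetical 3-dimensional random vector X (one row each).
X = rng.normal(loc=[1.0, -2.0, 0.5], scale=1.0, size=(200_000, 3))

# Fixed (constant) affine map Z = A X + b, with A in R^{2x3} and b in R^2.
A = np.array([[1.0, 0.0, 2.0],
              [0.5, -1.0, 1.0]])
b = np.array([3.0, -1.0])

Z = X @ A.T + b               # apply the affine transform to every realization

lhs = Z.mean(axis=0)          # Monte Carlo estimate of E(Z) = E(AX + b)
rhs = A @ X.mean(axis=0) + b  # A * (estimate of E(X)) + b
print(lhs, rhs)               # the two vectors agree up to sampling error
```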

---

### ➔ Orthogonality

Multivariate r.v.s $X: S \rightarrow \mathbb{R}^d$ and $Z: S \rightarrow \mathbb{R}^d$ (of the same dimension $d$) are **orthogonal** if:

$$
\mathbb{E}(X^T Z) = 0
$$

where:

$$
X^T Z = X_1 Z_1 + X_2 Z_2 + \dots + X_d Z_d
$$
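
A small illustrative sketch: if $X$ and $Z$ are built with independent, zero-mean components (a hypothetical construction, chosen so that orthogonality holds), the sample average of $X^T Z$ should be close to 0:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Hypothetical construction: X and Z have independent, zero-mean components,
# so E(X^T Z) = sum_i E(X_i) E(Z_i) = 0, i.e. X and Z are orthogonal.
X = rng.normal(0.0, 1.0, size=(n, 2))
Z = rng.normal(0.0, 2.0, size=(n, 2))

# X^T Z for each realization is the inner product X1*Z1 + X2*Z2.
inner = np.sum(X * Z, axis=1)
print(inner.mean())  # close to 0, up to Monte Carlo error
```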





## ➔ Joint Distributions of Random Vectors

Recall that by definition, a random vector $X = (X_1, \dots, X_d)$ is a collection of r.v.s:

$$
X_i : S \rightarrow \mathbb{R}, \quad X: S \rightarrow \mathbb{R}^d
$$

each of which maps from the **same sample space** $S$ to the **real line**. They also use the same **probability measure** $p(s)$ which assigns probability values to each outcome $s \in S$.

We can thus imagine that the random vector is a mapping from $S$ to $\mathbb{R}^d$, i.e.,

$$
X: S \rightarrow \mathbb{R}^d \quad \text{where} \quad X(s) =
\begin{bmatrix}
X_1(s) \\
X_2(s) \\
\vdots \\
X_d(s)
\end{bmatrix} \in \mathbb{R}^d, \quad \forall s \in S
$$

i.e., each outcome $s \in S$ is mapped to a single point in $\mathbb{R}^d$.

---

Just as a random variable has its own distribution, so does a **random vector**: different realizations

$$
x = (x_1, \dots, x_d)
$$

of $X = (X_1, \dots, X_d)$ are assigned probability values by the function

$$
p_X : \mathbb{R}^d \rightarrow [0,1]
$$

---

### ➤ Just like a r.v., a random vector may be discrete or continuous:

1. $X: S \rightarrow \mathbb{R}^d$ is **discrete** if there exists a finite or countable set $R_X \subset \mathbb{R}^d$ such that:

$$
\mathbb{P}(X \in R_X) = 1
$$

The set $R_X$ is effectively the range of the discrete random vector $X$.

2. $X: S \rightarrow \mathbb{R}^d$ is **continuous** if:

$$
\mathbb{P}(X = x) = 0 \quad \forall x \in \mathbb{R}^d
$$

i.e., it has zero probability at any realized point $x \in \mathbb{R}^d$.

---

## ➔ Joint PMF, PDF, and CDF of Random Vectors

Armed with the notion of a discrete & continuous random vector, we can define the **joint pmf**, **joint pdf**, and **joint cdf**. These are the same definitions seen in earlier sections, restated in random-vector notation.

---

### (i) If random vector $X: S \rightarrow \mathbb{R}^d$ is discrete or continuous:

The **joint CDF** of $X = (X_1, X_2, \dots, X_d)$ is:

$$
F_X(x) = F_{X_1, X_2, \dots, X_d}(x_1, x_2, \dots, x_d) = \mathbb{P}(X_1 \leq x_1, \dots, X_d \leq x_d)
$$

- Joint CDF is monotonically non-decreasing: writing $a \leq b$ for vectors to mean $a_i \leq b_i \ \forall i$, if $a \leq b$ then $F_X(a) \leq F_X(b)$.
- Joint CDF is non-negative; its limit as all arguments tend to $+\infty$ is 1, and as they tend to $-\infty$ it is 0:

$$
\lim_{x_1 \to \infty} \cdots \lim_{x_d \to \infty} F_X(x_1, \dots, x_d) = 1
$$

$$
\lim_{x_1 \to -\infty} \cdots \lim_{x_d \to -\infty} F_X(x_1, \dots, x_d) = 0
$$
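
As a numerical sketch of the joint CDF (assuming, purely for illustration, a 2-dimensional random vector with independent standard normal components), $F_X(x)$ can be estimated as the fraction of simulated realizations with every coordinate at most the corresponding coordinate of $x$:

```python
import numpy as np

rng = np.random.default_rng(5)

# Purely illustrative: a 2-dimensional random vector with independent
# standard normal components.
samples = rng.normal(size=(1_000_000, 2))

x = np.array([0.5, 1.0])
# F_X(x) = P(X1 <= 0.5, X2 <= 1.0): fraction of realizations with every
# coordinate at most the corresponding coordinate of x.
F_hat = np.mean(np.all(samples <= x, axis=1))
print(F_hat)  # ≈ Phi(0.5) * Phi(1.0) ≈ 0.69 * 0.84 ≈ 0.58
```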

---

### (ii) If random vector $X: S \rightarrow \mathbb{R}^d$ is **discrete**:

The **joint pmf** of $X = (X_1, \dots, X_d)$ is:

$$
p_X(x_1, \dots, x_d) = \mathbb{P}(X_1 = x_1, \dots, X_d = x_d)
$$

- Joint pmf is non-negative and sums to 1 over the range $R_X$:

$$
\sum_{x \in R_X} p_X(x) = 1
$$

- As in the univariate case, the joint pmf is essentially a lookup table for probabilities (see the sketch below):

$$
\text{For } C \subseteq \mathbb{R}^d,\quad \mathbb{P}(X \in C) = \sum_{x \in C \cap R_X} p_X(x)
$$
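
A concrete sketch of the lookup-table view, using a made-up joint pmf over a small range $R_X$:

```python
# Made-up joint pmf of a discrete random vector X = (X1, X2)^T,
# stored as a lookup table {realization x: probability}.
pmf = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}
assert abs(sum(pmf.values()) - 1.0) < 1e-12  # the pmf sums to 1 over R_X

# P(X in C) for C = {x : x1 + x2 >= 2}: sum the pmf over realizations in C.
in_C = lambda x: x[0] + x[1] >= 2
prob = sum(p for x, p in pmf.items() if in_C(x))
print(prob)  # 0.15 + 0.05 + 0.25 = 0.45
```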

---

### (iii) If random vector $X: S \rightarrow \mathbb{R}^d$ is **continuous**:

The **joint pdf** of $X = (X_1, \dots, X_d)$ is:

$$
f_X(x_1, \dots, x_d) = \frac{\partial^d}{\partial x_1 \dots \partial x_d} F_X(x_1, \dots, x_d)
$$

- We set $f_X(x) = 0$ at points where the derivative does not exist.
- Joint pdf is non-negative and integrates to 1:

$$
\int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} f_X(x) \, dx_1 \, dx_2 \dots dx_d = 1
$$


- As with the univariate case, for a continuous random vector, we get the probability of values across a subset $C \subseteq \mathbb{R}^d$ by integrating over that set:

$$
\mathbb{P}(X \in C) = \int_C f_X(x_1, \dots, x_d) \, dx_1 \, dx_2 \dots dx_d
$$

- e.g., if $C = \{ x \in \mathbb{R}^2 \mid 1 < x_1 < 2,\; 1 < x_2 < 2 \}$, then:

$$
\mathbb{P}(X \in C) = \int_1^2 \int_1^2 f_{X_1, X_2}(x_1, x_2) \, dx_1 \, dx_2
$$
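
A minimal numeric sketch of this region probability, assuming an illustrative joint pdf with two independent Exponential(1) components (not a pdf from these notes):

```python
import numpy as np
from scipy import integrate

# Illustrative joint pdf: X1, X2 independent Exponential(1), so
# f_{X1,X2}(x1, x2) = exp(-x1) * exp(-x2) for x1, x2 > 0.
# (dblquad calls its integrand as f(inner, outer); this pdf is symmetric
#  in its two arguments, so the order does not matter here.)
f = lambda x1, x2: np.exp(-x1) * np.exp(-x2)

# P(1 < X1 < 2, 1 < X2 < 2): integrate the joint pdf over the box (1,2)x(1,2).
prob, _ = integrate.dblquad(f, 1, 2, 1, 2)
print(prob)  # factorises as (e^{-1} - e^{-2})^2 ≈ 0.054
```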

---

- As in the univariate case, integrating $x$ against the joint pdf over all of $\mathbb{R}^d$ gives the expected value of the random vector:

$$
\mathbb{E}(X) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} x \cdot f_X(x_1, \dots, x_d) \, dx_1 \dots dx_d
$$
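
A minimal sketch of this expectation integral, assuming an illustrative joint pdf with independent Exponential(1) and Exponential(2) components:

```python
import numpy as np
from scipy import integrate

# Illustrative joint pdf: X1 ~ Exponential(1), X2 ~ Exponential(2), independent.
f = lambda x1, x2: np.exp(-x1) * 2.0 * np.exp(-2.0 * x2)

# E(X_i) = ∫∫ x_i f(x1, x2) dx1 dx2. dblquad integrates func(inner, outer)
# with the outer variable ranging over the first pair of limits, so the
# lambdas below take (x2, x1) in that order.
E_X1, _ = integrate.dblquad(lambda x2, x1: x1 * f(x1, x2), 0, np.inf, 0, np.inf)
E_X2, _ = integrate.dblquad(lambda x2, x1: x2 * f(x1, x2), 0, np.inf, 0, np.inf)
print(E_X1, E_X2)  # ≈ 1.0 and 0.5, the means of Exponential(1) and Exponential(2)
```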


---


## ✯ Summary of Results for Distributions
### ➔ Marginal Distributions of a Random Vector

Earlier we saw the marginal pmf of a set of discrete r.v.s, which we obtained by *summing out* or marginalizing some subset of the r.v.s $X_1, \dots, X_d$.

In the notation of random vectors, let $X = (X_1, \dots, X_d)^T$ be a random vector. We can use the (multivariate) probability distribution of $X$ (pmf/pdf, and CDF) to obtain univariate distributions of $X_i$, or more generally, a distribution on some subset of $X_1, \dots, X_d$. Such distributions are called **marginal distributions**.

---

### ➤ To obtain univariate marginal distributions:

#### (1) If $X$ is a discrete or continuous random vector:

The **marginal CDF** of $X_i$ is:

$$
F_{X_i}(x_i) = \lim_{x_1 \to \infty} \cdots \lim_{x_{i-1} \to \infty} \lim_{x_{i+1} \to \infty} \cdots \lim_{x_d \to \infty} F_X(x)
$$

where $F_X(x) = F_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the joint CDF of $X$ at $x = (x_1, \dots, x_d)^T$.

---

#### (2) If $X$ is a **discrete random vector**:

The **marginal pmf** of $X_i$ is:

$$
p_{X_i}(x_i) = \sum_{x_1 \in R_1} \cdots \sum_{x_{i-1} \in R_{i-1}} \sum_{x_{i+1} \in R_{i+1}} \cdots \sum_{x_d \in R_d} p_X(x)
$$

where:
- $p_X(x) = p_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the joint pmf of $X$
- $R_i := \text{Range}(X_i)$
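
A small sketch of this marginalization, assuming a hypothetical joint pmf stored as a numpy array indexed by the values of each $X_i$; summing out every axis except $i$ gives the marginal pmf of $X_i$:

```python
import numpy as np

# Hypothetical joint pmf of X = (X1, X2, X3)^T with X1 in {0,1}, X2 in {0,1,2},
# X3 in {0,1}; entry joint[a, b, c] = P(X1 = a, X2 = b, X3 = c).
rng = np.random.default_rng(3)
joint = rng.random((2, 3, 2))
joint /= joint.sum()        # normalise so the pmf sums to 1

# Marginal pmf of X2: sum out x1 and x3 (every axis except axis 1).
p_X2 = joint.sum(axis=(0, 2))
print(p_X2, p_X2.sum())     # a length-3 pmf that sums to 1
```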

---

#### (3) If $X$ is a **continuous random vector**:

The **marginal pdf** of $X_i$ is:

$$
f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x) \, dx_1 \cdots dx_{i-1} \, dx_{i+1} \cdots dx_d
$$

where:
- $f_X(x) = f_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the **joint pdf** of $X$ at $x = (x_1, \dots, x_d)^T$
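
A minimal numeric sketch, assuming an illustrative joint pdf $f_{X_1,X_2}(x_1,x_2) = e^{-x_1} \cdot 2e^{-2x_2}$ for $x_1, x_2 > 0$ (independent exponentials): integrating out $x_2$ recovers the marginal pdf of $X_1$:

```python
import numpy as np
from scipy import integrate

# Illustrative joint pdf: X1 ~ Exponential(1), X2 ~ Exponential(2), independent.
f_joint = lambda x1, x2: np.exp(-x1) * 2.0 * np.exp(-2.0 * x2)

# Marginal pdf of X1 at a point x1: integrate the joint pdf over x2.
x1 = 0.7
f_X1, _ = integrate.quad(lambda x2: f_joint(x1, x2), 0, np.inf)
print(f_X1, np.exp(-x1))  # both ≈ exp(-0.7) ≈ 0.497
```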


---

## ➤ To obtain higher-order marginals (joint marginals):

Suppose from our random vector $X = (X_1, \dots, X_d)$, we select a subset of $k$ r.v.s which we want to retain, and marginalize out the remaining $(d - k)$ r.v.s, leaving a joint distribution over the $k$ retained r.v.s.

WLOG, let the retained r.v.s be the first $k$: $X_1, \dots, X_k$.

Let us split our vector $X$ into two partitions:

$$
X = \begin{bmatrix}
X_A \\
X_B
\end{bmatrix}
$$

We want to retain r.v.s in the vector $X_A = (X_1, X_2, \dots, X_k)^T$ and marginalize out those in $X_B = (X_{k+1}, \dots, X_d)^T$.

We do this in a similar way as we did with the univariate marginals, only with fewer limits/sums/integrals over the joint CDF/PMF/PDF.

We will not reproduce the equations here since they are very similar to the ones above. Instead, we give one example:

Let $d = 6$, and suppose we want to retain $k = 3$ r.v.s.

Then:
- $X = (X_1, X_2, X_3, X_4, X_5, X_6)^T$
- $X_A = (X_1, X_2, X_3)^T$
- $X_B = (X_4, X_5, X_6)^T$

---

### (a) If $X$ is discrete or continuous:

The **marginal CDF** of $X_A$ is:

$$
F_{X_A}(x_A) = F_{X_1, X_2, X_3}(x_1, x_2, x_3) = \lim_{x_4 \to \infty} \lim_{x_5 \to \infty} \lim_{x_6 \to \infty} F_X(x)
$$

---

### (b) If $X$ is a **discrete** random vector:

The **marginal PMF** of $X_A$ is:

$$
p_{X_A}(x_A) = p_{X_1, X_2, X_3}(x_1, x_2, x_3) = \sum_{x_4 \in R_4} \sum_{x_5 \in R_5} \sum_{x_6 \in R_6} p_X(x_1, x_2, x_3, x_4, x_5, x_6)
$$
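
A minimal numeric sketch of this discrete case, with a made-up joint pmf stored as a 6-dimensional array; summing over the axes of $X_4, X_5, X_6$ yields the joint marginal of $X_A$:

```python
import numpy as np

# Made-up joint pmf of X = (X1, ..., X6)^T, each X_i taking values in {0, 1};
# entry joint[a1, ..., a6] = P(X1 = a1, ..., X6 = a6).
rng = np.random.default_rng(4)
joint = rng.random((2,) * 6)
joint /= joint.sum()               # normalise to a valid pmf

# Joint marginal pmf of X_A = (X1, X2, X3)^T: sum out x4, x5, x6 (axes 3, 4, 5).
p_XA = joint.sum(axis=(3, 4, 5))
print(p_XA.shape, p_XA.sum())      # (2, 2, 2), sums to 1
```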

---

### (c) If $X$ is a **continuous** random vector:

The **marginal PDF** of $X_A$ is:

$$
f_{X_A}(x_A) = f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_X(x_1, x_2, x_3, x_4, x_5, x_6) \, dx_4 \, dx_5 \, dx_6
$$






