From ef29b125fff71858c02720fae6d6aaedc49d1d72 Mon Sep 17 00:00:00 2001 From: CraftyEngineer Date: Tue, 24 Jun 2025 11:33:09 +0530 Subject: [PATCH] Basic Multivariate Probability Commit --- .../01 - Random Vectors.qmd | 129 +++++++++ .../02 - Joint Distribution.qmd | 264 ++++++++++++++++++ .../03 - Conditional Distribution.qmd | 142 ++++++++++ .../04 - Independence of Random Vectors.qmd | 217 ++++++++++++++ .../05 - Bayes Theorem.qmd | 93 ++++++ .../06 - Law of total probability.qmd | 99 +++++++ ...- Conditional Law of Total Probability.qmd | 55 ++++ 7 files changed, 999 insertions(+) create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd new file mode 100644 index 0000000..bd301fc --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd @@ -0,0 +1,129 @@ +## πŸ“˜ Random Vectors (Multivariate Random Variables) + +So far we have been considering multiple r.v.s $X_1, X_2, \dots, X_d$ all as separate r.v.s. However, it is sometimes more convenient to consider them as one algebraic unit. + +A **random vector** (or a **multivariate random variable**) is one such possible unit. It is a $d$-dimensional vector of random variables $X_1, X_2, \dots, X_d$, where each r.v. lives on the same sample space $S$ and probability function $p(s)$ (which acts on outcomes $s \in S$). As each r.v. $X_i$ maps from $S$ to a real number, a random vector is a mapping from the sample space $S$ to a real vector in $\mathbb{R}^d$: + +$$ +X: S \rightarrow \mathbb{R}^d \quad \text{where} \quad X(s) = \begin{bmatrix} X_1(s) \\ X_2(s) \\ \vdots \\ X_d(s) \end{bmatrix} +$$ + +Each experimental run produces a realization of $X$, i.e., a vector of realized values $x = (x_1, x_2, \dots, x_d)^T \in \mathbb{R}^d$. + +> Note: that in general, being arranged as a vector does not tell us anything about the r.v.s $X_1, \dots, X_d$ (except that they share a sample space $S$ and a probability measure function $p$). In particular, r.v.s $X_i$ might be independent of each other, conditionally independent, or anything in between. + +--- + +The **joint distribution** $p_X: \mathbb{R}^d \rightarrow [0, 1]$ is a probability function which assigns a probability value to every possible realization $x \in \mathbb{R}^d$. We denote the **joint distribution** of multivariate r.v. 
$X$ as $p_X(X)$ (or $\mathbb{P}(X_1, \dots, X_d)$), and **CDF** as: + +$$ +F_X(x) = \mathbb{P}(X_1 \leq x_1, \dots, X_d \leq x_d) = \mathbb{P}(X \leq x) +$$ + +where $x = (x_1, \dots, x_d)^T$ is a realization of multivariate r.v. $X = (X_1, \dots, X_d)^T$. + +--- + +- **For discrete multivariate r.v.** $X$: $\text{pmf: } p_X(X = x) = \mathbb{P}(X_1 = x_1, \dots, X_d = x_d)$ +- **For continuous multivariate r.v.** $X$: $\text{pdf: } f_X(x) = f_{X_1, \dots, X_d}(x_1, \dots, x_d) = \frac{\partial^d}{\partial x_1 \partial x_2 \dots \partial x_d} \left[ F_X(x_1, \dots, x_d) \right]$ + + +## βž” Operations on Random Vectors + +Similar to how we can algebraically manipulate random variables, we can manipulate a random vector $X = (X_1, X_2, \dots, X_d)^T$ using the principles of linear algebra, as though they were regular vectors. + +> The following applies to both discrete & continuous random vectors. + +--- + +### β˜… Expected Value of a Random Vector + +If $X = (X_1, X_2)^T, \; X: S \rightarrow \mathbb{R}^d$ is a random vector, then: + +$$ +\mathbb{E}(X) = +\begin{bmatrix} +\mathbb{E}(X_1) \\ +\mathbb{E}(X_2) +\end{bmatrix} += \mu_X \in \mathbb{R}^d +$$ + +$\mu_X$ is the **expected value of the random vector $X$**. + +After undergoing this transformation, $\mathbb{E}(X)$ is now a vector of real numbers (scalars), rather than a random variable (i.e., it no longer has a distribution associated with it). + +If $X, Y, Z$ are $d$-dimensional random vectors, then: + +$$ +\mathbb{E}(X + Y + Z) = \mathbb{E}(X) + \mathbb{E}(Y) + \mathbb{E}(Z) +$$ + +--- + +### βž” Affine Transform of a Random Vector + +We can produce a new random vector $Z: S \rightarrow \mathbb{R}^k$ by passing a vector $X = (X_1, \dots, X_d)^T$ through an **affine transform** (meaning we multiply it by a (fixed) matrix $A \in \mathbb{R}^{k \times d}$ and add a (fixed) vector $b \in \mathbb{R}^k$): + +$$ +Z = AX + b +$$ + +\[ +\begin{aligned} +&\underbrace{Z}_{k \times 1} = +\underbrace{A}_{k \times d} +\cdot +\underbrace{X}_{d \times 1} ++ +\underbrace{b}_{k \times 1} +\end{aligned} +\] + +- $Z$: new random vector +- $A$, $b$: fixed matrix and vector + + +### βž” Expected Value of an Affine Transform + +Since $Z$ is also a random vector, we can continue to manipulate it algebraically. + +We can also use the **linearity of expectation** to establish the expected value of $Z$: + +$$ +\mathbb{E}(Z) = \mathbb{E}(AX + b) = A \cdot \mathbb{E}(X) + b +$$ + +\[ +\begin{aligned} +&\underbrace{\mathbb{E}(Z)}_{k \times 1} = +\underbrace{A}_{k \times d} +\cdot +\underbrace{\mathbb{E}(X)}_{d \times 1} ++ +\underbrace{b}_{k \times 1} +\end{aligned} +\] + +> Here $A$ and $b$ are high-dimensional constants. 
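---

Below is a minimal NumPy sketch of these two facts (illustrative only, not part of the text: the toy distribution of $X$ and the particular $A$ and $b$ are made up). It draws many realizations of a 2-dimensional random vector and checks empirically that the sample mean of $Z = AX + b$ agrees with $A\,\mathbb{E}(X) + b$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random vector X = (X1, X2)^T: X1 ~ Uniform(0, 1), X2 ~ Exponential(mean 2).
# Linearity of expectation needs no independence, so any joint distribution would do.
n = 200_000
X = np.column_stack([rng.uniform(0, 1, n), rng.exponential(2.0, n)])  # shape (n, d), d = 2

A = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 1.0]])       # fixed matrix, k x d with k = 3
b = np.array([0.5, 1.0, -2.0])   # fixed vector, length k

Z = X @ A.T + b                  # each row is a realization of Z = A X + b

print(Z.mean(axis=0))            # empirical E(Z)
print(A @ X.mean(axis=0) + b)    # A E(X) + b -- agrees up to Monte Carlo error
```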
+ +--- + +### βž” Orthogonality + +Multivariate r.v.s $X: S \rightarrow \mathbb{R}^d$ and $Z: S \rightarrow \mathbb{R}^l$ are **orthogonal** if: + +$$ +\mathbb{E}(X^T Z) = 0 +$$ + +where: + +$$ +X^T Z = X_1 Z_1 + X_2 Z_2 + \dots + X_d Z_d +$$ + + + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd new file mode 100644 index 0000000..3e7a22f --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd @@ -0,0 +1,264 @@ +## βž” Joint Distributions of Random Vectors + +Recall that by definition, a random vector $X = (X_1, \dots, X_d)$ is a collection of r.v.s: + +$$ +X_i : S \rightarrow \mathbb{R}, \quad X: S \rightarrow \mathbb{R}^d +$$ + +each of which maps from the **same sample space** $S$ to the **real line**. They also use the same **probability measure** $p(s)$ which assigns probability values to each outcome $s \in S$. + +We can thus imagine that the random vector is a mapping from $S$ to $\mathbb{R}^d$, i.e., + +$$ +X: S \rightarrow \mathbb{R}^d \quad \text{where} \quad X(s) = +\begin{bmatrix} +X_1(s) \\ +X_2(s) \\ +\vdots \\ +X_d(s) +\end{bmatrix} \in \mathbb{R}^d, \quad \forall s \in S +$$ + +i.e., same point in $\mathbb{R}^d$ space. + +--- + +Now just like a random variable must have its own distribution, so must a **random vector**. Thus: + +$$ +x = (x_1, \dots, x_d) +$$ + +which are different realizations of $X = (X_1, \dots, X_d)$ are assigned different probability values by the function: + +$$ +p_X : \mathbb{R}^d \rightarrow [0,1] +$$ + +--- + +### ➀ Just like a r.v., a random vector may be discrete or continuous: + +1. $X: S \rightarrow \mathbb{R}^d$ is **discrete** if there exists a finite countable set $R_X$ such that: + +$$ +p(X \in R_X) = 1 \quad \forall x \in \mathbb{R}^d +$$ + +This is effectively the range of discrete random vector $X$. + +2. $X: S \rightarrow \mathbb{R}^d$ is **continuous** if: + +$$ +p(X = x) = 0 \quad \forall x \in \mathbb{R}^d +$$ + +i.e., it has zero probability at any realized point $x \in \mathbb{R}^d$. + +--- + +## βž” Joint PMF, PDF, and CDF of Random Vectors + +Armed with the notion of a discrete & continuous random vector, we can define the **joint pmf**, **joint pdf**, and **joint cdf**. These are definitions seen before in other sections. + +--- + +### (i) If random vector $X: S \rightarrow \mathbb{R}^d$ is discrete or continuous: + +The **joint CDF** of $X = (X_1, X_2, \dots, X_d)$ is: + +$$ +F_X(x) = F_{X_1, X_2, \dots, X_d}(x_1, x_2, \dots, x_d) = \mathbb{P}(X_1 \leq x_1, \dots, X_d \leq x_d) +$$ + +- Joint CDF is monotonically non-decreasing (i.e., for random vectors $a \leq b \Leftrightarrow a_i \leq b_i \ \forall i$, then $F_X(a) \leq F_X(b)$). 
+- Joint CDF is non-negative and limit towards $+\infty$ is 1 (while $-\infty$ is 0): + +$$ +\lim_{x_1 \to \infty} \cdots \lim_{x_d \to \infty} F_X(x_1, \dots, x_d) = 1 +$$ + +$$ +\lim_{x_1 \to -\infty} \cdots \lim_{x_d \to -\infty} F_X(x_1, \dots, x_d) = 0 +$$ + +--- + +### (ii) If random vector $X: S \rightarrow \mathbb{R}^d$ is **discrete**: + +The **joint pmf** of $X = (X_1, \dots, X_d)$ is: + +$$ +p_X(x_1, \dots, x_d) = \mathbb{P}(X_1 = x_1, \dots, X_d = x_d) +$$ + +- Joint pmf is non-negative: + +$$ +\sum_{x \in R_X} p_X(x) = 1 +$$ + +- As with univariate case, the joint pmf is essentially a lookup table for probabilities: + +$$ +\text{For } C \subseteq \mathbb{R}^d,\quad \mathbb{P}(X \in C) = \sum_{x \in C \cap R_X} p_X(x) +$$ + +--- + +### (iii) If random vector $X: S \rightarrow \mathbb{R}^d$ is **continuous**: + +The **joint pdf** of $X = (X_1, \dots, X_d)$ is: + +$$ +f_X(x_1, \dots, x_d) = \frac{\partial^d}{\partial x_1 \dots \partial x_d} F_X(x_1, \dots, x_d) +$$ + +- $f_X(x) = 0$ if the derivative does not exist. +- Joint pdf is non-negative and integrates to 1: + +$$ +\int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} f_X(x) \, dx_1 \, dx_2 \dots dx_d = 1 +$$ + + +- As with the univariate case, for a continuous random vector, we get the probability of values across a subset $C \subseteq \mathbb{R}^d$ by integrating over that set: + +$$ +\mathbb{P}(X \in C) = \int_C f_X(x_1, \dots, x_d) \, dx_1 \, dx_2 \dots dx_d +$$ + +- e.g., if $C = \{ x \in \mathbb{R}^2 \mid 1 < x_1 < 2,\; 1 < x_2 < 2 \}$, then: + +$$ +\mathbb{P}(X \in C) = \int_1^2 \int_1^2 f_{X_1, X_2}(x_1, x_2) \, dx_1 \, dx_2 +$$ + +--- + +- As with univariate integration from $-\infty$ to $\infty$, this gives: + +$$ +\mathbb{E}(X) = \mathbb{E}_{X_1, \dots, X_d}(x) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} x \cdot f_X(x_1, \dots, x_d) \, dx_1 \dots dx_d +$$ + + +--- + + +## ✯ Summary of Results for Distributions +### βž” Marginal Distributions of a Random Vector + +Earlier we saw the marginal pmf of a set of discrete r.v.s, which we obtained by *summing out* or marginalizing some subset of the r.v.s $X_1, \dots, X_d$. + +In the notation of random vectors, let $X = (X_1, \dots, X_d)^T$ be a random vector. We can use the (multivariate) probability distribution of $X$ (pmf/pdf, and CDF) to obtain univariate distributions of $X_i$, or more generally, a distribution on some subset of $X_1, \dots, X_d$. Such distributions are called **marginal distributions**. + +--- + +### ➀ To obtain univariate marginal distributions: + +#### (1) If $X$ is a discrete or continuous random vector: + +The **marginal CDF** of $X_i$ is: + +$$ +F_{X_i}(x) = \lim_{x_1 \to \infty} \cdots \lim_{x_{i-1} \to \infty} \lim_{x_{i+1} \to \infty} \cdots \lim_{x_d \to \infty} F_X(x) +$$ + +where $F_X(x) = F_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the joint CDF of $X$ at $x = (x_1, \dots, x_d)^T$. 
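---

As a quick numerical illustration of the limit above (the 2Γ—3 joint pmf table is made up purely for illustration), the sketch below tabulates the joint CDF of a discrete pair $(X_1, X_2)$ and checks that letting $x_2$ run past the support of $X_2$ recovers the marginal CDF of $X_1$; the direct computation via the marginal pmf anticipates case (2) below.

```python
import numpy as np

# Assumed joint pmf of (X1, X2): X1 takes values 0,1 (rows), X2 takes values 0,1,2 (columns).
p = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])      # entries sum to 1

# Joint CDF on the support grid: F(x1, x2) = sum of the pmf over {X1 <= x1, X2 <= x2}.
F = p.cumsum(axis=0).cumsum(axis=1)

# "x2 -> infinity" just means evaluating F beyond the support of X2, i.e. the last column.
marginal_cdf_X1_via_limit = F[:, -1]

# Same thing computed directly: sum out X2 to get the marginal pmf of X1, then cumulate.
marginal_cdf_X1_direct = p.sum(axis=1).cumsum()

print(marginal_cdf_X1_via_limit)   # [0.4, 1.0]
print(marginal_cdf_X1_direct)      # identical
```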
+ +--- + +#### (2) If $X$ is a **discrete random vector**: + +The **marginal pmf** of $X_i$ is: + +$$ +p_{X_i}(x_i) = \sum_{x_1 \in R_1} \cdots \sum_{x_{i-1} \in R_{i-1}} \sum_{x_{i+1} \in R_{i+1}} \cdots \sum_{x_d \in R_d} p_X(x) +$$ + +where: +- $p_X(x) = p_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the joint pmf of $X$ +- $R_i := \text{Range}(X_i)$ + +--- + +#### (3) If $X$ is a **continuous random vector**: + +The **marginal pdf** of $X_i$ is: + +$$ +f_{X_i}(x_i) = \int \cdots \int f_X(x) \, dx_1 \cdots dx_{i-1} \, dx_{i+1} \cdots dx_d +$$ + +where: +- $f_X(x) = f_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the **joint pdf** of $X$ at $x = (x_1, \dots, x_d)^T$ + + +--- + +## ➀ To obtain higher-order marginals (joint marginals): + +Suppose from our random vector $X = (X_1, \dots, X_d)$, we select a subset of $k$ r.v.s which we want to retain, and marginalize out the rest, leaving a distribution over the remaining $(d - k)$ r.v.s. + +WLOG, let these be the last $k$ r.v.s: $X_{d-k+1}, \dots, X_d$ + +Let us split our vector $X$ into two partitions: + +$$ +X = \begin{bmatrix} +X_A \\ +X_B +\end{bmatrix} +$$ + +We want to retain r.v.s in the vector $X_A = (X_1, X_2, \dots, X_k)^T$ and marginalize out those in $X_B = (X_{k+1}, \dots, X_d)^T$. + +We do this in a similar way as we did with the univariate marginals, only with fewer limits/sums/integrals over the joint CDF/PMF/PDF. + +We will not reproduce the equations here since they are very similar to the ones on the previous page. Instead, we give one example: + +Let $d = 6$ and we want to marginalize to $k = 3$ r.v.s: + +Then: +$X = (X_1, X_2, X_3, X_4, X_5, X_6)^T$ +$X_A = (X_1, X_2, X_3)^T$, +$X_B = (X_4, X_5, X_6)^T$ + +--- + +### (a) If $X$ is discrete or continuous: + +The **marginal CDF** of $X_A$ is: + +$$ +F_{X_A}(x_A) = F_{X_1, X_2, X_3}(x_1, x_2, x_3) = \lim_{x_4 \to \infty} \lim_{x_5 \to \infty} \lim_{x_6 \to \infty} F_X(x) +$$ + +--- + +### (b) If $X$ is a **discrete** random vector: + +The **marginal PMF** of $X_A$ is: + +$$ +p_{X_A}(x_A) = p_{X_1, X_2, X_3}(x_1, x_2, x_3) = \sum_{x_4 \in R_4} \sum_{x_5 \in R_5} \sum_{x_6 \in R_6} p_X(x_1, x_2, x_3, x_4, x_5, x_6) +$$ + +--- + +### (c) If $X$ is a **continuous** random vector: + +The **marginal PDF** of $X_A$ is: + +$$ +f_{X_A}(x_A) = f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \int \int \int f_X(x_1, x_2, x_3, x_4, x_5, x_6) \, dx_4 \, dx_5 \, dx_6 +$$ + + + + + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd new file mode 100644 index 0000000..d40052f --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd @@ -0,0 +1,142 @@ +## ➀ Conditional Distributions of Random Vectors + +Similar to the marginal distribution, we can obtain the **conditional CDF/PMF/PDF** from the joint distribution of a random vector $X = (X_1, \dots, X_d)^T$. + +(We will not mimic the case of univariate conditioning since it is not frequently used.) + +Instead, suppose we partition $X \Rightarrow (X_A \mid X_B)^T$ where: + +- $X_A : S \rightarrow \mathbb{R}^k$ +- $X_B : S \rightarrow \mathbb{R}^{d-k}$ + +Then, we can **condition $X_A$ on a realization of** the random vector $X_B$, which we will denote $x_B \in \mathbb{R}^{d-k}$. 
+ +Consequently, to obtain the **conditional distribution** of $X_A$ conditioned on $X_B = x_B$ (at a realized value $x_B$): + +--- + +### (a) $X_A$ and $X_B$ are **discrete** random vectors (Conditional PMF): + +The **conditional PMF of $X_A$** is: + +$$ +p_{X_A \mid X_B}(x_A \mid x_B) = \frac{p_X(x_A, x_B)}{p_{X_B}(x_B)} +$$ + +or in expanded form: + +$$ += \frac{p_{X_1, \dots, X_d}(x_1, \dots, x_k, x_{k+1}, \dots, x_d)}{p_{X_{k+1}, \dots, X_d}(x_{k+1}, \dots, x_d)} +$$ + +where: +- $p_X(x)$ is the joint pmf of $X$ at $x = (x_1, \dots, x_d)^T$ +- $p_{X_B}(x_B)$ is the marginal pmf of $X_B$ (i.e., we have summed out $X_A$ in the marginalization) + +--- + +### (b) $X_A$ and $X_B$ are **continuous** random vectors (Conditional PDF): + +Here we have an issue: the **PDF of $X_B$**, i.e., $f_{X_B}$, is zero at any realized value $x_B$. That is: + +$$ +\mathbb{P}(X_B = x_B) = 0 \quad \Rightarrow \quad f_{X_B}(x_B) = 0 +$$ + +And thus, our numerator and denominator are both zero if we try to emulate the steps we did to calculate the conditional pmf. + +### ➀ Conditional PDF of Random Vectors (continued) + +Instead, we perform a trick similar to what we do to calculate the conditional PDF in the **univariate** case. Instead of conditioning on the event $X_B = x_B$ (i.e., the conjunction $X_{d-k+1} = x_{d-k+1}, \dots, X_d = x_d$), we condition on the event that each r.v. $X_{B_i}$ lies in a small interval $\epsilon_i > 0$ from the corresponding realized value $x_{B_i}$. That is: + +$$ +x_{d-k+1} - \epsilon < X_{d-k+1} < x_{d-k+1} + \epsilon, \dots, X_d \in (x_d - \epsilon, x_d + \epsilon) +$$ + +Such events will generally have non-zero probability and hence can be conditioned on. We can then take the limit of each $\epsilon_i \to 0$ and use L'HΓ΄pital's rule. + +--- + +### ⭐ Conditional PDF of $X_A$ conditioned on $X_B = x_B$: + +$$ +f_{X_A \mid X_B = x_B}(x_A) = \frac{f_X(x)}{f_{X_B}(x_B)} +$$ + +That is, + +$$ +f_{X_1, \dots, X_k \mid X_{k+1}, \dots, X_d}(x_1, \dots, x_k \mid x_{k+1}, \dots, x_d) += \frac{f_{X_1, \dots, X_d}(x_1, \dots, x_k, x_{k+1}, \dots, x_d)}{f_{X_{k+1}, \dots, X_d}(x_{k+1}, \dots, x_d)} +$$ + +- where $f_X(x)$ is the joint PDF of $X$ at $x = (x_1, \dots, x_d)^T$ +- and $f_{X_B}(x_B)$ is the marginal PDF of $X_B$ at $x_B = (x_{k+1}, \dots, x_d)^T$ + +> Note that while the probabilities $\mathbb{P}(X = x)$ and $\mathbb{P}(X_B = x_B)$ are both zero, the values of $f_X(x)$ and $f_{X_B}(x_B)$ are both greater than zero (as they are values of a joint pdf). + +--- + +### πŸ“Œ Note: We also define the conditional CDF as: + +$$ +F_{X_A \mid X_B = x_B}(x_A) = \mathbb{P}(X_A \leq x_A \mid X_B = x_B) +$$ + +and + +$$ +f_{X_A \mid X_B = x_B}(x_A) = \frac{\partial^k}{\partial x_1 \cdots \partial x_k} F_{X_A \mid X_B = x_B}(x_A \mid x_B) +$$ + + + +> This quantity is different depending on the nature of $X$. 
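---

As a small worked example of the ratio formula above (the density is an assumed toy example, not one from the text): take $d = 2$, $X_A = X_1$, $X_B = X_2$, with joint pdf $f_{X_1, X_2}(x_1, x_2) = x_1 + x_2$ on $[0,1]^2$. The two cases below then spell out the corresponding conditional CDFs in general.

$$
f_{X_2}(x_2) = \int_0^1 (x_1 + x_2)\, dx_1 = \tfrac{1}{2} + x_2,
\qquad
f_{X_1 \mid X_2 = x_2}(x_1) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)} = \frac{x_1 + x_2}{\tfrac{1}{2} + x_2}, \quad x_1 \in [0, 1]
$$

For each fixed $x_2 \in [0, 1]$, this conditional pdf is non-negative and integrates to 1 over $x_1 \in [0, 1]$, as it should.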
+ +--- + +### (i) $X_A$ and $X_B$ are **discrete** random vectors: + +$$ +F_{X_A \mid X_B = x_B}(x_A) = \mathbb{P}(X_A \leq x_A \mid X_B = x_B) = \frac{\mathbb{P}(X_A \leq x_A,\; X_B = x_B)}{\mathbb{P}(X_B = x_B)} +$$ + +- (Probability calculated from **joint pmf**) + +That is: + +$$ += \frac{p(X_1 \leq x_1,\; \dots,\; X_k \leq x_k,\; X_{k+1} = x_{k+1},\; \dots,\; X_d = x_d)}{p(X_{k+1} = x_{k+1},\; \dots,\; X_d = x_d)} +$$ + +- (Probability calculated from **marginal pmf** of $X_B$ by marginalizing out $X_A$) + +--- + +### (ii) $X_A$, $X_B$ are **continuous** random vectors: + +$$ +F_{X_A \mid X_B = x_B}(x_A) = \frac{\int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f_X(x_A, x_B) \, dx_1 \dots dx_k}{f_{X_B}(x_B)} +$$ + +- (Probability calculated from **joint pdf**) + +- Denominator is the value of **marginal pdf** of $X_B$, not a probability. + +--- + +### πŸ“ Note: + +Here, $X_B = x_B$ is an event. +We can also condition on a **different event involving $X_B$**. + +$X_A \leq x_A$ is also an event β€” meaning: + +$$ +\{X_1 \leq x_1,\; X_2 \leq x_2,\; \dots,\; X_{d-k} \leq x_{d-k} \} +$$ + +--- + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd new file mode 100644 index 0000000..69a3d69 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd @@ -0,0 +1,217 @@ +## ➀ Independence of Random Vectors + +We stated the independence of multiple r.v.s on earlier pages. +Let us now do the same using a random vector, which has the **notion of independence**: + +--- + +### 1. Independence of Random Sub-vectors + +Let $X = (X_1, \dots, X_d)^T$ be a random vector. Partition it as: + +$$ +X \rightarrow \begin{bmatrix} X_A \\ X_B \end{bmatrix}, \quad \text{where } X_A = (X_1, \dots, X_{d-k})^T,\; X_B = (X_{d-k+1}, \dots, X_d)^T +$$ + +Then, **random vectors $X_A$ and $X_B$ are independent** iff their **joint distribution equals the product of their marginal distributions**, i.e.: + +$$ +p(X = x) = p_{X_A}(X_A = x_A) \cdot p_{X_B}(X_B = x_B) +$$ + +for all $x \in \mathbb{R}^d$, where $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. + +We denote this as: + +$$ +X_A \perp\!\!\!\perp X_B +$$ + +--- + +### (a) If $X$ is a **discrete or continuous** random vector: + +Random vectors $X_A$ and $X_B$ are independent if their **joint CDF** equals the product of the marginal CDFs of $X_A$ and $X_B$, i.e., + +$$ +F_X(x) = F_{X_1, \dots, X_{d-k}}(x_1, \dots, x_{d-k}) \cdot F_{X_{d-k+1}, \dots, X_d}(x_{d-k+1}, \dots, x_d) +$$ + +for all vectors $x \in \mathbb{R}^d$, where $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. + +--- + +### (b) If $X$ is a **discrete random vector**: + +Random vectors $X_A$ and $X_B$ are independent if their **joint PMF** equals the product of their marginal PMFs: + +$$ +p_X(x) = p_{X_A}(X_A = x_A) \cdot p_{X_B}(X_B = x_B) +$$ + +i.e., + +$$ +p_{X}(X = x) = p_{X_A}(X_A = x_A) \cdot p_{X_B}(X_B = x_B) +$$ + +for all $x \in \mathbb{R}^d$, with $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. 
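---

The product condition just stated is easy to check numerically for a small discrete example (the tables `p_A` and `p_B` below are made up for illustration); the equivalent conditional formulation follows after this sketch.

```python
import numpy as np

# Assumed joint pmf of X = (X_A, X_B), X_A in {0,1} (rows), X_B in {0,1,2} (columns),
# built deliberately as an outer product, so X_A and X_B are independent by construction.
p_A = np.array([0.4, 0.6])
p_B = np.array([0.2, 0.5, 0.3])
joint = np.outer(p_A, p_B)                  # p_X(x_A, x_B) = p_A(x_A) * p_B(x_B)

# Recover the marginals from the joint and test the factorization.
marg_A = joint.sum(axis=1)
marg_B = joint.sum(axis=0)
print(np.allclose(joint, np.outer(marg_A, marg_B)))   # True  -> X_A independent of X_B

# A dependent counterexample: perturb one cell, renormalize, and the test fails.
dep = joint.copy()
dep[0, 0] += 0.05
dep /= dep.sum()
m_A, m_B = dep.sum(axis=1), dep.sum(axis=0)
print(np.allclose(dep, np.outer(m_A, m_B)))            # False -> dependent
```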
+ +--- + +### πŸ” Equivalent Condition: + +$X_A \perp\!\!\!\perp X_B$ if the **conditional PMF** of $X_A$ given $X_B$ is equal to the marginal of $X_A$, and vice versa: + +$$ +p_{X_A \mid X_B = x_B}(X_A = x_A) = p_{X_A}(X_A = x_A) \quad \forall x \in \mathbb{R}^d +$$ + +and + +$$ +p_{X_B \mid X_A = x_A}(X_B = x_B) = p_{X_B}(X_B = x_B) +$$ + +### (c) If $X$ is a **continuous random vector**: + +Random vectors $X_A$ and $X_B$ are independent if their **joint pdf** equals the product of their marginal pdfs: + +$$ +f_X(x) = f_{X_A}(x_A) \cdot f_{X_B}(x_B) +$$ + +i.e., + +$$ +f_X(x_1, \dots, x_d) = f_{X_1, \dots, X_{d-k}}(x_1, \dots, x_{d-k}) \cdot f_{X_{d-k+1}, \dots, X_d}(x_{d-k+1}, \dots, x_d) +$$ + +for all $x \in \mathbb{R}^d$, where $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. + +--- + +### πŸ” Equivalently: + +$X_A \perp\!\!\!\perp X_B$ if the **conditional pdf** of $X_A$ (conditioned on $X_B$) is equal to the **marginal pdf** of $X_A$, and vice versa: + +$$ +f_{X_A \mid X_B = x_B}(X_A = x_A) = f_{X_A}(X_A = x_A) \quad \forall x \in \mathbb{R}^d +$$ + +and + +$$ +f_{X_B \mid X_A = x_A}(X_B = x_B) = f_{X_B}(X_B = x_B) +$$ + +--- + +### 🧠 Conceptual Clarification: + +The independence of random vectors, as we saw here, is a bit different from independence of random variables. + +Here, it means that **every subset of r.v.s from $X_A$ is independent of every subset of r.v.s from $X_B$**. + +However, the r.v.s *within* $X_A$ might **not be independent of each other** (and same for $X_B$). +If they are, they are said to be **mutually independent**. + + +## ✦ Mutual Independence (of r.v.s) in a Random Vector + +If $X = (X_1, \dots, X_d)^T$ is a random vector, then each of the variables $X_1, \dots, X_d$ is **mutually independent** if each $X_i$ is independent of all subsets of $\{X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_d\}$. + +--- + +### (a) If $X$ is a **continuous or discrete** random vector: + +$X_1, \dots, X_d$ are **mutually independent** if: + +#### Joint CDF condition: +$$ +F_X(x) = F_{X_1}(x_1) \cdot F_{X_2}(x_2) \cdots F_{X_d}(x_d) +$$ + +#### Joint PMF condition (discrete case): +$$ +p_X(x) = p_{X_1}(x_1) \cdot p_{X_2}(x_2) \cdots p_{X_d}(x_d) += \prod_{i=1}^d p(X_i = x_i) +\quad \forall x \in \mathbb{R}^d +$$ + +> (product of marginal CDFs / PMFs) + +--- + +### (b) If $X$ is a **discrete** random vector: + +$X_1, \dots, X_d$ are **mutually independent** if: + +#### Joint PMF condition: +$$ +p_X(x) = p_{X_1}(x_1) \cdot \cdots \cdot p_{X_d}(x_d) += \prod_{i=1}^d p(X_i = x_i) +\quad \forall x \in \mathbb{R}^d +$$ + +--- + +### (c) If $X$ is a **continuous** random vector: + +$X_1, \dots, X_d$ are **mutually independent** if: + +#### Joint PDF condition: +$$ +f_X(x) = f_{X_1}(x_1) \cdot \cdots \cdot f_{X_d}(x_d) += \prod_{i=1}^d f(X_i = x_i) +\quad \forall x \in \mathbb{R}^d +$$ + +> (product of marginal PDFs) + +--- + +## ➀ Conditional Independence of Random Vectors + +### 1. Conditional Mutual Independence (of r.v.s) in a Random Vector + +Let $X$ be a random vector, partitioned as: + +$$ +X \rightarrow \begin{bmatrix} X_A \\ X_B \end{bmatrix}, \quad \text{where } X_A = (X_1, \dots, X_{d-k})^T,\; X_B = (X_{d-k+1}, \dots, X_d)^T +$$ + +Then, the r.v.s of subvector $X_A$, i.e., $X_1, \dots, X_{d-k}$ are **conditionally mutually independent** if **each $X_i$ is conditionally independent of all subsets of $\{X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_{d-k}\}$ conditioned on $X_B = x_B$**. 
+ +--- + +### (a) If $X_A$ and $X_B$ are **discrete or continuous** random vectors: + +$X_1, \dots, X_{d-k}$ are **conditionally independent (conditioned on $X_B = x_B$)** iff: + +$$ +F_{X_1, \dots, X_{d-k} \mid X_B = x_B}(x_A) = \frac{\mathbb{P}(X_A \leq x_A,\; X_B = x_B)}{f_{X_B}(X_B = x_B)} +$$ + +- For **discrete** $X_A$, $X_B$: numerator is a joint pmf, denominator a marginal pmf +- For **continuous** $X_A$, $X_B$: numerator is a joint CDF, denominator a PDF + +--- + +If this is true, then: + +$$ +\mathbb{P}(X_1 \leq x_1, \dots, X_{d-k} \leq x_{d-k} \mid X_{d-k+1} = x_{d-k+1}, \dots, X_d = x_d) += \prod_{i=1}^{d-k} \mathbb{P}(X_i \leq x_i \mid X_{d-k+1} = x_{d-k+1}, \dots, X_d = x_d) +$$ + +--- + +βœ… Then we have **conditional mutual independence**. + + +--- + + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd new file mode 100644 index 0000000..232a0c8 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd @@ -0,0 +1,93 @@ +### Bayes' Theorem for Random Vectors + +The conditional cdf, pmf, or pdf for random vectors +leads us to a highly generalized form of Bayes' Theorem. +Recall that for events $A, B$, Bayes' Rule states that: + +$$ +P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \quad \text{and} \quad P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A) +$$ + +Bayes' Rule only allows us to condition on a single r.v. +at a time. Let us consider the random vector $X = (X_1, \ldots, X_d)^\top$. +Thus, by Bayes' Rule: + +--- + +**(a) If $X$ is discrete:** + +$$ +p_X(x) = p_{X_1, \ldots, X_d}(X_1 = x_1, \ldots, X_d = x_d) \quad \text{(joint)} +$$ + +$$ += P_{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_d}(X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +$$ +\cdot P(X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +**e.g.** + +$$ +p_{X_1, X_2, X_3}(X_1 = x_1, X_2 = x_2, X_3 = x_3) += p(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdot p(X_1 = x_1, X_2 = x_2) +$$ + +--- + +**(b) If $X$ is continuous:** + +$$ +f_X(x) = f_{X_1, \ldots, X_d}(X_1 = x_1, \ldots, X_d = x_d) \quad \text{(joint)} +$$ + +$$ += f_{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_d}(X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +$$ +\cdot f_{X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_d}(X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +**e.g.** + +$$ +f_{X_1, X_2, X_3}(X_1 = x_1, X_2 = x_2, X_3 = x_3) += f(X_1 = x_1, X_2 = x_2) \cdot f(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) +$$ + + +--- + +A corollary to this is that we can further +decompose the marginal terms $p(X_1 = x_1, X_2 = x_2, \ldots, X_d = x_d)$ and +$f(X_1 = x_1, X_2 = x_2, \ldots, X_d = x_d)$, however we wish. + +Thus, + +### (a) If $X$ is discrete: + +$$ +p(X_1 = x_1, \ldots, X_d = x_d) = p(X_1 = x_1) \cdot p(X_2 = x_2 \mid X_1 = x_1) \cdot p(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1) \cdots p(X_d = x_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}) +$$ + +--- + +### (b) If $X$ is continuous: + +$$ +f(X_1 = x_1, \ldots, X_d = x_d) = f(X_1 = x_1) \cdot f(X_2 = x_2 \mid X_1 = x_1) \cdot f(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdots f(X_d = x_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}) +$$ + +--- + +Note that the above is just one possible order of decomposition. +There are $d!$ possible permutations. 
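A small numerical sketch of this factorization may help (the joint table below is randomly generated, purely for illustration): for $d = 3$ binary r.v.s, multiplying $p(x_1)$, $p(x_2 \mid x_1)$, and $p(x_3 \mid x_1, x_2)$ reconstructs the joint pmf exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed joint pmf of (X1, X2, X3), each binary: a random 2x2x2 table normalized to sum to 1.
p = rng.random((2, 2, 2))
p /= p.sum()

p1 = p.sum(axis=(1, 2))                         # p(x1)
p2_given_1 = p.sum(axis=2) / p1[:, None]        # p(x2 | x1) = p(x1, x2) / p(x1)
p3_given_12 = p / p.sum(axis=2, keepdims=True)  # p(x3 | x1, x2) = p(x1, x2, x3) / p(x1, x2)

# Chain rule: p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)
reconstructed = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_12
print(np.allclose(reconstructed, p))            # True
```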
+ +We calculate the conditionals and marginals as +described before (using summation/integration). + +--- + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd new file mode 100644 index 0000000..4e1d621 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd @@ -0,0 +1,99 @@ +### Law of Total Probability for Univariate & Multivariate r.v.s + +Recall that the univariate LOTP states that, given +disjoint partitions of the sample space $S_1, S_2, \ldots, S_n$ +such that $S = \bigcup_i S_i$ and $S_i \cap S_j = \emptyset$ if $i \neq j$, the +LOTP states that for any event $A \subseteq S$: + +$$ +P(A) = \sum_{i=1}^{n} P(S_i) \cdot P(A \mid S_i) = \sum_{i=1}^{n} P(A \cap S_i) +$$ + +where we must have that $P(S_i) > 0$. + +--- + +### (a) Univariate + +The above extends readily to the case where we +condition on a **discrete** univariate r.v. $X$: + +$$ +P(A) = \sum_{x \in \mathcal{R}_X} P(X = x) \cdot P(A \mid X = x) = \sum_{x \in \mathcal{R}_X} P(A \cap \{X = x\}) +$$ + +where $\mathcal{R}_X = \text{Range}(X)$. + +If we let $A$ be the event that $Y \in C$, where $C \subseteq \mathbb{R}$ +and $Y$ is a r.v., then: + +$$ +P(Y \in C) = \sum_{x \in \mathcal{R}_X} P(Y \in C \mid X = x) \cdot P(X = x) +$$ + +--- + +### (b) In the case of conditioning on a **continuous** univariate r.v., + +The probability $P(A \mid X = x)$ is ill-defined, +since the probability of a continuous r.v. +at a point-value like $x \in \mathbb{R}$ is zero. + +In this case, however, we can adapt to use the **pdf**: + +$$ +P(A) = \int_{-\infty}^{\infty} P(A \mid X = x) \cdot f_X(x) \, dx +$$ + +If we let $A$ be the event that... + + +### (ii) Multivariate + +Suppose we have a random vector $X = (X_1, \ldots, X_d)^\top$, +which we partition as $X \rightarrow \begin{pmatrix} X_A \\ X_B \end{pmatrix}$, +where: +- $X_A = (X_1, \ldots, X_{d-k})^\top$ (of dimension $d-k$) +- $X_B = (X_{d-k+1}, \ldots, X_d)^\top$ (of dimension $k$) + +--- + +#### (a) If $X$ is a **discrete** random vector + +Then, by the Law of Total Probability, the **marginal pmf** of $X_A$ is: + +$$ +p(X_A = x_A) = \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_X(x) +$$ + +(using the full joint) + +$$ += \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_{X_B}(X_B = x_B) \cdot p(X_A = x_A \mid X_B = x_B) +$$ + +where $\mathcal{R}_i = \text{Range}(X_i)$. + +We can then use the formulas for marginal & conditional joint pmfs. + +--- + +#### (b) If $X$ is a **continuous** random vector + +Then by the Law of Total Probability, the **marginal pdf** of $X_A$ is: + +$$ +f(X_A = x_A) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x) \, dx_{d-k+1} \cdots dx_d +$$ + +(using full joint) + +$$ += \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_B}(X_B = x_B) \cdot f(X_A = x_A \mid X_B = x_B) \, dx_{d-k+1} \cdots dx_d +$$ + +We can then use the formulas for marginal & conditional joint pdfs. 
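---

A minimal numerical sketch of the discrete case (a) above (the 3Γ—4 joint table is made up for illustration): the marginal pmf of $X_A$ obtained by directly summing out $X_B$ agrees with $\sum_{x_B} p(X_B = x_B)\, p(X_A = x_A \mid X_B = x_B)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed joint pmf p(x_A, x_B): X_A takes 3 values (rows), X_B takes 4 values (columns).
joint = rng.random((3, 4))
joint /= joint.sum()

# Direct marginalization: sum the joint pmf over the range of X_B.
marg_A_direct = joint.sum(axis=1)

# Law of total probability: sum over x_B of p(x_B) * p(x_A | x_B).
p_B = joint.sum(axis=0)                  # marginal pmf of X_B
cond_A_given_B = joint / p_B[None, :]    # p(x_A | x_B); each column sums to 1
marg_A_lotp = (cond_A_given_B * p_B[None, :]).sum(axis=1)

print(np.allclose(marg_A_direct, marg_A_lotp))   # True
```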
+ +--- + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd new file mode 100644 index 0000000..2fc4103 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd @@ -0,0 +1,55 @@ + +### Conditional Law of Total Probability (LOTP) for Random Vectors + +We can extend the **Multivariate LOTP** defined on the previous page +to a **conditional Multivariate LOTP**. + +If we have a random vector $X = (X_1, \ldots, X_d)$, partitioned as: + +$$ +X = \begin{pmatrix} X_A \\ X_B \end{pmatrix} \quad \text{($d-k$ r.v.s and $k$ r.v.s)} +$$ + +then, if we condition the joint distribution of $X$ +on a random vector $Z = (Z_1, \ldots, Z_m)$ with realization $Z = z \in \mathbb{R}^m$, we get: + +--- + +### (a) If $X$ and $Z$ are **discrete** random vectors: + +The conditional Multivariate LOTP states that the +**conditional marginal pmf** of $X_A$ (conditioned on $Z = z$) is: + +$$ +p(X_A = x_A \mid Z = z) = \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_X(x \mid Z = z) +$$ + +(using $k$ summations) + +$$ += \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_{X_B}(X_B = x_B \mid Z = z) \cdot p(X_A = x_A \mid X_B = x_B, Z = z) +$$ + +We can then use **Bayes' Rule** & the formulas for conditional +joint pmfs to calculate this. + +--- + +### (b) If $X$ and $Z$ are **continuous** random vectors: + +The conditional Multivariate LOTP states that the +**conditional marginal pdf** of $X_A$ (conditioned on $Z = z$) is: + +$$ +f(X_A = x_A \mid Z = z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x \mid Z = z) \, dx_{d-k+1} \cdots dx_d +$$ + +(using $k$ integrations) + +$$ += \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_B}(X_B = x_B \mid Z = z) \cdot f(X_A = x_A \mid X_B = x_B, Z = z) \, dx_{d-k+1} \cdots dx_d +$$ + +We can then use **Bayes' Rule** & formulas for conditional +joint pdfs to calculate this. +
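---

To close, a minimal discrete sketch of the conditional LOTP above (the 3Γ—4Γ—2 joint table and the choice of $z$ are made up for illustration): conditioning every term on $Z = z$ and summing over the range of $X_B$ recovers the conditional marginal $p(X_A = x_A \mid Z = z)$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed joint pmf p(x_A, x_B, z): X_A takes 3 values, X_B takes 4, Z takes 2.
joint = rng.random((3, 4, 2))
joint /= joint.sum()

z = 0                                         # condition on the realization Z = z
p_z = joint[:, :, z].sum()                    # p(Z = z)
cond_AB_given_z = joint[:, :, z] / p_z        # p(x_A, x_B | Z = z)

# Left-hand side: conditional marginal of X_A given Z = z, summing out X_B.
lhs = cond_AB_given_z.sum(axis=1)

# Right-hand side: sum over x_B of p(x_B | z) * p(x_A | x_B, z).
p_B_given_z = cond_AB_given_z.sum(axis=0)                 # p(x_B | Z = z)
p_A_given_Bz = cond_AB_given_z / p_B_given_z[None, :]     # p(x_A | x_B, Z = z)
rhs = (p_B_given_z[None, :] * p_A_given_Bz).sum(axis=1)

print(np.allclose(lhs, rhs))                  # True
```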