From ef29b125fff71858c02720fae6d6aaedc49d1d72 Mon Sep 17 00:00:00 2001 From: CraftyEngineer Date: Tue, 24 Jun 2025 11:33:09 +0530 Subject: [PATCH] Basic Multivariate Probability Commit --- .../01 - Random Vectors.qmd | 129 +++++++++ .../02 - Joint Distribution.qmd | 264 ++++++++++++++++++ .../03 - Conditional Distribution.qmd | 142 ++++++++++ .../04 - Independence of Random Vectors.qmd | 217 ++++++++++++++ .../05 - Bayes Theorem.qmd | 93 ++++++ .../06 - Law of total probability.qmd | 99 +++++++ ...- Conditional Law of Total Probability.qmd | 55 ++++ 7 files changed, 999 insertions(+) create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd create mode 100644 book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd new file mode 100644 index 0000000..bd301fc --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/01 - Random Vectors.qmd @@ -0,0 +1,129 @@ +## πŸ“˜ Random Vectors (Multivariate Random Variables) + +So far we have been considering multiple r.v.s $X_1, X_2, \dots, X_d$ all as separate r.v.s. However, it is sometimes more convenient to consider them as one algebraic unit. + +A **random vector** (or a **multivariate random variable**) is one such possible unit. It is a $d$-dimensional vector of random variables $X_1, X_2, \dots, X_d$, where each r.v. lives on the same sample space $S$ and probability function $p(s)$ (which acts on outcomes $s \in S$). As each r.v. $X_i$ maps from $S$ to a real number, a random vector is a mapping from the sample space $S$ to a real vector in $\mathbb{R}^d$: + +$$ +X: S \rightarrow \mathbb{R}^d \quad \text{where} \quad X(s) = \begin{bmatrix} X_1(s) \\ X_2(s) \\ \vdots \\ X_d(s) \end{bmatrix} +$$ + +Each experimental run produces a realization of $X$, i.e., a vector of realized values $x = (x_1, x_2, \dots, x_d)^T \in \mathbb{R}^d$. + +> Note: that in general, being arranged as a vector does not tell us anything about the r.v.s $X_1, \dots, X_d$ (except that they share a sample space $S$ and a probability measure function $p$). In particular, r.v.s $X_i$ might be independent of each other, conditionally independent, or anything in between. + +--- + +The **joint distribution** $p_X: \mathbb{R}^d \rightarrow [0, 1]$ is a probability function which assigns a probability value to every possible realization $x \in \mathbb{R}^d$. We denote the **joint distribution** of multivariate r.v. 
$X$ as $p_X(X)$ (or $\mathbb{P}(X_1, \dots, X_d)$), and **CDF** as: + +$$ +F_X(x) = \mathbb{P}(X_1 \leq x_1, \dots, X_d \leq x_d) = \mathbb{P}(X \leq x) +$$ + +where $x = (x_1, \dots, x_d)^T$ is a realization of multivariate r.v. $X = (X_1, \dots, X_d)^T$. + +--- + +- **For discrete multivariate r.v.** $X$: $\text{pmf: } p_X(X = x) = \mathbb{P}(X_1 = x_1, \dots, X_d = x_d)$ +- **For continuous multivariate r.v.** $X$: $\text{pdf: } f_X(x) = f_{X_1, \dots, X_d}(x_1, \dots, x_d) = \frac{\partial^d}{\partial x_1 \partial x_2 \dots \partial x_d} \left[ F_X(x_1, \dots, x_d) \right]$ + + +## βž” Operations on Random Vectors + +Similar to how we can algebraically manipulate random variables, we can manipulate a random vector $X = (X_1, X_2, \dots, X_d)^T$ using the principles of linear algebra, as though they were regular vectors. + +> The following applies to both discrete & continuous random vectors. + +--- + +### β˜… Expected Value of a Random Vector + +If $X = (X_1, X_2)^T, \; X: S \rightarrow \mathbb{R}^d$ is a random vector, then: + +$$ +\mathbb{E}(X) = +\begin{bmatrix} +\mathbb{E}(X_1) \\ +\mathbb{E}(X_2) +\end{bmatrix} += \mu_X \in \mathbb{R}^d +$$ + +$\mu_X$ is the **expected value of the random vector $X$**. + +After undergoing this transformation, $\mathbb{E}(X)$ is now a vector of real numbers (scalars), rather than a random variable (i.e., it no longer has a distribution associated with it). + +If $X, Y, Z$ are $d$-dimensional random vectors, then: + +$$ +\mathbb{E}(X + Y + Z) = \mathbb{E}(X) + \mathbb{E}(Y) + \mathbb{E}(Z) +$$ + +--- + +### βž” Affine Transform of a Random Vector + +We can produce a new random vector $Z: S \rightarrow \mathbb{R}^k$ by passing a vector $X = (X_1, \dots, X_d)^T$ through an **affine transform** (meaning we multiply it by a (fixed) matrix $A \in \mathbb{R}^{k \times d}$ and add a (fixed) vector $b \in \mathbb{R}^k$): + +$$ +Z = AX + b +$$ + +\[ +\begin{aligned} +&\underbrace{Z}_{k \times 1} = +\underbrace{A}_{k \times d} +\cdot +\underbrace{X}_{d \times 1} ++ +\underbrace{b}_{k \times 1} +\end{aligned} +\] + +- $Z$: new random vector +- $A$, $b$: fixed matrix and vector + + +### βž” Expected Value of an Affine Transform + +Since $Z$ is also a random vector, we can continue to manipulate it algebraically. + +We can also use the **linearity of expectation** to establish the expected value of $Z$: + +$$ +\mathbb{E}(Z) = \mathbb{E}(AX + b) = A \cdot \mathbb{E}(X) + b +$$ + +\[ +\begin{aligned} +&\underbrace{\mathbb{E}(Z)}_{k \times 1} = +\underbrace{A}_{k \times d} +\cdot +\underbrace{\mathbb{E}(X)}_{d \times 1} ++ +\underbrace{b}_{k \times 1} +\end{aligned} +\] + +> Here $A$ and $b$ are high-dimensional constants. 
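---

Below is a minimal NumPy sketch of these two facts (illustrative only, not part of the text: the toy distribution of $X$ and the particular $A$ and $b$ are made up). It draws many realizations of a 2-dimensional random vector and checks empirically that the sample mean of $Z = AX + b$ agrees with $A\,\mathbb{E}(X) + b$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random vector X = (X1, X2)^T: X1 ~ Uniform(0, 1), X2 ~ Exponential(mean 2).
# Linearity of expectation needs no independence, so any joint distribution would do.
n = 200_000
X = np.column_stack([rng.uniform(0, 1, n), rng.exponential(2.0, n)])  # shape (n, d), d = 2

A = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 1.0]])       # fixed matrix, k x d with k = 3
b = np.array([0.5, 1.0, -2.0])   # fixed vector, length k

Z = X @ A.T + b                  # each row is a realization of Z = A X + b

print(Z.mean(axis=0))            # empirical E(Z)
print(A @ X.mean(axis=0) + b)    # A E(X) + b -- agrees up to Monte Carlo error
```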
+ +--- + +### βž” Orthogonality + +Multivariate r.v.s $X: S \rightarrow \mathbb{R}^d$ and $Z: S \rightarrow \mathbb{R}^l$ are **orthogonal** if: + +$$ +\mathbb{E}(X^T Z) = 0 +$$ + +where: + +$$ +X^T Z = X_1 Z_1 + X_2 Z_2 + \dots + X_d Z_d +$$ + + + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd new file mode 100644 index 0000000..3e7a22f --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/02 - Joint Distribution.qmd @@ -0,0 +1,264 @@ +## βž” Joint Distributions of Random Vectors + +Recall that by definition, a random vector $X = (X_1, \dots, X_d)$ is a collection of r.v.s: + +$$ +X_i : S \rightarrow \mathbb{R}, \quad X: S \rightarrow \mathbb{R}^d +$$ + +each of which maps from the **same sample space** $S$ to the **real line**. They also use the same **probability measure** $p(s)$ which assigns probability values to each outcome $s \in S$. + +We can thus imagine that the random vector is a mapping from $S$ to $\mathbb{R}^d$, i.e., + +$$ +X: S \rightarrow \mathbb{R}^d \quad \text{where} \quad X(s) = +\begin{bmatrix} +X_1(s) \\ +X_2(s) \\ +\vdots \\ +X_d(s) +\end{bmatrix} \in \mathbb{R}^d, \quad \forall s \in S +$$ + +i.e., same point in $\mathbb{R}^d$ space. + +--- + +Now just like a random variable must have its own distribution, so must a **random vector**. Thus: + +$$ +x = (x_1, \dots, x_d) +$$ + +which are different realizations of $X = (X_1, \dots, X_d)$ are assigned different probability values by the function: + +$$ +p_X : \mathbb{R}^d \rightarrow [0,1] +$$ + +--- + +### ➀ Just like a r.v., a random vector may be discrete or continuous: + +1. $X: S \rightarrow \mathbb{R}^d$ is **discrete** if there exists a finite countable set $R_X$ such that: + +$$ +p(X \in R_X) = 1 \quad \forall x \in \mathbb{R}^d +$$ + +This is effectively the range of discrete random vector $X$. + +2. $X: S \rightarrow \mathbb{R}^d$ is **continuous** if: + +$$ +p(X = x) = 0 \quad \forall x \in \mathbb{R}^d +$$ + +i.e., it has zero probability at any realized point $x \in \mathbb{R}^d$. + +--- + +## βž” Joint PMF, PDF, and CDF of Random Vectors + +Armed with the notion of a discrete & continuous random vector, we can define the **joint pmf**, **joint pdf**, and **joint cdf**. These are definitions seen before in other sections. + +--- + +### (i) If random vector $X: S \rightarrow \mathbb{R}^d$ is discrete or continuous: + +The **joint CDF** of $X = (X_1, X_2, \dots, X_d)$ is: + +$$ +F_X(x) = F_{X_1, X_2, \dots, X_d}(x_1, x_2, \dots, x_d) = \mathbb{P}(X_1 \leq x_1, \dots, X_d \leq x_d) +$$ + +- Joint CDF is monotonically non-decreasing (i.e., for random vectors $a \leq b \Leftrightarrow a_i \leq b_i \ \forall i$, then $F_X(a) \leq F_X(b)$). 
+- Joint CDF is non-negative and limit towards $+\infty$ is 1 (while $-\infty$ is 0): + +$$ +\lim_{x_1 \to \infty} \cdots \lim_{x_d \to \infty} F_X(x_1, \dots, x_d) = 1 +$$ + +$$ +\lim_{x_1 \to -\infty} \cdots \lim_{x_d \to -\infty} F_X(x_1, \dots, x_d) = 0 +$$ + +--- + +### (ii) If random vector $X: S \rightarrow \mathbb{R}^d$ is **discrete**: + +The **joint pmf** of $X = (X_1, \dots, X_d)$ is: + +$$ +p_X(x_1, \dots, x_d) = \mathbb{P}(X_1 = x_1, \dots, X_d = x_d) +$$ + +- Joint pmf is non-negative: + +$$ +\sum_{x \in R_X} p_X(x) = 1 +$$ + +- As with univariate case, the joint pmf is essentially a lookup table for probabilities: + +$$ +\text{For } C \subseteq \mathbb{R}^d,\quad \mathbb{P}(X \in C) = \sum_{x \in C \cap R_X} p_X(x) +$$ + +--- + +### (iii) If random vector $X: S \rightarrow \mathbb{R}^d$ is **continuous**: + +The **joint pdf** of $X = (X_1, \dots, X_d)$ is: + +$$ +f_X(x_1, \dots, x_d) = \frac{\partial^d}{\partial x_1 \dots \partial x_d} F_X(x_1, \dots, x_d) +$$ + +- $f_X(x) = 0$ if the derivative does not exist. +- Joint pdf is non-negative and integrates to 1: + +$$ +\int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} f_X(x) \, dx_1 \, dx_2 \dots dx_d = 1 +$$ + + +- As with the univariate case, for a continuous random vector, we get the probability of values across a subset $C \subseteq \mathbb{R}^d$ by integrating over that set: + +$$ +\mathbb{P}(X \in C) = \int_C f_X(x_1, \dots, x_d) \, dx_1 \, dx_2 \dots dx_d +$$ + +- e.g., if $C = \{ x \in \mathbb{R}^2 \mid 1 < x_1 < 2,\; 1 < x_2 < 2 \}$, then: + +$$ +\mathbb{P}(X \in C) = \int_1^2 \int_1^2 f_{X_1, X_2}(x_1, x_2) \, dx_1 \, dx_2 +$$ + +--- + +- As with univariate integration from $-\infty$ to $\infty$, this gives: + +$$ +\mathbb{E}(X) = \mathbb{E}_{X_1, \dots, X_d}(x) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} x \cdot f_X(x_1, \dots, x_d) \, dx_1 \dots dx_d +$$ + + +--- + + +## ✯ Summary of Results for Distributions +### βž” Marginal Distributions of a Random Vector + +Earlier we saw the marginal pmf of a set of discrete r.v.s, which we obtained by *summing out* or marginalizing some subset of the r.v.s $X_1, \dots, X_d$. + +In the notation of random vectors, let $X = (X_1, \dots, X_d)^T$ be a random vector. We can use the (multivariate) probability distribution of $X$ (pmf/pdf, and CDF) to obtain univariate distributions of $X_i$, or more generally, a distribution on some subset of $X_1, \dots, X_d$. Such distributions are called **marginal distributions**. + +--- + +### ➀ To obtain univariate marginal distributions: + +#### (1) If $X$ is a discrete or continuous random vector: + +The **marginal CDF** of $X_i$ is: + +$$ +F_{X_i}(x) = \lim_{x_1 \to \infty} \cdots \lim_{x_{i-1} \to \infty} \lim_{x_{i+1} \to \infty} \cdots \lim_{x_d \to \infty} F_X(x) +$$ + +where $F_X(x) = F_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the joint CDF of $X$ at $x = (x_1, \dots, x_d)^T$. 
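---

As a quick numerical illustration of the limit above (the 2Γ—3 joint pmf table is made up purely for illustration), the sketch below tabulates the joint CDF of a discrete pair $(X_1, X_2)$ and checks that letting $x_2$ run past the support of $X_2$ recovers the marginal CDF of $X_1$; the direct computation via the marginal pmf anticipates case (2) below.

```python
import numpy as np

# Assumed joint pmf of (X1, X2): X1 takes values 0,1 (rows), X2 takes values 0,1,2 (columns).
p = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])      # entries sum to 1

# Joint CDF on the support grid: F(x1, x2) = sum of the pmf over {X1 <= x1, X2 <= x2}.
F = p.cumsum(axis=0).cumsum(axis=1)

# "x2 -> infinity" just means evaluating F beyond the support of X2, i.e. the last column.
marginal_cdf_X1_via_limit = F[:, -1]

# Same thing computed directly: sum out X2 to get the marginal pmf of X1, then cumulate.
marginal_cdf_X1_direct = p.sum(axis=1).cumsum()

print(marginal_cdf_X1_via_limit)   # [0.4, 1.0]
print(marginal_cdf_X1_direct)      # identical
```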
+ +--- + +#### (2) If $X$ is a **discrete random vector**: + +The **marginal pmf** of $X_i$ is: + +$$ +p_{X_i}(x_i) = \sum_{x_1 \in R_1} \cdots \sum_{x_{i-1} \in R_{i-1}} \sum_{x_{i+1} \in R_{i+1}} \cdots \sum_{x_d \in R_d} p_X(x) +$$ + +where: +- $p_X(x) = p_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the joint pmf of $X$ +- $R_i := \text{Range}(X_i)$ + +--- + +#### (3) If $X$ is a **continuous random vector**: + +The **marginal pdf** of $X_i$ is: + +$$ +f_{X_i}(x_i) = \int \cdots \int f_X(x) \, dx_1 \cdots dx_{i-1} \, dx_{i+1} \cdots dx_d +$$ + +where: +- $f_X(x) = f_{X_1, \dots, X_d}(x_1, \dots, x_d)$ is the **joint pdf** of $X$ at $x = (x_1, \dots, x_d)^T$ + + +--- + +## ➀ To obtain higher-order marginals (joint marginals): + +Suppose from our random vector $X = (X_1, \dots, X_d)$, we select a subset of $k$ r.v.s which we want to retain, and marginalize out the rest, leaving a distribution over the remaining $(d - k)$ r.v.s. + +WLOG, let these be the last $k$ r.v.s: $X_{d-k+1}, \dots, X_d$ + +Let us split our vector $X$ into two partitions: + +$$ +X = \begin{bmatrix} +X_A \\ +X_B +\end{bmatrix} +$$ + +We want to retain r.v.s in the vector $X_A = (X_1, X_2, \dots, X_k)^T$ and marginalize out those in $X_B = (X_{k+1}, \dots, X_d)^T$. + +We do this in a similar way as we did with the univariate marginals, only with fewer limits/sums/integrals over the joint CDF/PMF/PDF. + +We will not reproduce the equations here since they are very similar to the ones on the previous page. Instead, we give one example: + +Let $d = 6$ and we want to marginalize to $k = 3$ r.v.s: + +Then: +$X = (X_1, X_2, X_3, X_4, X_5, X_6)^T$ +$X_A = (X_1, X_2, X_3)^T$, +$X_B = (X_4, X_5, X_6)^T$ + +--- + +### (a) If $X$ is discrete or continuous: + +The **marginal CDF** of $X_A$ is: + +$$ +F_{X_A}(x_A) = F_{X_1, X_2, X_3}(x_1, x_2, x_3) = \lim_{x_4 \to \infty} \lim_{x_5 \to \infty} \lim_{x_6 \to \infty} F_X(x) +$$ + +--- + +### (b) If $X$ is a **discrete** random vector: + +The **marginal PMF** of $X_A$ is: + +$$ +p_{X_A}(x_A) = p_{X_1, X_2, X_3}(x_1, x_2, x_3) = \sum_{x_4 \in R_4} \sum_{x_5 \in R_5} \sum_{x_6 \in R_6} p_X(x_1, x_2, x_3, x_4, x_5, x_6) +$$ + +--- + +### (c) If $X$ is a **continuous** random vector: + +The **marginal PDF** of $X_A$ is: + +$$ +f_{X_A}(x_A) = f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \int \int \int f_X(x_1, x_2, x_3, x_4, x_5, x_6) \, dx_4 \, dx_5 \, dx_6 +$$ + + + + + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd new file mode 100644 index 0000000..d40052f --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/03 - Conditional Distribution.qmd @@ -0,0 +1,142 @@ +## ➀ Conditional Distributions of Random Vectors + +Similar to the marginal distribution, we can obtain the **conditional CDF/PMF/PDF** from the joint distribution of a random vector $X = (X_1, \dots, X_d)^T$. + +(We will not mimic the case of univariate conditioning since it is not frequently used.) + +Instead, suppose we partition $X \Rightarrow (X_A \mid X_B)^T$ where: + +- $X_A : S \rightarrow \mathbb{R}^k$ +- $X_B : S \rightarrow \mathbb{R}^{d-k}$ + +Then, we can **condition $X_A$ on a realization of** the random vector $X_B$, which we will denote $x_B \in \mathbb{R}^{d-k}$. 
+ +Consequently, to obtain the **conditional distribution** of $X_A$ conditioned on $X_B = x_B$ (at a realized value $x_B$): + +--- + +### (a) $X_A$ and $X_B$ are **discrete** random vectors (Conditional PMF): + +The **conditional PMF of $X_A$** is: + +$$ +p_{X_A \mid X_B}(x_A \mid x_B) = \frac{p_X(x_A, x_B)}{p_{X_B}(x_B)} +$$ + +or in expanded form: + +$$ += \frac{p_{X_1, \dots, X_d}(x_1, \dots, x_k, x_{k+1}, \dots, x_d)}{p_{X_{k+1}, \dots, X_d}(x_{k+1}, \dots, x_d)} +$$ + +where: +- $p_X(x)$ is the joint pmf of $X$ at $x = (x_1, \dots, x_d)^T$ +- $p_{X_B}(x_B)$ is the marginal pmf of $X_B$ (i.e., we have summed out $X_A$ in the marginalization) + +--- + +### (b) $X_A$ and $X_B$ are **continuous** random vectors (Conditional PDF): + +Here we have an issue: the **PDF of $X_B$**, i.e., $f_{X_B}$, is zero at any realized value $x_B$. That is: + +$$ +\mathbb{P}(X_B = x_B) = 0 \quad \Rightarrow \quad f_{X_B}(x_B) = 0 +$$ + +And thus, our numerator and denominator are both zero if we try to emulate the steps we did to calculate the conditional pmf. + +### ➀ Conditional PDF of Random Vectors (continued) + +Instead, we perform a trick similar to what we do to calculate the conditional PDF in the **univariate** case. Instead of conditioning on the event $X_B = x_B$ (i.e., the conjunction $X_{d-k+1} = x_{d-k+1}, \dots, X_d = x_d$), we condition on the event that each r.v. $X_{B_i}$ lies in a small interval $\epsilon_i > 0$ from the corresponding realized value $x_{B_i}$. That is: + +$$ +x_{d-k+1} - \epsilon < X_{d-k+1} < x_{d-k+1} + \epsilon, \dots, X_d \in (x_d - \epsilon, x_d + \epsilon) +$$ + +Such events will generally have non-zero probability and hence can be conditioned on. We can then take the limit of each $\epsilon_i \to 0$ and use L'HΓ΄pital's rule. + +--- + +### ⭐ Conditional PDF of $X_A$ conditioned on $X_B = x_B$: + +$$ +f_{X_A \mid X_B = x_B}(x_A) = \frac{f_X(x)}{f_{X_B}(x_B)} +$$ + +That is, + +$$ +f_{X_1, \dots, X_k \mid X_{k+1}, \dots, X_d}(x_1, \dots, x_k \mid x_{k+1}, \dots, x_d) += \frac{f_{X_1, \dots, X_d}(x_1, \dots, x_k, x_{k+1}, \dots, x_d)}{f_{X_{k+1}, \dots, X_d}(x_{k+1}, \dots, x_d)} +$$ + +- where $f_X(x)$ is the joint PDF of $X$ at $x = (x_1, \dots, x_d)^T$ +- and $f_{X_B}(x_B)$ is the marginal PDF of $X_B$ at $x_B = (x_{k+1}, \dots, x_d)^T$ + +> Note that while the probabilities $\mathbb{P}(X = x)$ and $\mathbb{P}(X_B = x_B)$ are both zero, the values of $f_X(x)$ and $f_{X_B}(x_B)$ are both greater than zero (as they are values of a joint pdf). + +--- + +### πŸ“Œ Note: We also define the conditional CDF as: + +$$ +F_{X_A \mid X_B = x_B}(x_A) = \mathbb{P}(X_A \leq x_A \mid X_B = x_B) +$$ + +and + +$$ +f_{X_A \mid X_B = x_B}(x_A) = \frac{\partial^k}{\partial x_1 \cdots \partial x_k} F_{X_A \mid X_B = x_B}(x_A \mid x_B) +$$ + + + +> This quantity is different depending on the nature of $X$. 
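---

As a small worked example of the ratio formula above (the density is an assumed toy example, not one from the text): take $d = 2$, $X_A = X_1$, $X_B = X_2$, with joint pdf $f_{X_1, X_2}(x_1, x_2) = x_1 + x_2$ on $[0,1]^2$. The two cases below then spell out the corresponding conditional CDFs in general.

$$
f_{X_2}(x_2) = \int_0^1 (x_1 + x_2)\, dx_1 = \tfrac{1}{2} + x_2,
\qquad
f_{X_1 \mid X_2 = x_2}(x_1) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)} = \frac{x_1 + x_2}{\tfrac{1}{2} + x_2}, \quad x_1 \in [0, 1]
$$

For each fixed $x_2 \in [0, 1]$, this conditional pdf is non-negative and integrates to 1 over $x_1 \in [0, 1]$, as it should.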
+ +--- + +### (i) $X_A$ and $X_B$ are **discrete** random vectors: + +$$ +F_{X_A \mid X_B = x_B}(x_A) = \mathbb{P}(X_A \leq x_A \mid X_B = x_B) = \frac{\mathbb{P}(X_A \leq x_A,\; X_B = x_B)}{\mathbb{P}(X_B = x_B)} +$$ + +- (Probability calculated from **joint pmf**) + +That is: + +$$ += \frac{p(X_1 \leq x_1,\; \dots,\; X_k \leq x_k,\; X_{k+1} = x_{k+1},\; \dots,\; X_d = x_d)}{p(X_{k+1} = x_{k+1},\; \dots,\; X_d = x_d)} +$$ + +- (Probability calculated from **marginal pmf** of $X_B$ by marginalizing out $X_A$) + +--- + +### (ii) $X_A$, $X_B$ are **continuous** random vectors: + +$$ +F_{X_A \mid X_B = x_B}(x_A) = \frac{\int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f_X(x_A, x_B) \, dx_1 \dots dx_k}{f_{X_B}(x_B)} +$$ + +- (Probability calculated from **joint pdf**) + +- Denominator is the value of **marginal pdf** of $X_B$, not a probability. + +--- + +### πŸ“ Note: + +Here, $X_B = x_B$ is an event. +We can also condition on a **different event involving $X_B$**. + +$X_A \leq x_A$ is also an event β€” meaning: + +$$ +\{X_1 \leq x_1,\; X_2 \leq x_2,\; \dots,\; X_{d-k} \leq x_{d-k} \} +$$ + +--- + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd new file mode 100644 index 0000000..69a3d69 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/04 - Independence of Random Vectors.qmd @@ -0,0 +1,217 @@ +## ➀ Independence of Random Vectors + +We stated the independence of multiple r.v.s on earlier pages. +Let us now do the same using a random vector, which has the **notion of independence**: + +--- + +### 1. Independence of Random Sub-vectors + +Let $X = (X_1, \dots, X_d)^T$ be a random vector. Partition it as: + +$$ +X \rightarrow \begin{bmatrix} X_A \\ X_B \end{bmatrix}, \quad \text{where } X_A = (X_1, \dots, X_{d-k})^T,\; X_B = (X_{d-k+1}, \dots, X_d)^T +$$ + +Then, **random vectors $X_A$ and $X_B$ are independent** iff their **joint distribution equals the product of their marginal distributions**, i.e.: + +$$ +p(X = x) = p_{X_A}(X_A = x_A) \cdot p_{X_B}(X_B = x_B) +$$ + +for all $x \in \mathbb{R}^d$, where $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. + +We denote this as: + +$$ +X_A \perp\!\!\!\perp X_B +$$ + +--- + +### (a) If $X$ is a **discrete or continuous** random vector: + +Random vectors $X_A$ and $X_B$ are independent if their **joint CDF** equals the product of the marginal CDFs of $X_A$ and $X_B$, i.e., + +$$ +F_X(x) = F_{X_1, \dots, X_{d-k}}(x_1, \dots, x_{d-k}) \cdot F_{X_{d-k+1}, \dots, X_d}(x_{d-k+1}, \dots, x_d) +$$ + +for all vectors $x \in \mathbb{R}^d$, where $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. + +--- + +### (b) If $X$ is a **discrete random vector**: + +Random vectors $X_A$ and $X_B$ are independent if their **joint PMF** equals the product of their marginal PMFs: + +$$ +p_X(x) = p_{X_A}(X_A = x_A) \cdot p_{X_B}(X_B = x_B) +$$ + +i.e., + +$$ +p_{X}(X = x) = p_{X_A}(X_A = x_A) \cdot p_{X_B}(X_B = x_B) +$$ + +for all $x \in \mathbb{R}^d$, with $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. 
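---

The product condition just stated is easy to check numerically for a small discrete example (the tables `p_A` and `p_B` below are made up for illustration); the equivalent conditional formulation follows after this sketch.

```python
import numpy as np

# Assumed joint pmf of X = (X_A, X_B), X_A in {0,1} (rows), X_B in {0,1,2} (columns),
# built deliberately as an outer product, so X_A and X_B are independent by construction.
p_A = np.array([0.4, 0.6])
p_B = np.array([0.2, 0.5, 0.3])
joint = np.outer(p_A, p_B)                  # p_X(x_A, x_B) = p_A(x_A) * p_B(x_B)

# Recover the marginals from the joint and test the factorization.
marg_A = joint.sum(axis=1)
marg_B = joint.sum(axis=0)
print(np.allclose(joint, np.outer(marg_A, marg_B)))   # True  -> X_A independent of X_B

# A dependent counterexample: perturb one cell, renormalize, and the test fails.
dep = joint.copy()
dep[0, 0] += 0.05
dep /= dep.sum()
m_A, m_B = dep.sum(axis=1), dep.sum(axis=0)
print(np.allclose(dep, np.outer(m_A, m_B)))            # False -> dependent
```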
+ +--- + +### πŸ” Equivalent Condition: + +$X_A \perp\!\!\!\perp X_B$ if the **conditional PMF** of $X_A$ given $X_B$ is equal to the marginal of $X_A$, and vice versa: + +$$ +p_{X_A \mid X_B = x_B}(X_A = x_A) = p_{X_A}(X_A = x_A) \quad \forall x \in \mathbb{R}^d +$$ + +and + +$$ +p_{X_B \mid X_A = x_A}(X_B = x_B) = p_{X_B}(X_B = x_B) +$$ + +### (c) If $X$ is a **continuous random vector**: + +Random vectors $X_A$ and $X_B$ are independent if their **joint pdf** equals the product of their marginal pdfs: + +$$ +f_X(x) = f_{X_A}(x_A) \cdot f_{X_B}(x_B) +$$ + +i.e., + +$$ +f_X(x_1, \dots, x_d) = f_{X_1, \dots, X_{d-k}}(x_1, \dots, x_{d-k}) \cdot f_{X_{d-k+1}, \dots, X_d}(x_{d-k+1}, \dots, x_d) +$$ + +for all $x \in \mathbb{R}^d$, where $x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}$. + +--- + +### πŸ” Equivalently: + +$X_A \perp\!\!\!\perp X_B$ if the **conditional pdf** of $X_A$ (conditioned on $X_B$) is equal to the **marginal pdf** of $X_A$, and vice versa: + +$$ +f_{X_A \mid X_B = x_B}(X_A = x_A) = f_{X_A}(X_A = x_A) \quad \forall x \in \mathbb{R}^d +$$ + +and + +$$ +f_{X_B \mid X_A = x_A}(X_B = x_B) = f_{X_B}(X_B = x_B) +$$ + +--- + +### 🧠 Conceptual Clarification: + +The independence of random vectors, as we saw here, is a bit different from independence of random variables. + +Here, it means that **every subset of r.v.s from $X_A$ is independent of every subset of r.v.s from $X_B$**. + +However, the r.v.s *within* $X_A$ might **not be independent of each other** (and same for $X_B$). +If they are, they are said to be **mutually independent**. + + +## ✦ Mutual Independence (of r.v.s) in a Random Vector + +If $X = (X_1, \dots, X_d)^T$ is a random vector, then each of the variables $X_1, \dots, X_d$ is **mutually independent** if each $X_i$ is independent of all subsets of $\{X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_d\}$. + +--- + +### (a) If $X$ is a **continuous or discrete** random vector: + +$X_1, \dots, X_d$ are **mutually independent** if: + +#### Joint CDF condition: +$$ +F_X(x) = F_{X_1}(x_1) \cdot F_{X_2}(x_2) \cdots F_{X_d}(x_d) +$$ + +#### Joint PMF condition (discrete case): +$$ +p_X(x) = p_{X_1}(x_1) \cdot p_{X_2}(x_2) \cdots p_{X_d}(x_d) += \prod_{i=1}^d p(X_i = x_i) +\quad \forall x \in \mathbb{R}^d +$$ + +> (product of marginal CDFs / PMFs) + +--- + +### (b) If $X$ is a **discrete** random vector: + +$X_1, \dots, X_d$ are **mutually independent** if: + +#### Joint PMF condition: +$$ +p_X(x) = p_{X_1}(x_1) \cdot \cdots \cdot p_{X_d}(x_d) += \prod_{i=1}^d p(X_i = x_i) +\quad \forall x \in \mathbb{R}^d +$$ + +--- + +### (c) If $X$ is a **continuous** random vector: + +$X_1, \dots, X_d$ are **mutually independent** if: + +#### Joint PDF condition: +$$ +f_X(x) = f_{X_1}(x_1) \cdot \cdots \cdot f_{X_d}(x_d) += \prod_{i=1}^d f(X_i = x_i) +\quad \forall x \in \mathbb{R}^d +$$ + +> (product of marginal PDFs) + +--- + +## ➀ Conditional Independence of Random Vectors + +### 1. Conditional Mutual Independence (of r.v.s) in a Random Vector + +Let $X$ be a random vector, partitioned as: + +$$ +X \rightarrow \begin{bmatrix} X_A \\ X_B \end{bmatrix}, \quad \text{where } X_A = (X_1, \dots, X_{d-k})^T,\; X_B = (X_{d-k+1}, \dots, X_d)^T +$$ + +Then, the r.v.s of subvector $X_A$, i.e., $X_1, \dots, X_{d-k}$ are **conditionally mutually independent** if **each $X_i$ is conditionally independent of all subsets of $\{X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_{d-k}\}$ conditioned on $X_B = x_B$**. 
+ +--- + +### (a) If $X_A$ and $X_B$ are **discrete or continuous** random vectors: + +$X_1, \dots, X_{d-k}$ are **conditionally independent (conditioned on $X_B = x_B$)** iff: + +$$ +F_{X_1, \dots, X_{d-k} \mid X_B = x_B}(x_A) = \frac{\mathbb{P}(X_A \leq x_A,\; X_B = x_B)}{f_{X_B}(X_B = x_B)} +$$ + +- For **discrete** $X_A$, $X_B$: numerator is a joint pmf, denominator a marginal pmf +- For **continuous** $X_A$, $X_B$: numerator is a joint CDF, denominator a PDF + +--- + +If this is true, then: + +$$ +\mathbb{P}(X_1 \leq x_1, \dots, X_{d-k} \leq x_{d-k} \mid X_{d-k+1} = x_{d-k+1}, \dots, X_d = x_d) += \prod_{i=1}^{d-k} \mathbb{P}(X_i \leq x_i \mid X_{d-k+1} = x_{d-k+1}, \dots, X_d = x_d) +$$ + +--- + +βœ… Then we have **conditional mutual independence**. + + +--- + + + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd new file mode 100644 index 0000000..232a0c8 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/05 - Bayes Theorem.qmd @@ -0,0 +1,93 @@ +### Bayes' Theorem for Random Vectors + +The conditional cdf, pmf, or pdf for random vectors +leads us to a highly generalized form of Bayes' Theorem. +Recall that for events $A, B$, Bayes' Rule states that: + +$$ +P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \quad \text{and} \quad P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A) +$$ + +Bayes' Rule only allows us to condition on a single r.v. +at a time. Let us consider the random vector $X = (X_1, \ldots, X_d)^\top$. +Thus, by Bayes' Rule: + +--- + +**(a) If $X$ is discrete:** + +$$ +p_X(x) = p_{X_1, \ldots, X_d}(X_1 = x_1, \ldots, X_d = x_d) \quad \text{(joint)} +$$ + +$$ += P_{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_d}(X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +$$ +\cdot P(X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +**e.g.** + +$$ +p_{X_1, X_2, X_3}(X_1 = x_1, X_2 = x_2, X_3 = x_3) += p(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdot p(X_1 = x_1, X_2 = x_2) +$$ + +--- + +**(b) If $X$ is continuous:** + +$$ +f_X(x) = f_{X_1, \ldots, X_d}(X_1 = x_1, \ldots, X_d = x_d) \quad \text{(joint)} +$$ + +$$ += f_{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_d}(X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +$$ +\cdot f_{X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_d}(X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}, \ldots, X_d = x_d) +$$ + +**e.g.** + +$$ +f_{X_1, X_2, X_3}(X_1 = x_1, X_2 = x_2, X_3 = x_3) += f(X_1 = x_1, X_2 = x_2) \cdot f(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) +$$ + + +--- + +A corollary to this is that we can further +decompose the marginal terms $p(X_1 = x_1, X_2 = x_2, \ldots, X_d = x_d)$ and +$f(X_1 = x_1, X_2 = x_2, \ldots, X_d = x_d)$, however we wish. + +Thus, + +### (a) If $X$ is discrete: + +$$ +p(X_1 = x_1, \ldots, X_d = x_d) = p(X_1 = x_1) \cdot p(X_2 = x_2 \mid X_1 = x_1) \cdot p(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1) \cdots p(X_d = x_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}) +$$ + +--- + +### (b) If $X$ is continuous: + +$$ +f(X_1 = x_1, \ldots, X_d = x_d) = f(X_1 = x_1) \cdot f(X_2 = x_2 \mid X_1 = x_1) \cdot f(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdots f(X_d = x_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}) +$$ + +--- + +Note that the above is just one possible order of decomposition. +There are $d!$ possible permutations. 
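A small numerical sketch of this factorization may help (the joint table below is randomly generated, purely for illustration): for $d = 3$ binary r.v.s, multiplying $p(x_1)$, $p(x_2 \mid x_1)$, and $p(x_3 \mid x_1, x_2)$ reconstructs the joint pmf exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed joint pmf of (X1, X2, X3), each binary: a random 2x2x2 table normalized to sum to 1.
p = rng.random((2, 2, 2))
p /= p.sum()

p1 = p.sum(axis=(1, 2))                         # p(x1)
p2_given_1 = p.sum(axis=2) / p1[:, None]        # p(x2 | x1) = p(x1, x2) / p(x1)
p3_given_12 = p / p.sum(axis=2, keepdims=True)  # p(x3 | x1, x2) = p(x1, x2, x3) / p(x1, x2)

# Chain rule: p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)
reconstructed = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_12
print(np.allclose(reconstructed, p))            # True
```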
+ +We calculate the conditionals and marginals as +described before (using summation/integration). + +--- + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd new file mode 100644 index 0000000..4e1d621 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/06 - Law of total probability.qmd @@ -0,0 +1,99 @@ +### Law of Total Probability for Univariate & Multivariate r.v.s + +Recall that the univariate LOTP states that, given +disjoint partitions of the sample space $S_1, S_2, \ldots, S_n$ +such that $S = \bigcup_i S_i$ and $S_i \cap S_j = \emptyset$ if $i \neq j$, the +LOTP states that for any event $A \subseteq S$: + +$$ +P(A) = \sum_{i=1}^{n} P(S_i) \cdot P(A \mid S_i) = \sum_{i=1}^{n} P(A \cap S_i) +$$ + +where we must have that $P(S_i) > 0$. + +--- + +### (a) Univariate + +The above extends readily to the case where we +condition on a **discrete** univariate r.v. $X$: + +$$ +P(A) = \sum_{x \in \mathcal{R}_X} P(X = x) \cdot P(A \mid X = x) = \sum_{x \in \mathcal{R}_X} P(A \cap \{X = x\}) +$$ + +where $\mathcal{R}_X = \text{Range}(X)$. + +If we let $A$ be the event that $Y \in C$, where $C \subseteq \mathbb{R}$ +and $Y$ is a r.v., then: + +$$ +P(Y \in C) = \sum_{x \in \mathcal{R}_X} P(Y \in C \mid X = x) \cdot P(X = x) +$$ + +--- + +### (b) In the case of conditioning on a **continuous** univariate r.v., + +The probability $P(A \mid X = x)$ is ill-defined, +since the probability of a continuous r.v. +at a point-value like $x \in \mathbb{R}$ is zero. + +In this case, however, we can adapt to use the **pdf**: + +$$ +P(A) = \int_{-\infty}^{\infty} P(A \mid X = x) \cdot f_X(x) \, dx +$$ + +If we let $A$ be the event that... + + +### (ii) Multivariate + +Suppose we have a random vector $X = (X_1, \ldots, X_d)^\top$, +which we partition as $X \rightarrow \begin{pmatrix} X_A \\ X_B \end{pmatrix}$, +where: +- $X_A = (X_1, \ldots, X_{d-k})^\top$ (of dimension $d-k$) +- $X_B = (X_{d-k+1}, \ldots, X_d)^\top$ (of dimension $k$) + +--- + +#### (a) If $X$ is a **discrete** random vector + +Then, by the Law of Total Probability, the **marginal pmf** of $X_A$ is: + +$$ +p(X_A = x_A) = \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_X(x) +$$ + +(using the full joint) + +$$ += \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_{X_B}(X_B = x_B) \cdot p(X_A = x_A \mid X_B = x_B) +$$ + +where $\mathcal{R}_i = \text{Range}(X_i)$. + +We can then use the formulas for marginal & conditional joint pmfs. + +--- + +#### (b) If $X$ is a **continuous** random vector + +Then by the Law of Total Probability, the **marginal pdf** of $X_A$ is: + +$$ +f(X_A = x_A) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x) \, dx_{d-k+1} \cdots dx_d +$$ + +(using full joint) + +$$ += \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_B}(X_B = x_B) \cdot f(X_A = x_A \mid X_B = x_B) \, dx_{d-k+1} \cdots dx_d +$$ + +We can then use the formulas for marginal & conditional joint pdfs. 
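---

A minimal numerical sketch of the discrete case (a) above (the 3Γ—4 joint table is made up for illustration): the marginal pmf of $X_A$ obtained by directly summing out $X_B$ agrees with $\sum_{x_B} p(X_B = x_B)\, p(X_A = x_A \mid X_B = x_B)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed joint pmf p(x_A, x_B): X_A takes 3 values (rows), X_B takes 4 values (columns).
joint = rng.random((3, 4))
joint /= joint.sum()

# Direct marginalization: sum the joint pmf over the range of X_B.
marg_A_direct = joint.sum(axis=1)

# Law of total probability: sum over x_B of p(x_B) * p(x_A | x_B).
p_B = joint.sum(axis=0)                  # marginal pmf of X_B
cond_A_given_B = joint / p_B[None, :]    # p(x_A | x_B); each column sums to 1
marg_A_lotp = (cond_A_given_B * p_B[None, :]).sum(axis=1)

print(np.allclose(marg_A_direct, marg_A_lotp))   # True
```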
+ +--- + + diff --git a/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd new file mode 100644 index 0000000..2fc4103 --- /dev/null +++ b/book/chapters/03 Playing with Uncertainty/06 - Multivariate Probability/07 - Conditional Law of Total Probability.qmd @@ -0,0 +1,55 @@ + +### Conditional Law of Total Probability (LOTP) for Random Vectors + +We can extend the **Multivariate LOTP** defined on the previous page +to a **conditional Multivariate LOTP**. + +If we have a random vector $X = (X_1, \ldots, X_d)$, partitioned as: + +$$ +X = \begin{pmatrix} X_A \\ X_B \end{pmatrix} \quad \text{($d-k$ r.v.s and $k$ r.v.s)} +$$ + +then, if we condition the joint distribution of $X$ +on a random vector $Z = (Z_1, \ldots, Z_m)$ with realization $Z = z \in \mathbb{R}^m$, we get: + +--- + +### (a) If $X$ and $Z$ are **discrete** random vectors: + +The conditional Multivariate LOTP states that the +**conditional marginal pmf** of $X_A$ (conditioned on $Z = z$) is: + +$$ +p(X_A = x_A \mid Z = z) = \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_X(x \mid Z = z) +$$ + +(using $k$ summations) + +$$ += \sum_{x_{d-k+1} \in \mathcal{R}_{d-k+1}} \cdots \sum_{x_d \in \mathcal{R}_d} p_{X_B}(X_B = x_B \mid Z = z) \cdot p(X_A = x_A \mid X_B = x_B, Z = z) +$$ + +We can then use **Bayes' Rule** & the formulas for conditional +joint pmfs to calculate this. + +--- + +### (b) If $X$ and $Z$ are **continuous** random vectors: + +The conditional Multivariate LOTP states that the +**conditional marginal pdf** of $X_A$ (conditioned on $Z = z$) is: + +$$ +f(X_A = x_A \mid Z = z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x \mid Z = z) \, dx_{d-k+1} \cdots dx_d +$$ + +(using $k$ integrations) + +$$ += \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_B}(X_B = x_B \mid Z = z) \cdot f(X_A = x_A \mid X_B = x_B, Z = z) \, dx_{d-k+1} \cdots dx_d +$$ + +We can then use **Bayes' Rule** & formulas for conditional +joint pdfs to calculate this. +
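---

To close, a minimal discrete sketch of the conditional LOTP above (the 3Γ—4Γ—2 joint table and the choice of $z$ are made up for illustration): conditioning every term on $Z = z$ and summing over the range of $X_B$ recovers the conditional marginal $p(X_A = x_A \mid Z = z)$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed joint pmf p(x_A, x_B, z): X_A takes 3 values, X_B takes 4, Z takes 2.
joint = rng.random((3, 4, 2))
joint /= joint.sum()

z = 0                                         # condition on the realization Z = z
p_z = joint[:, :, z].sum()                    # p(Z = z)
cond_AB_given_z = joint[:, :, z] / p_z        # p(x_A, x_B | Z = z)

# Left-hand side: conditional marginal of X_A given Z = z, summing out X_B.
lhs = cond_AB_given_z.sum(axis=1)

# Right-hand side: sum over x_B of p(x_B | z) * p(x_A | x_B, z).
p_B_given_z = cond_AB_given_z.sum(axis=0)                 # p(x_B | Z = z)
p_A_given_Bz = cond_AB_given_z / p_B_given_z[None, :]     # p(x_A | x_B, Z = z)
rhs = (p_B_given_z[None, :] * p_A_given_Bz).sum(axis=1)

print(np.allclose(lhs, rhs))                  # True
```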