@@ -23,7 +23,7 @@ kernelspec:
2323
2424## Outline
2525
26- In this lecture we give a quick introduction to data and probability distributions using Python
26+ In this lecture we give a quick introduction to data and probability distributions using Python.
2727
2828``` {code-cell} ipython3
2929:tags: [hide-output]
@@ -42,7 +42,7 @@ import seaborn as sns
4242
4343## Common distributions
4444
45- In this section we recall the definitions of some well-known distributions and show how to manipulate them with SciPy.
45+ In this section we recall the definitions of some well-known distributions and explore how to manipulate them with SciPy.
4646
4747### Discrete distributions
4848
@@ -61,7 +61,7 @@ $$ \mathbb P\{X = x_i\} = p(x_i) \quad \text{for } i= 1, \ldots, n $$
6161The **mean** or **expected value** of a random variable $X$ with distribution $p$ is
6262
6363$$
64- \mathbb E X = \sum_{i=1}^n x_i p(x_i)
64+ \mathbb{E}[X] = \sum_{i=1}^n x_i p(x_i)
6565$$
6666
6767Expectation is also called the *first moment* of the distribution.
@@ -71,15 +71,15 @@ We also refer to this number as the mean of the distribution (represented by) $p
7171The **variance** of $X$ is defined as
7272
7373$$
74- \mathbb V X = \sum_{i=1}^n (x_i - \mathbb E X )^2 p(x_i)
74+ \mathbb{V}[X] = \sum_{i=1}^n (x_i - \mathbb{E}[X] )^2 p(x_i)
7575$$
7676
7777Variance is also called the *second central moment* of the distribution.
7878
7979The **cumulative distribution function** (CDF) of $X$ is defined by
8080
8181$$
82- F(x) = \mathbb P \{X \leq x\}
82+ F(x) = \mathbb{P} \{X \leq x\}
8383 = \sum_{i=1}^n \mathbb 1\{x_i \leq x\} p(x_i)
8484$$
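
The three definitions above translate directly into array operations. The following is a minimal sketch, assuming a hypothetical distribution on three points (the values `x_vals` and `p_vals` are illustrative and not taken from the lecture):

```python
import numpy as np

# Hypothetical support points and probabilities (illustrative only)
x_vals = np.array([1.0, 2.0, 5.0])
p_vals = np.array([0.2, 0.5, 0.3])

# Mean: sum of x_i * p(x_i)
mean = np.sum(x_vals * p_vals)

# Variance: sum of (x_i - mean)^2 * p(x_i)
var = np.sum((x_vals - mean)**2 * p_vals)

def cdf(x):
    "F(x): sum of p(x_i) over all support points x_i <= x."
    return np.sum(p_vals[x_vals <= x])

print(mean, var, cdf(2.0))
```

The boolean mask `x_vals <= x` plays the role of the indicator $\mathbb 1\{x_i \leq x\}$ in the CDF formula.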
8585
@@ -157,6 +157,75 @@ Check that your answers agree with `u.mean()` and `u.var()`.
157157```
158158
159159
160+ #### Bernoulli distribution
161+
162+ Another useful (and more interesting) distribution is the Bernoulli distribution.
163+
164+ We can import the uniform distribution on $S = \{ 1, \ldots, n\} $ from SciPy like so:
165+
166+ ``` {code-cell} ipython3
167+ n = 10
168+ u = scipy.stats.randint(1, n+1)
169+ ```
170+
171+
172+ Here's the mean and variance
173+
174+ ``` {code-cell} ipython3
175+ u.mean(), u.var()
176+ ```
177+
178+ The formula for the mean is $(n+1)/2$, and the formula for the variance is $(n^2 - 1)/12$.
179+
180+
181+ Now let's evaluate the PMF
182+
183+ ``` {code-cell} ipython3
184+ u.pmf(1)
185+ ```
186+
187+ ``` {code-cell} ipython3
188+ u.pmf(2)
189+ ```
190+
191+
192+ Here's a plot of the probability mass function:
193+
194+ ``` {code-cell} ipython3
195+ fig, ax = plt.subplots()
196+ S = np.arange(1, n+1)
197+ ax.plot(S, u.pmf(S), linestyle='', marker='o', alpha=0.8, ms=4)
198+ ax.vlines(S, 0, u.pmf(S), lw=0.2)
199+ ax.set_xticks(S)
200+ plt.show()
201+ ```
202+
203+
204+ Here's a plot of the CDF:
205+
206+ ``` {code-cell} ipython3
207+ fig, ax = plt.subplots()
208+ S = np.arange(1, n+1)
209+ ax.step(S, u.cdf(S))
210+ ax.vlines(S, 0, u.cdf(S), lw=0.2)
211+ ax.set_xticks(S)
212+ plt.show()
213+ ```
214+
215+
216+ The CDF jumps up by $p(x_i)$ at $x_i$.
217+
218+
219+ ``` {exercise}
220+ :label: prob_ex2
221+
222+ Calculate the mean and variance for this parameterization (i.e., $n=10$)
223+ directly from the PMF, using the expressions given above.
224+
225+ Check that your answers agree with `u.mean()` and `u.var()`.
226+ ```
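
A Bernoulli random variable takes the value $1$ with probability $\theta$ and $0$ with probability $1 - \theta$, so its mean is $\theta$ and its variance is $\theta(1 - \theta)$. The following is a minimal sketch of how this might look in SciPy, assuming `scipy.stats.bernoulli` and an illustrative parameter value `θ = 0.4` that does not come from the lecture:

```python
import scipy.stats

θ = 0.4  # illustrative success probability (an assumption, not from the lecture)
b = scipy.stats.bernoulli(θ)

# PMF at the two support points 0 and 1
print(b.pmf(0), b.pmf(1))

# Mean and variance, to compare with θ and θ(1 - θ)
print(b.mean(), b.var())
print(θ, θ * (1 - θ))
```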
227+
228+
160229
161230#### Binomial distribution
162231
@@ -170,7 +239,7 @@ Here $\theta \in [0,1]$ is a parameter.
170239
171240The interpretation of $p(i)$ is: the probability of $i$ successes in $n$ independent trials with success probability $\theta$.
172241
173- (If $\theta=0.5$, this is "how many heads in $n$ flips of a fair coin")
242+ (If $\theta=0.5$, then $p(i)$ is the probability of $i$ heads in $n$ flips of a fair coin.)
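
Reading $p(i)$ as the probability of $i$ successes in $n$ trials can be checked by simulation. The following is a rough sketch, assuming `scipy.stats.binom` for the exact PMF and a hypothetical setup (`n = 10`, `i = 4`, 100,000 simulated experiments, fixed seed) chosen only for illustration:

```python
import numpy as np
import scipy.stats

n, θ = 10, 0.5                      # ten flips of a fair coin
rng = np.random.default_rng(0)      # fixed seed, for reproducibility

# Each row is one experiment of n flips; True marks a head
flips = rng.random((100_000, n)) < θ
heads = flips.sum(axis=1)           # number of heads in each experiment

i = 4
empirical = np.mean(heads == i)             # fraction of experiments with i heads
exact = scipy.stats.binom(n, θ).pmf(i)      # p(i) from the binomial PMF

print(empirical, exact)
```

With this many experiments the empirical frequency should lie close to the exact value, although the two will not match exactly.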
174243
175244The mean and variance are
176245
@@ -215,12 +284,12 @@ plt.show()
215284
216285
217286``` {exercise}
218- :label: prob_ex2
287+ :label: prob_ex3
219288
220289Using `u.pmf`, check that our definition of the CDF given above calculates the same function as `u.cdf`.
221290```
222291
223- ``` {solution-start} prob_ex2
292+ ``` {solution-start} prob_ex3
224293:class: dropdown
225294```
226295
@@ -304,7 +373,7 @@ The definition of the mean and variance of a random variable $X$ with distributi
304373For example, the mean of $X$ is
305374
306375$$
307- \mathbb E X = \int_{-\infty}^\infty x p(x) dx
376+ \mathbb{E}[X] = \int_{-\infty}^\infty x p(x) dx
308377$$
309378
310379The **cumulative distribution function** (CDF) of $X$ is defined by
@@ -328,7 +397,7 @@ This distribution has two parameters, $\mu$ and $\sigma$.
328397
329398It can be shown that, for this distribution, the mean is $\mu$ and the variance is $\sigma^2$.
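
This claim can be checked numerically by integrating against the density. The following is a minimal sketch, assuming `scipy.integrate.quad` for the integrals and illustrative parameter values `μ = 1.0`, `σ = 2.0` that do not come from the lecture:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

μ, σ = 1.0, 2.0          # illustrative parameters (an assumption)
p = norm(μ, σ).pdf       # the normal density p(x)

# Mean: integral of x p(x) over the real line
mean, _ = quad(lambda x: x * p(x), -np.inf, np.inf)

# Variance: integral of (x - mean)^2 p(x) over the real line
var, _ = quad(lambda x: (x - mean)**2 * p(x), -np.inf, np.inf)

print(mean, var)         # should be close to μ and σ**2
```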
330399
331- We can obtain the moments, PDF, and CDF of the normal density as follows:
400+ We can obtain the moments, PDF and CDF of the normal density as follows:
332401
333402``` {code-cell} ipython3
334403μ, σ = 0.0, 1.0
@@ -659,7 +728,7 @@ x.mean(), x.var()
659728
660729
661730``` {exercise}
662- :label: prob_ex3
731+ :label: prob_ex4
663732
664733Check that the formulas given above produce the same numbers.
665734```
@@ -700,6 +769,7 @@ The monthly return is calculated as the percent change in the share price over e
700769So we will have one observation for each month.
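
To make the percent-change calculation concrete, here is a small sketch on a hypothetical price series (the four prices are made up for illustration and are not market data), using the pandas method `Series.pct_change`:

```python
import pandas as pd

# Hypothetical month-end prices (illustrative values only)
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

# pct_change gives (p_t - p_{t-1}) / p_{t-1}; the first entry is NaN
# because there is no earlier price, so we drop it
returns = prices.pct_change().iloc[1:] * 100

print(returns.round(2).tolist())
```

Four prices give three returns, since each return needs the price from the month before.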
701770
702771``` {code-cell} ipython3
772+ :tags: [hide-output]
703773df = yf.download('AMZN', '2000-1-1', '2023-1-1', interval='1mo' )
704774prices = df['Adj Close']
705775data = prices.pct_change()[1:] * 100
@@ -777,6 +847,7 @@ Violin plots are particularly useful when we want to compare different distribut
777847For example, let's compare the monthly returns on Amazon shares with the monthly return on Apple shares.
778848
779849``` {code-cell} ipython3
850+ :tags: [hide-output]
780851df = yf.download('AAPL', '2000-1-1', '2023-1-1', interval='1mo' )
781852prices = df['Adj Close']
782853data = prices.pct_change()[1:] * 100