@@ -24,8 +24,8 @@ from math import exp, log
2424
2525## Introduction
2626
27- Consider a situation where a policymaker is trying to estimate how much revenue a proposed wealth tax
28- will raise.
27+ Consider a situation where a policymaker is trying to estimate how much revenue
28+ a proposed wealth tax will raise.
2929
3030The proposed tax is
3131
3939
4040where $w$ is wealth.
4141
42- For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means a
43- 5% tax on wealth up to 2.5 and 10% tax on wealth in excess of 2.5.
4442
45- (The units will be 100,000 so 2.5 means 250,000 dollars.)
43+ For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means
4644
47- Here we define $h$
45+ * a 5% tax on wealth up to 2.5 and
46+ * a 10% tax on wealth in excess of 2.5.
47+
48+ The unit is 100,000, so $w = 2.5$ means 250,000 dollars.
49+
50+ Let's go ahead and define $h$:
4851
4952``` {code-cell} ipython3
5053def h(w, a=0.05, b=0.1, w_bar=2.5):
@@ -54,18 +57,21 @@ def h(w, a=0.05, b=0.1, w_bar=2.5):
5457 return a * w_bar + b * (w - w_bar)
5558```
5659
57- For a population of size $N$, where individual $i$ has wealth $w_i$, the total revenue will be given by
60+ For a population of size $N$, where individual $i$ has wealth $w_i$, total revenue raised by
61+ the tax will be
5862
5963$$
6064 T = \sum_{i=1}^{N} h(w_i)
6165$$
6266
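As a quick sanity check of this sum, here is a minimal sketch that restates $h$ from above (so the snippet is self-contained) and applies it to a small made-up wealth vector; the values are illustrative, not the lecture's dataset:

```python
import numpy as np

def h(w, a=0.05, b=0.1, w_bar=2.5):
    # restatement of h from above: rate a up to w_bar, rate b beyond it
    if w <= w_bar:
        return a * w
    return a * w_bar + b * (w - w_bar)

# illustrative wealth vector (units of 100,000 dollars)
wealth = np.array([0.5, 1.0, 2.0, 4.0, 10.0])

# total revenue is just the sum of h over the sample
T = sum(h(w) for w in wealth)
print(T)  # ≈ 1.325
```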
63- However, in most countries wealth is not observed for all individuals.
67+ We wish to calculate this quantity.
68+
69+ The problem we face is that, in most countries, wealth is not observed for all individuals.
6470
6571Collecting and maintaining accurate wealth data for all individuals or households in a country
6672is just too hard.
6773
68- So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ individuals.
74+ So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ randomly selected individuals.
6975
7076For our exercise we are going to use a sample of $n = 10,000$ observations from wealth data in the US in 2016.
7177
@@ -140,15 +146,15 @@ Maximum likelihood estimation has two steps:
1401462. Estimate the parameter values (e.g., estimate $\mu$ and $\sigma$ for the
141147 normal distribution)
142148
143- One reasonable assumption for the wealth is that each
144- that $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
149+ One possible assumption for the wealth is that each
150+ $w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
145151with parameters $\mu \in (-\infty,\infty)$ and $\sigma \in (0,\infty)$.
146152
147- This means that $\ln w_i$ is normally distributed with mean $\mu$ and
148- standard deviation $\sigma$.
153+ (This means that $\ln w_i$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.)
149154
150- You can see that this is a reasonable assumption because if we histogram log wealth
151- instead of wealth the picture starts to look something like a bell-shaped curve.
155+ You can see that this assumption is not completely unreasonable because, if we
156+ histogram log wealth instead of wealth, the picture starts to look something
157+ like a bell-shaped curve.
152158
153159```{code-cell} ipython3
154160ln_sample = np.log(sample)
@@ -157,59 +163,69 @@ ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)
157163plt.show()
158164```
159165
160- +++ {"user_expressions": []}
161-
162166Now our job is to obtain the maximum likelihood estimates of $\mu$ and $\sigma$, which
163167we denote by $\hat{\mu}$ and $\hat{\sigma}$.
164168
165169These estimates can be found by maximizing the likelihood function given the
166170data.
167171
168172The pdf of a lognormally distributed random variable $X$ is given by:
169- $$
170- f(x) = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}} exp\left(\frac{-1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)\right)^2
171- $$
172173
173- Since $\ln X$ is normally distributed this is the same as
174174$$
175- f(x) = \frac{1}{x} \phi(x)
175+ f(x, \mu, \sigma)
176+ = \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}}
177+ \exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)
176178$$
177- where $\phi$ is the pdf of $\ln X$ which is normally distibuted with mean $\mu$ and variance $\sigma ^2$.
178179
179- For a sample $x = (x_1, x_2, \cdots, x_n)$ the _likelihood function_ is given by:
180+ For our sample $w_1, w_2, \cdots, w_n$, the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function) is given by
181+
180182$$
181183\begin{aligned}
182- L(\mu, \sigma | x_i) = \prod_ {i=1}^{n} f(\mu, \sigma | x_i) \\
183- L(\mu, \sigma | x_i) = \prod_ {i=1}^{n} \frac{1}{x_i} \phi(\ln x_i)
184+ L(\mu, \sigma | w_i) = \prod_{i=1}^{n} f(w_i, \mu, \sigma)
184185\end{aligned}
185186$$
186187
187- Taking $\log$ on both sides gives us the _log likelihood function_ which is:
188+ The likelihood function can be viewed as both
189+
190+ * the joint distribution of the sample (which is assumed to be IID) and
191+ * the "likelihood" of parameters $(\mu, \sigma)$ given the data.
192+
193+ Taking logs on both sides gives us the log likelihood function, which is
194+
188195$$
189196\begin{aligned}
190- l(\mu, \sigma | x_i) = -\sum_ {i=1}^{n} \ln x_i + \sum_ {i=1}^n \phi(\ln x_i) \\
191- l(\mu, \sigma | x_i) = -\sum_ {i=1}^{n} \ln x_i - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
192- \sum_ {i=1}^n (\ln x_i - \mu)^2
197+ \ell(\mu, \sigma | w_i)
198+ & = \ln \left[ \prod_{i=1}^{n} f(w_i, \mu, \sigma) \right] \\
199+ & = -\sum_{i=1}^{n} \ln w_i
200+ - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
201+ \sum_{i=1}^n (\ln w_i - \mu)^2
193202\end{aligned}
194203$$
195204
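The log likelihood above can be written out directly and cross-checked against `scipy.stats.lognorm` (which parameterizes the lognormal with `s` $= \sigma$ and `scale` $= e^{\mu}$); this is a sketch with illustrative numbers, not the lecture's data:

```python
import numpy as np
from scipy.stats import lognorm

def log_likelihood(w, mu, sigma):
    # the log likelihood formula above, term by term
    n = len(w)
    lw = np.log(w)
    return (-lw.sum()
            - n/2 * np.log(2 * np.pi)
            - n/2 * np.log(sigma**2)
            - ((lw - mu)**2).sum() / (2 * sigma**2))

w = np.array([0.5, 1.2, 3.0, 7.5])   # illustrative sample
mu, sigma = 0.2, 1.1                  # illustrative parameters

# scipy's sum of logpdf values should agree with the formula
check = lognorm.logpdf(w, s=sigma, scale=np.exp(mu)).sum()
print(np.isclose(log_likelihood(w, mu, sigma), check))  # True
```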
196205To find where this function is maximised, we take its partial derivatives with respect to $\mu$ and $\sigma^2$ and equate them to $0$.
197206
198- Let's first find the MLE of $\mu$,
207+ Let's first find the maximum likelihood estimate (MLE) of $\mu$
208+
199209$$
200210\begin{aligned}
201- \frac{\delta l}{\delta \mu} = - \frac{1}{2\sigma^2} \times 2 \sum_ {i=1}^n (\ln x_i - \mu) = 0 \\
202- \Rightarrow \sum_ {i=1}^n \ln x_i - n \mu = 0 \\
203- \Rightarrow \hat{\mu} = \frac{\sum_ {i=1}^n \ln x_i}{n}
211+ \frac{\partial \ell}{\partial \mu}
212+ = \frac{1}{\sigma^2} \sum_{i=1}^n (\ln w_i - \mu) = 0 \\
213+ \implies \sum_{i=1}^n \ln w_i - n \mu = 0 \\
214+ \implies \hat{\mu} = \frac{\sum_{i=1}^n \ln w_i}{n}
204215\end{aligned}
205216$$
206217
207- Now let's find the MLE of $\sigma$,
218+ Now let's find the MLE of $\sigma$
219+
208220$$
209221\begin{aligned}
210- \frac{\delta l}{\delta \sigma^2} = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_ {i=1}^n (\ln x_i - \mu)^2 = 0 \\
211- \Rightarrow \frac{n}{2\sigma^2} = \frac{1}{2\sigma^4} \sum_ {i=1}^n (\ln x_i - \mu)^2 \\
212- \Rightarrow \hat{\sigma} = \left( \frac{\sum_ {i=1}^{n}(\ln x_i - \hat{\mu})^2}{n} \right)^{1/2}
222+ \frac{\partial \ell}{\partial \sigma^2}
223+ = - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}
224+ \sum_{i=1}^n (\ln w_i - \mu)^2 = 0 \\
225+ \implies \frac{n}{2\sigma^2} =
226+ \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln w_i - \mu)^2 \\
227+ \implies \hat{\sigma} =
228+ \left( \frac{\sum_{i=1}^{n}(\ln w_i - \hat{\mu})^2}{n} \right)^{1/2}
213229\end{aligned}
214230$$
215231
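These closed-form estimators can be verified numerically: on a large simulated lognormal sample, $\hat\mu$ and $\hat\sigma$ should recover the true parameters. A minimal sketch (the parameter values and seed below are illustrative, not from the lecture's dataset):

```python
import numpy as np

# simulate a large lognormal sample with known parameters
rng = np.random.default_rng(1234)
mu_true, sigma_true = 1.0, 0.5
draws = rng.lognormal(mean=mu_true, sigma=sigma_true, size=100_000)

# apply the closed-form MLEs derived above
log_draws = np.log(draws)
mu_hat = log_draws.mean()                               # (1/n) sum ln w_i
sigma_hat = np.sqrt(((log_draws - mu_hat)**2).mean())   # sqrt of mean squared deviation

print(mu_hat, sigma_hat)  # should be close to 1.0 and 0.5
```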
@@ -242,7 +258,7 @@ ax.legend()
242258plt.show()
243259```
244260
245- Our estimated lognormal distribution appears to be a decent fit for the overall data.
261+ Our estimated lognormal distribution appears to be a reasonable fit for the overall data.
246262
247263We now use {eq}`eq:est_rev` to calculate total revenue.
248264
@@ -331,8 +347,6 @@ ax.legend()
331347plt.show()
332348```
333349
334- +++ {"user_expressions": []}
335-
336350We observe that in this case the fit for the Pareto distribution is not very
337351good, so we can probably reject it.
338352
@@ -417,8 +431,6 @@ ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')
417431plt.show()
418432```
419433
420- +++ {"user_expressions": []}
421-
422434The Pareto distribution is a better fit for the right hand tail of our dataset.
423435
424436### So what is the best distribution?
@@ -432,7 +444,8 @@ One test is to plot the data against the fitted distribution, as we did.
432444
433445There are other more rigorous tests, such as the [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
434446
435- We omit the details.
447+ We omit such advanced topics (but encourage readers to study them once
448+ they have completed these lectures).
436449
437450## Exercises
438451
@@ -469,8 +482,6 @@ tr_expo = total_revenue(dist_exp)
469482tr_expo
470483```
471484
472- +++ {"user_expressions": []}
473-
474485```{solution-end}
475486```
476487
@@ -498,7 +509,6 @@ ax.legend()
498509plt.show()
499510```
500511
501- +++ {"user_expressions": []}
502512
503513Clearly, this distribution is not a good fit for our data.
504514