## Introduction

Consider a situation where a policymaker is trying to estimate how much revenue
a proposed wealth tax will raise.

The proposed tax is

$$
h(w) =
\begin{cases}
a w & \text{if } w \leq \bar w \\
a \bar w + b (w - \bar w) & \text{if } w > \bar w
\end{cases}
$$

where $w$ is wealth.

For example, if $a = 0.05$, $b = 0.1$, and $\bar w = 2.5$, this means

* a 5% tax on wealth up to 2.5 and
* a 10% tax on wealth in excess of 2.5.

The unit is 100,000, so $w = 2.5$ means 250,000 dollars.

Let's go ahead and define $h$:

```{code-cell} ipython3
def h(w, a=0.05, b=0.1, w_bar=2.5):
    if w <= w_bar:
        return a * w
    else:
        return a * w_bar + b * (w - w_bar)
```
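
As a quick sanity check of the schedule (the function is restated here so the snippet runs on its own), a wealth of 3.0, that is, \$300,000, owes $0.05 \times 2.5 + 0.1 \times 0.5 = 0.175$, or \$17,500:

```python
# Restates h so this snippet is self-contained.
def h(w, a=0.05, b=0.1, w_bar=2.5):
    if w <= w_bar:
        return a * w
    return a * w_bar + b * (w - w_bar)

print(h(2.0))   # below the threshold: 0.05 * 2.0 = 0.1
print(h(3.0))   # above it: 0.05 * 2.5 + 0.1 * 0.5 = 0.175
```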

For a population of size $N$, where individual $i$ has wealth $w_i$, total revenue raised by
the tax will be

$$
T = \sum_{i=1}^{N} h(w_i)
$$

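The sum can be computed in one vectorized step. Here is a sketch using a small, made-up wealth array (the values are purely illustrative, not the survey data used below):

```python
import numpy as np

def h_vec(w, a=0.05, b=0.1, w_bar=2.5):
    # np.where applies the two branches of the tax schedule elementwise
    return np.where(w <= w_bar, a * w, a * w_bar + b * (w - w_bar))

wealth = np.array([1.0, 2.0, 3.0, 10.0])   # hypothetical wealth levels
T = h_vec(wealth).sum()
print(T)   # 0.05 + 0.1 + 0.175 + 0.875 = 1.2
```
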
We wish to calculate this quantity.

The problem we face is that, in most countries, wealth is not observed for all individuals.

Collecting and maintaining accurate wealth data for all individuals or households in a country
is just too hard.

So let's suppose instead that we obtain a sample $w_1, w_2, \cdots, w_n$ telling us the wealth of $n$ randomly selected individuals.

For our exercise we are going to use a sample of $n = 10,000$ observations from wealth data in the US in 2016.

Maximum likelihood estimation has two steps:

1. Choose a parametric class of distributions (e.g., the normal distribution)
2. Estimate the parameter values (e.g., estimate $\mu$ and $\sigma$ for the
   normal distribution)

One possible assumption for the wealth is that each
$w_i$ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),
with parameters $\mu \in (-\infty,\infty)$ and $\sigma \in (0,\infty)$.

(This means that $\ln w_i$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.)

You can see that this assumption is not completely unreasonable because, if we
histogram log wealth instead of wealth, the picture starts to look something
like a bell-shaped curve.

```{code-cell} ipython3
ln_sample = np.log(sample)
fig, ax = plt.subplots()
ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)
plt.show()
```

Now our job is to obtain the maximum likelihood estimates of $\mu$ and $\sigma$, which
we denote by $\hat{\mu}$ and $\hat{\sigma}$.

We will find these estimates by maximising the likelihood of the observed
data.

The pdf of a lognormally distributed random variable $X$ is given by:

$$
f(x, \mu, \sigma)
= \frac{1}{x}\frac{1}{\sigma \sqrt{2\pi}}
\exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)
$$

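As a cross-check, this formula can be compared with `scipy.stats.lognorm` (in SciPy's parameterization, `s` plays the role of $\sigma$ and `scale` equals $e^\mu$):

```python
import numpy as np
from scipy.stats import lognorm

def f(x, mu, sigma):
    # lognormal pdf, written exactly as in the formula above
    return ((1 / x) * (1 / (sigma * np.sqrt(2 * np.pi)))
            * np.exp(-0.5 * ((np.log(x) - mu) / sigma)**2))

x, mu, sigma = 2.0, 0.0, 1.0
print(np.isclose(f(x, mu, sigma), lognorm.pdf(x, s=sigma, scale=np.exp(mu))))  # True
```
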
For our sample $w_1, w_2, \cdots, w_n$, the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function) is given by

$$
L(\mu, \sigma | w_i) = \prod_{i=1}^{n} f(w_i, \mu, \sigma)
$$

The likelihood function can be viewed as both

* the joint distribution of the sample (which is assumed to be IID) and
* the "likelihood" of parameters $(\mu, \sigma)$ given the data.

Taking logs on both sides gives us the log likelihood function, which is

$$
\begin{aligned}
\ell(\mu, \sigma | w_i)
& = \ln \left[ \prod_{i=1}^{n} f(w_i, \mu, \sigma) \right] \\
& = -\sum_{i=1}^{n} \ln w_i
- \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2}
\sum_{i=1}^n (\ln w_i - \mu)^2
\end{aligned}
$$

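A quick numerical check that this expanded expression agrees with $\ln L$ computed directly as a sum of log densities, using synthetic lognormal draws and arbitrary trial values for $\mu$ and $\sigma$ (all values here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.lognormal(size=500)        # synthetic "wealth" sample
mu, sigma = 0.3, 0.9               # arbitrary trial parameters
ln_w, n = np.log(w), len(w)

# the expanded log likelihood from the derivation above
ell = (-ln_w.sum() - n/2 * np.log(2 * np.pi)
       - n/2 * np.log(sigma**2)
       - ((ln_w - mu)**2).sum() / (2 * sigma**2))

# the direct version: sum of the log of each density value
log_f = (-ln_w - np.log(sigma * np.sqrt(2 * np.pi))
         - 0.5 * ((ln_w - mu) / sigma)**2)

print(np.isclose(ell, log_f.sum()))  # True
```
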
To find where this function is maximised we find its partial derivatives with respect to $\mu$ and $\sigma^2$ and equate them to $0$.

Let's first find the maximum likelihood estimate (MLE) of $\mu$:

$$
\begin{aligned}
\frac{\partial \ell}{\partial \mu}
= - \frac{1}{2\sigma^2} \times 2 \sum_{i=1}^n (\ln w_i - \mu) = 0 \\
\implies \sum_{i=1}^n \ln w_i - n \mu = 0 \\
\implies \hat{\mu} = \frac{\sum_{i=1}^n \ln w_i}{n}
\end{aligned}
$$

Now let's find the MLE of $\sigma$:

$$
\begin{aligned}
\frac{\partial \ell}{\partial \sigma^2}
= - \frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}
\sum_{i=1}^n (\ln w_i - \mu)^2 = 0 \\
\implies \frac{n}{2\sigma^2} =
\frac{1}{2\sigma^4} \sum_{i=1}^n (\ln w_i - \mu)^2 \\
\implies \hat{\sigma} =
\left( \frac{\sum_{i=1}^{n}(\ln w_i - \hat{\mu})^2}{n} \right)^{1/2}
\end{aligned}
$$

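In other words, $\hat{\mu}$ is the sample mean of $\ln w_i$, and $\hat{\sigma}$ is the divide-by-$n$ standard deviation of $\ln w_i$. Here is a sketch that checks the two estimators on synthetic data with assumed true parameters $\mu = 0.5$ and $\sigma = 1.2$:

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.lognormal(mean=0.5, sigma=1.2, size=100_000)

ln_w = np.log(draws)
mu_hat = ln_w.mean()                                # MLE of mu
sigma_hat = np.sqrt(((ln_w - mu_hat)**2).mean())    # MLE of sigma (divide by n, not n - 1)

print(round(mu_hat, 2), round(sigma_hat, 2))        # close to 0.5 and 1.2
```
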
```{code-cell} ipython3
ax.legend()
plt.show()
```

Our estimated lognormal distribution appears to be a reasonable fit for the overall data.

We now use {eq}`eq:est_rev` to calculate total revenue.

```{code-cell} ipython3
ax.legend()
plt.show()
```

We observe that in this case the fit for the Pareto distribution is not very
good, so we can probably reject it.

```{code-cell} ipython3
ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')
plt.show()
```

The Pareto distribution is a better fit for the right-hand tail of our dataset.

### So what is the best distribution?

One test is to plot the data against the fitted distribution, as we did.

There are other more rigorous tests, such as the [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).

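For instance, `scipy.stats.kstest` compares a sample against a hypothesized CDF. The sketch below tests synthetic lognormal data against its true distribution (all parameter values are assumptions for illustration):

```python
import numpy as np
from scipy.stats import kstest, lognorm

rng = np.random.default_rng(2)
data = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# compare the sample against the lognormal(mu=0, sigma=1) CDF;
# a large p-value would indicate no evidence against the fit
result = kstest(data, lognorm(s=1.0, scale=1.0).cdf)
print(result.statistic, result.pvalue)
```
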
We omit such advanced topics (but encourage readers to study them once
they have completed these lectures).

## Exercises

```{code-cell} ipython3
tr_expo = total_revenue(dist_exp)
tr_expo
```

```{solution-end}
```

```{code-cell} ipython3
ax.legend()
plt.show()
```

Clearly, this distribution is not a good fit for our data.