@@ -125,7 +125,7 @@ Many statisticians and econometricians
125125use rules of thumb such as "outcomes more than four or five
126126standard deviations from the mean can safely be ignored."
127127
128- But this is only true when distributions have light tails...
128+ But this is only true when distributions have light tails.
129129
130130
131131### When are light tails valid?
@@ -400,7 +400,7 @@ Notice how extreme outcomes are more common.
400400
401401### Counter CDFs
402402
403- For nonnegative random varialbes , one way to visualize the difference between
403+ For nonnegative random variables, one way to visualize the difference between
404404light and heavy tails is to look at the
405405**counter CDF** (CCDF).
406406
@@ -424,7 +424,7 @@ $$ G_P(x) = x^{- \alpha} $$
424424
425425This function goes to zero as $x \to \infty$, but much slower than $G_E$.
426426
427- Here's a plot that illustrates how $G_E$ goes to zero faster that $G_P$.
427+ Here's a plot that illustrates how $G_E$ goes to zero faster than $G_P$.
428428
429429``` {code-cell} ipython3
430430x = np.linspace(1.5, 100, 1000)
@@ -452,12 +452,12 @@ In the log-log plot, the Pareto CCDF is linear, while the exponential one is
452452concave.
453453
454454This idea is often used to separate light- and heavy-tailed distributions in
455- visualations --- we return to this point below.
455+ visualisations --- we return to this point below.
456456
457457
458458### Empirical CCDFs
459459
460- The sample countpart of the CCDF function is the ** empirical CCDF** .
460+ The sample counterpart of the CCDF is the **empirical CCDF**.
461461
462462Given a sample $x_1, \ldots, x_n$, the empirical CCDF is given by
463463
@@ -529,7 +529,7 @@ We can write this more mathematically as
529529```
530530
531531It is also common to say that a random variable $X$ with this property
532- has a ** Pareto tail** with ** tail index** $\alpha$ if
532+ has a **Pareto tail** with **tail index** $\alpha$.
533533
534534Notice that every Pareto distribution with tail index $\alpha$
535535has a **Pareto tail** with **tail index** $\alpha$.
@@ -548,7 +548,7 @@ As mentioned above, heavy tails are pervasive in economic data.
548548
549549In fact power laws seem to be very common as well.
550550
551- We now illustrate this by showing the empirical CCDF of
551+ We now illustrate this by showing the empirical CCDFs of several heavy-tailed data sets.
552552
553553All plots are in log-log, so that a power law shows up as a linear log-log
554554plot, at least in the upper tail.
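
As a rough illustration of why a power law appears linear in log-log coordinates, the sketch below computes an empirical CCDF for simulated Pareto data and fits a line to it. The helper name `empirical_ccdf_points`, the sample size, and the value `alpha = 1.5` are assumptions made for this example and are not taken from the lecture code.

```python
import numpy as np

def empirical_ccdf_points(sample):
    """Return the sorted sample and the empirical CCDF evaluated at each point.

    Hypothetical helper for illustration only; it is not the lecture's
    `empirical_ccdf` plotting function.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    # Share of observations strictly greater than x[i] (for distinct values)
    g = 1.0 - np.arange(1, n + 1) / n
    return x, g

rng = np.random.default_rng(0)
alpha = 1.5  # assumed tail index for the simulated data

# Inverse transform: if U is uniform on (0, 1], U**(-1/alpha) has CCDF x**(-alpha)
u = 1.0 - rng.uniform(size=10_000)
pareto_sample = u ** (-1 / alpha)

x, g = empirical_ccdf_points(pareto_sample)

# Drop the largest observation, where the empirical CCDF is zero and has no log
slope, intercept = np.polyfit(np.log(x[:-1]), np.log(g[:-1]), 1)
print(f"fitted log-log slope: {slope:.3f} (minus the tail index is {-alpha})")
```

Under these assumptions the fitted slope should be close to $-\alpha$, which is what a regression line on a log-log CCDF plot is estimating.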
@@ -642,7 +642,7 @@ def extract_wb(varlist=['NY.GDP.MKTP.CD'],
642642
643643### Firm size
644644
645- Here is a plot of the firm size distribution taken from Forbes Global 2000.
645+ Here is a plot of the firm size distribution for the largest 500 firms in 2020, taken from the Forbes Global 2000.
646646
647647``` {code-cell} ipython3
648648:tags: [hide-input]
@@ -652,46 +652,39 @@ df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']]
652652fig, ax = plt.subplots(figsize=(6.4, 3.5))
653653
654654label="firm size (market value)"
655+ top = 500 # cut-off: keep the 500 largest firms
655656d = df_fs.sort_values('Market Value', ascending=False)
656- empirical_ccdf(np.asarray(d['Market Value'])[0:500 ], ax, label=label, add_reg_line=True)
657+ empirical_ccdf(np.asarray(d['Market Value'])[:top], ax, label=label, add_reg_line=True)
657658
658659plt.show()
659660```
660661
661662### City size
662663
663- Here is a plot of the city size distribution for the US, where size is
664- measured by population.
664+ Here are plots of the city size distribution for the US and Brazil in 2023, using data from World Population Review.
665+
666+ Size is measured by population.
665667
666668``` {code-cell} ipython3
667669:tags: [hide-input]
668670
669- df_cs_us = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/high_dim_data/update_csdata/cross_section/cities_us.txt', delimiter="\t", header=None)
670- df_cs_us = df_cs_us[[0, 3]]
671- df_cs_us.columns = 'rank', 'pop'
672- x = np.asarray(df_cs_us['pop'])
673- citysize = []
674- for i in x:
675- i = i.replace(",", "")
676- citysize.append(int(i))
677- df_cs_us['pop'] = citysize
678- df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_brazil.csv', delimiter=",", header=None)
679- df_cs_br.columns = df_cs_br.iloc[0]
680- df_cs_br = df_cs_br[1:401]
681- df_cs_br = df_cs_br.astype({"pop2023": float})
671+ # import 2023 city population data for the United States and Brazil (source: World Population Review)
672+ df_cs_us = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_us.csv')
673+ df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_csdata/cross_section/cities_brazil.csv')
682674
683675fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))
684- empirical_ccdf(np.asarray(df_cs_us['pop']), axes[0], label="US", add_reg_line=True)
676+
677+ empirical_ccdf(np.asarray(df_cs_us["pop2023"]), axes[0], label="US", add_reg_line=True)
685678empirical_ccdf(np.asarray(df_cs_br['pop2023']), axes[1], label="Brazil", add_reg_line=True)
686679
687680plt.show()
688681```
689682
690683### Wealth
691684
692- Here is a plot of the upper tail of the wealth distribution.
685+ Here is a plot of the upper tail (top 500) of the wealth distribution.
693686
694- The data is from the Forbes billionaires list.
687+ The data is from the 2020 Forbes Billionaires list.
695688
696689``` {code-cell} ipython3
697690:tags: [hide-input]
@@ -710,10 +703,11 @@ for i, c in enumerate(countries):
710703 df_w_c = df_w[df_w['country'] == c].reset_index()
711704 z = np.asarray(df_w_c['realTimeWorth'])
712705 # print('number of the global richest 2000 from '+ c, len(z))
713- if len(z) <= 500: # cut-off number: top 500
714- z = z[0:500]
706+ top = 500 # cut-off number: top 500
707+ if len(z) <= top:
708+ z = z[:top]
715709
716- empirical_ccdf(z[0:500 ], axs[i], label=c, xlabel='log wealth', add_reg_line=True)
710+ empirical_ccdf(z[:top], axs[i], label=c, xlabel='log wealth', add_reg_line=True)
717711
718712fig.tight_layout()
719713
@@ -742,6 +736,8 @@ df_gdp1 = extract_wb(varlist=variable_code,
742736```
743737
744738``` {code-cell} ipython3
739+ :tags: [hide-input]
740+
745741fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))
746742
747743for name, ax in zip(variable_names, axes):
@@ -856,7 +852,7 @@ You will find that
856852
857853Diversification reduces risk, as expected.
858854
859- But there is a hidden assumption here: the variance is of returns is finite.
855+ But there is a hidden assumption here: the variance of returns is finite.
860856
861857If the distribution is heavy-tailed and the variance is infinite, then this
862858logic is incorrect.
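
The following simulation is a minimal sketch of this point, not part of the lecture code. It compares an equally weighted portfolio of 100 assets with a single asset, once with standard normal returns and once with standard Cauchy returns. The Cauchy distribution is chosen here only as a convenient infinite-variance example, and the interquartile range is used as the measure of dispersion because the variance does not exist in that case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_reps = 100, 10_000  # assumed sizes for this illustration

def iqr(a):
    """Interquartile range, a dispersion measure that exists without a variance."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

ratios = {}
for name, draw in [("normal", rng.standard_normal),
                   ("cauchy", rng.standard_cauchy)]:
    returns = draw(size=(n_reps, n_assets))  # one row per simulated portfolio
    single = returns[:, 0]                   # hold one asset only
    portfolio = returns.mean(axis=1)         # equal weights on all assets
    # Dispersion of the diversified portfolio relative to a single asset
    ratios[name] = iqr(portfolio) / iqr(single)

for name, r in ratios.items():
    print(f"{name}: portfolio IQR / single-asset IQR = {r:.3f}")
```

With normal returns the ratio should be near $1/\sqrt{100} = 0.1$. With Cauchy returns it should stay near 1, since the average of independent standard Cauchy draws is again standard Cauchy, so diversification brings no reduction in dispersion.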
@@ -1071,9 +1067,6 @@ Present discounted value of tax revenue will be estimated by
107110671. multiplying by the tax rate, and
107210681. summing the results with discounting to obtain present value.
10731069
1074- If $X$ has the Pareto distribution, then there are positive constants $\bar x$
1075- and $\alpha$ such that
1076-
10771070The Pareto distribution is assumed to take the form {eq}`pareto` with $\bar x = 1$ and $\alpha = 1.05$.
10781071
10791072(The value of the tail index $\alpha$ is plausible given the data {cite}`gabaix2016power`.)
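
One possible way to generate draws with these parameters is inverse transform sampling. The sketch below assumes that {eq}`pareto` is the standard Pareto form with CCDF $(\bar x / x)^{\alpha}$ for $x \geq \bar x$; that form, the sample size, and the variable names are assumptions of this example rather than something stated in the lines above.

```python
import numpy as np

rng = np.random.default_rng(0)
x_bar, alpha = 1.0, 1.05   # parameter values stated in the text
n = 100_000                # assumed sample size for this illustration

# Inverse transform under the assumed CCDF (x_bar / x)**alpha:
# if U is uniform on (0, 1], then x_bar * U**(-1/alpha) has that CCDF
u = 1.0 - rng.uniform(size=n)
draws = x_bar * u ** (-1 / alpha)

# Sanity check: compare the empirical and theoretical CCDF at x = 2
ccdf_at_2 = np.mean(draws > 2.0)
theory = (x_bar / 2.0) ** alpha
print(f"empirical CCDF at 2: {ccdf_at_2:.4f}, theoretical: {theory:.4f}")
```

Because $\alpha = 1.05$ is only slightly above 1, the mean is finite but the variance is not, so sample averages of such draws can be expected to converge slowly.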