Commit d0ef293

steppi and mdhaber authored
DOC: stats: minor edits to resampling and Monte Carlo methods tutorials (#54)
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
1 parent 9cf67b7 commit d0ef293

6 files changed (+37, -29 lines)

ipython/ResamplingAndMonteCarloMethods/resampling_tutorial_1.ipynb

Lines changed: 5 additions & 5 deletions
@@ -152,9 +152,9 @@
 "id": "87d87faf-f1e1-462f-a94c-cf09d9445767",
 "metadata": {},
 "source": [
-"As we can see, the *Monte Carlo null distribution* <a name=\"cite_ref-2\"></a>[<sup>[2]</sup>](#cite_note-2) of the test statistic when samples are drawn according to the null hypothesis (from a normal distribution) appears to follow the *asymptotic null distribution* (chi-squared with two degrees of freedom). \n",
+"As we can see, the *Monte Carlo null distribution* <a name=\"cite_ref-1\"></a>[<sup>**†**</sup>](#cite_note-1) of the test statistic when samples are drawn according to the null hypothesis (from a normal distribution) appears to follow the *asymptotic null distribution* (chi-squared with two degrees of freedom). \n",
 "\n",
-"<a name=\"cite_ref-2\"></a>[<sup>[2]</sup>](#cite_note-2) Named after the Monte Carlo Casino in Monaco, apparently [[3]](https://en.wikipedia.org/wiki/Monte_Carlo_method#History).\n"
+"<a name=\"cite_ref-1\"></a>[<sup>**†**</sup>](#cite_note-1) Named after the Monte Carlo Casino in Monaco, apparently [[3]](https://en.wikipedia.org/wiki/Monte_Carlo_method#History).\n"
 ]
 },
 {
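To make the comparison between the Monte Carlo and asymptotic null distributions concrete, here is a minimal sketch, assuming NumPy and SciPy. The sample size `n = 10`, the number of trials, and the seed are illustrative assumptions, not the notebook's values. It simulates the Jarque-Bera statistic for many normal samples and compares a few upper quantiles with those of the chi-squared distribution with two degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1234)  # arbitrary seed, for reproducibility

# Illustrative choices (assumptions, not the notebook's values):
n = 10             # observations per sample
n_trials = 10_000  # number of simulated samples

# Draw every sample under the null hypothesis: a normal population.
samples = rng.normal(size=(n_trials, n))

# Jarque-Bera statistic of each sample from its (biased) skewness S and
# excess kurtosis K: JB = n/6 * (S**2 + K**2/4).
S = stats.skew(samples, axis=1)
K = stats.kurtosis(samples, axis=1)  # Fisher (excess) kurtosis by default
jb_null = n / 6 * (S**2 + K**2 / 4)

# Compare upper quantiles of the Monte Carlo null distribution with the
# asymptotic chi-squared(2) null distribution.
qs = np.array([0.90, 0.95, 0.99])
mc_quantiles = np.quantile(jb_null, qs)
chi2_quantiles = stats.chi2(df=2).ppf(qs)
for q, mc, asym in zip(qs, mc_quantiles, chi2_quantiles):
    print(f"{q:.2f} quantile: Monte Carlo {mc:.2f}, chi-squared(2) {asym:.2f}")
```

For small `n` the two sets of quantiles may differ noticeably, which is the situation where the notebook later suggests preferring a Monte Carlo test.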
@@ -213,7 +213,7 @@
 "id": "78720622-ff97-4248-85a1-6e13b786ea06",
 "metadata": {},
 "source": [
-"When the $p$-value is small, we take this as evidence against the null hypothesis, since samples drawn under the null hypothesis have a low probability of producing such an extreme value of the statistic. For better or for worse, a common \"confidence level\" used for statistical tests is 0.99, meaning that the threshold for rejection of the null hypothesis is $p \\leq 0.01$. If we adopt this criterion, then the Jarque-Bera test was inconclusive; it gives no evidence that the null hypothesis is false. Although this should *not* be taken as evidence that the null hypothesis is *true*, the lack of evidence against the hypothesis of normality is often considered sufficient to proceed with tests that assume the data is drawn from a normal population."
+"When the $p$-value is small, we take this as evidence against the null hypothesis, since samples drawn under the null hypothesis have a low probability of producing such an extreme value of the statistic. For better or for worse, a common \"confidence level\" used for statistical tests is 0.99, meaning that the threshold for rejection of the null hypothesis is $p \\leq 0.01$. If we adopt this criterion, then the Jarque-Bera test was inconclusive; it gives insufficient evidence to conclude the null hypothesis is false. Although this should *not* be taken as evidence that the null hypothesis is *true*, the lack of evidence against the hypothesis of normality is often considered sufficient to proceed with tests that assume the data is drawn from a normal population."
 ]
 },
 {
@@ -301,7 +301,7 @@
 "id": "e27fd4f9-64eb-42ca-9131-b6f1cefa9698",
 "metadata": {},
 "source": [
-"These $p$-values are substantially different, so we might draw different conclusions about the validity of the null hypothesis depending on which test we perform. Under the 1% threshold used above, the Monte Carlo test would suggest that there is evidence for rejection of the null hypothesis whereas the asymptotic test performed by `stats.jarque_bera` would not. In other cases, the opposite may be true. In any case, it seems that the Monte Carlo test should be preferred when the number of observations is small.\n",
+"These $p$-values are substantially different, so we might draw different conclusions about the validity of the null hypothesis depending on which test we perform. Under the 1% threshold used above, the Monte Carlo test would suggest that there is enough evidence for rejection of the null hypothesis whereas the asymptotic test performed by `stats.jarque_bera` would not. In other cases, the opposite may be true. In any case, it seems that the Monte Carlo test should be preferred when the number of observations is small.\n",
 "\n",
 "`stats.monte_carlo_test` simplifies the process of performing a Monte Carlo test. All we need to provide is the observed data, a function that generates data sampled under the null hypothesis, and a function that computes the test statistic. `monte_carlo_test` returns an object with the observed statistic value, an empirical null distribution of the statistic, and the corresponding $p$-value."
 ]
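The three ingredients that `stats.monte_carlo_test` needs can be sketched as follows, assuming SciPy 1.9 or later. The observed sample here is hypothetical (drawn from a t distribution), not the notebook's data. Because the statistic accepts an `axis` argument, SciPy can evaluate it on many simulated samples at once; and because the Jarque-Bera statistic does not depend on location or scale, generating standard normal data is enough.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical observed data (an assumption, not the notebook's sample).
x = rng.standard_t(df=3, size=15)

def rvs(size):
    # Generates data under the null hypothesis: a normal population.
    return rng.normal(size=size)

def statistic(x, axis):
    # Jarque-Bera statistic along `axis`, from the biased skewness and
    # excess kurtosis; accepting `axis` lets SciPy vectorize the test.
    S = stats.skew(x, axis=axis)
    K = stats.kurtosis(x, axis=axis)
    n = x.shape[axis]
    return n / 6 * (S**2 + K**2 / 4)

# Large values of the statistic are evidence against normality, so the
# test is one-sided.
res = stats.monte_carlo_test(x, rvs, statistic, alternative='greater')
print(res.statistic, res.pvalue)
```

`res.null_distribution` holds the simulated statistic values (9999 by default), so it can be plotted against the chi-squared(2) density in the same way as the notebook's figure.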
@@ -521,7 +521,7 @@
 "id": "bf116179-6679-4ca4-b938-549886b92b0d",
 "metadata": {},
 "source": [
-"To the eyes of the author, this does not look like a terrific fit. The mode of the Rayleigh distribution is too far to the right compared to cluster of observations around 160 lb. Also, according to this Rayleigh distribution, there is zero probability that any weights could be less than ~135 lb, which does not seem realistic. However, the `ks_1samp` and `cramervonmises` tests are both inconclusive, with relatively large $p$-values."
+"To the eyes of the author, this does not look like a terrific fit. The mode of the Rayleigh distribution is too far to the right compared to the cluster of observations around 160 lb. Also, according to this Rayleigh distribution, there is zero probability that any weights could be less than ~135 lb, which does not seem realistic. However, the `ks_1samp` and `cramervonmises` tests are both inconclusive, with relatively large $p$-values."
 ]
 },
 {
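For reference, fitting a Rayleigh distribution and running `ks_1samp` and `cramervonmises` against it might look like the sketch below. The weights are hypothetical stand-ins generated from a normal distribution, not the values used in the notebook. Because the Rayleigh parameters are estimated from the same data, the nominal $p$-values of these tests tend to be too large (conservative), which is one reason a Monte Carlo version of the test is worth considering.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)  # arbitrary seed

# Hypothetical weights in lb (an assumption; NOT the notebook's data).
weights = rng.normal(loc=170, scale=20, size=40)

# Fit a Rayleigh distribution (location and scale) to the data.
loc, scale = stats.rayleigh.fit(weights)
dist = stats.rayleigh(loc=loc, scale=scale)

# Goodness-of-fit tests against the fitted distribution.
ks = stats.ks_1samp(weights, dist.cdf)
cvm = stats.cramervonmises(weights, dist.cdf)
print(f"KS:  statistic={ks.statistic:.3f}, p={ks.pvalue:.3f}")
print(f"CvM: statistic={cvm.statistic:.3f}, p={cvm.pvalue:.3f}")
```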

ipython/ResamplingAndMonteCarloMethods/resampling_tutorial_2.ipynb

Lines changed: 17 additions & 9 deletions
@@ -7,7 +7,7 @@
 "source": [
 "## Permutation Tests\n",
 "### Exact Tests\n",
-"Consider the following experiment from [An Introduction to the Bootstrap](https://books.google.com/books?id=MWC1DwAAQBAJ&printsec=frontcover). A new medical treatment is intended to prolong life after a form of surgery. Sixteen mice are randomly assigned to either a treatment group or control group under the constraint that only seven treatments are available. All mice receive the surgery, but only the treatment group will receive the treatment being studied. The survival time of each mouse after surgery is recorded below."
+"Consider the following experiment from Efron and Tibshirani's [An Introduction to the Bootstrap](https://books.google.com/books?id=MWC1DwAAQBAJ&printsec=frontcover). A new medical treatment is intended to prolong life after a form of surgery. Sixteen mice are randomly assigned to either a treatment group or control group under the constraint that only seven treatments are available. All mice receive the surgery, but only the treatment group will receive the treatment being studied. The survival time of each mouse after surgery is recorded below."
 ]
 },
 {
@@ -28,7 +28,7 @@
 "id": "e19f78a9-e1c0-4d8a-aeb6-3fa944752948",
 "metadata": {},
 "source": [
-"The difference in the mean life after treatment between the two groups suggests that the treatment has a prolonging effect, as hypothesized."
+"The difference in mean lifetime after treatment between the two groups suggests that the treatment has a prolonging effect, as hypothesized."
 ]
 },
 {
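The point estimate discussed here is just a difference of sample means. In the sketch below, the survival times are our recollection of the mouse data in Table 2.1 of Efron and Tibshirani (7 treated, 9 control); they reproduce the roughly 30-day difference the notebook mentions, but treat them as an assumption and check them against the notebook's own cell.

```python
import numpy as np

# Mouse survival times in days (assumed: our recollection of Efron and
# Tibshirani's Table 2.1; verify against the notebook).
x = np.array([94, 197, 16, 38, 99, 141, 23])          # treatment group
y = np.array([52, 104, 146, 10, 51, 30, 40, 27, 46])  # control group

# Point estimate of the treatment effect: difference in mean lifetime.
mean_difference = np.mean(x) - np.mean(y)
print(round(mean_difference, 2))  # → 30.63
```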
@@ -88,10 +88,18 @@
 "source": [
 "The probability of observing such an extreme test statistic under the null hypothesis (due to chance alone) is greater than 14%, so these data do not seem inconsistent with the null hypothesis. The *point estimate* of the statistic (~30 days) suggested a life-prolonging effect, but such a value of the statistic could quite easily have been observed due to chance alone.\n",
 "\n",
-"Although the t-test tends to be rather robust to violations of its underlying assumptions (e.g., $X$ and $Y$ do not need to be strictly normally distributed for the test to be reasonably accurate), it is possible to perform a hypothesis test which requires no such assumptions at all. \n",
+"Although the t-test tends to be rather robust to violations of its underlying assumptions (e.g., $X$ and $Y$ do not need to be strictly normally distributed for the test to be reasonably accurate), it is possible to perform a hypothesis test which requires almost none of these assumptions. \n",
 "\n",
-"Instead, let the null hypothesis be that the samples `x` and `y` are drawn a single distribution ($X = Y = Z$), and test this against the alternative that the two sample are drawn from distributions which would tend to produce greater values of `statistic`. \n",
+"Instead, let the null hypothesis be that the observations from the samples `x` and `y` were all drawn independently<a name=\"cite_ref-2\"></a>[<sup>**†**</sup>](#cite_note-2) from a single distribution ($X = Y = Z$), and test this against the alternative that the two samples were drawn from distinct distributions that would tend to produce a greater value of `statistic` (in this case such that $\\mu_x > \\mu_y$).\n",
 "\n",
+"<a name=\"cite_ref-2\"></a>[<sup>**†**</sup>](#cite_note-2) Actually, only exchangeability is required [[4]](https://en.wikipedia.org/wiki/Exchangeable_random_variables)."
+]
+},
+{
+"cell_type": "markdown",
+"id": "96079705",
+"metadata": {},
+"source": [
 "The complete population of mice survival times in the study is really:"
 ]
 },
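As a sketch of this null hypothesis in code (assuming SciPy 1.8 or later, and the same assumed mouse data as above), `scipy.stats.permutation_test` with `permutation_type='independent'` treats the group labels as arbitrary under the null. Passing `n_resamples=np.inf` requests the exact test over all $\binom{16}{7} = 11440$ ways of splitting the observations into groups of 7 and 9; the one-sided t-test is included for comparison.

```python
import numpy as np
from scipy import stats

# Assumed mouse data (our recollection of Efron and Tibshirani's Table 2.1).
x = np.array([94, 197, 16, 38, 99, 141, 23])          # treatment
y = np.array([52, 104, 146, 10, 51, 30, 40, 27, 46])  # control

def statistic(x, y, axis):
    # Difference in means; the `axis` argument allows vectorized evaluation.
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

# Null: all 16 observations are exchangeable (group labels are arbitrary).
# Alternative: the treatment tends to produce larger values of `statistic`.
res = stats.permutation_test((x, y), statistic, permutation_type='independent',
                             alternative='greater', n_resamples=np.inf)
t_res = stats.ttest_ind(x, y, alternative='greater')
print(res.pvalue, t_res.pvalue)
```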
@@ -119,11 +127,11 @@
 "id": "6204af8f-b858-4d5a-8bed-b4ef1bc00c30",
 "metadata": {},
 "source": [
-"Since the mice were randomly divided into the two groups under the constraint that there were only seven treatments available, any selection of seven mice from `z` to form the treatment group `x` was equally likely; the remaining mice would form the control group `y`. Furthermore, if the null hypothesis is true, the mice survival times would be *unaffected by the grouping*. Therefore, each value of the statistic obtained from the possible groupings is equaly likely.\n",
+"Since the mice were randomly divided into the two groups under the constraint that there were only seven treatments available, any selection of seven mice from `z` to form the treatment group `x` was equally likely; the remaining mice would form the control group `y`. Furthermore, if the null hypothesis is true, the mice survival times would be *unaffected by the grouping*. Therefore, each value of the statistic obtained from the possible groupings is equally likely.\n",
 "\n",
-"We begin our hypothesis test by calculating the value of `statistic` for all possible *permutations*<a name=\"cite_ref-2\"></a>[<sup>[2]</sup>](#cite_note-2) of mice into the the two groups, forming an exact null distribution.\n",
+"We begin our hypothesis test by calculating the value of `statistic` for all possible *permutations*<a name=\"cite_ref-3\"></a>[<sup>**‡**</sup>](#cite_note-3) of mice into the two groups, forming an exact null distribution.\n",
 "\n",
-"<a name=\"cite_ref-2\"></a>[<sup>[2]</sup>](#cite_note-2) Here and below, we will refer to the the ways of rearranging samples as \"permutations\" even when the word is not stricly appropriate in the technical sense. "
+"<a name=\"cite_ref-3\"></a>[<sup>**‡**</sup>](#cite_note-3) Here and below, we will refer to the ways of rearranging samples as \"permutations\" even when the word is not strictly appropriate in the technical sense. "
 ]
 },
 {
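The exact null distribution described here can also be built by hand, which makes the enumeration explicit. This is a plain-NumPy sketch using `itertools.combinations` (not SciPy's internal algorithm), again with the assumed mouse data: every choice of 7 of the 16 pooled observations as the treatment group gives one value of the statistic.

```python
import itertools
import math
import numpy as np

# Assumed mouse data (our recollection of Efron and Tibshirani's Table 2.1).
x = np.array([94, 197, 16, 38, 99, 141, 23])          # treatment
y = np.array([52, 104, 146, 10, 51, 30, 40, 27, 46])  # control
z = np.concatenate([x, y])  # pooled observations
n_x = len(x)

observed = x.mean() - y.mean()

# Exact null distribution: under the null hypothesis, every way of choosing
# which 7 observations form the treatment group is equally likely.
null_distribution = []
for idx in itertools.combinations(range(len(z)), n_x):
    mask = np.zeros(len(z), dtype=bool)
    mask[list(idx)] = True
    null_distribution.append(z[mask].mean() - z[~mask].mean())
null_distribution = np.array(null_distribution)

# One-sided exact p-value: fraction of groupings with a statistic at least
# as large as the observed one (small tolerance for floating-point noise).
p_exact = np.mean(null_distribution >= observed - 1e-12 * abs(observed))
print(len(null_distribution), math.comb(len(z), n_x), p_exact)
```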
@@ -261,7 +269,7 @@
 "id": "ae116861-ded9-4728-a1b1-4c9c34c50fdd",
 "metadata": {},
 "source": [
-"Note that the exact $p$-value from the permutation test matches the $p$-value from the t-test quite closely. (As we shall see, Ronald Fisher introduced permutation tests primarily to support the use of the t-test in applications where the underlying normality assumptions were not strictly true [[4](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2458144/)].)"
+"Note that the exact $p$-value from the permutation test matches the $p$-value from the t-test quite closely. (As we shall see, Ronald Fisher introduced permutation tests primarily to support the use of the t-test in applications where the underlying normality assumptions were not strictly true [[5]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2458144/).)"
 ]
 },
 {
@@ -350,7 +358,7 @@
 "id": "459d7ca3-f719-4b8e-a855-8e47846c3ec0",
 "metadata": {},
 "source": [
-"Note that `1` is added to both the numerator and denominator when performing the randomized test [[3]](https://www.degruyter.com/document/doi/10.2202/1544-6115.1585/html). This can be thought of as including the observed value of the test statistic in the null distribution, and it ensures that the $p$-value of a randomized test is never zero."
+"Note that `1` is added to both the numerator and denominator when performing the randomized test [[6]](https://www.degruyter.com/document/doi/10.2202/1544-6115.1585/html). This can be thought of as including the observed value of the test statistic in the null distribution, and it ensures that the $p$-value of a randomized test is never zero."
 ]
 },
 {
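The `+1` correction for a randomized test can be sketched in a few lines of NumPy. The data are the assumed mouse data used above, and `n_resamples = 9999` and the seed are illustrative choices. Even if no random relabeling reaches the observed statistic, the $p$-value is $1/(9999 + 1) = 10^{-4}$ rather than zero.

```python
import numpy as np

rng = np.random.default_rng(2023)  # arbitrary seed

# Assumed mouse data (our recollection of Efron and Tibshirani's Table 2.1).
x = np.array([94, 197, 16, 38, 99, 141, 23])          # treatment
y = np.array([52, 104, 146, 10, 51, 30, 40, 27, 46])  # control
z = np.concatenate([x, y])
n_x, n_resamples = len(x), 9999

observed = x.mean() - y.mean()

# Randomized permutation test: random relabelings instead of all of them.
count = 0
for _ in range(n_resamples):
    perm = rng.permutation(z)
    if perm[:n_x].mean() - perm[n_x:].mean() >= observed:
        count += 1

# Adding 1 to numerator and denominator counts the observed grouping as one
# member of the null distribution, so the p-value can never be zero.
p_randomized = (count + 1) / (n_resamples + 1)
print(p_randomized)
```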

ipython/ResamplingAndMonteCarloMethods/resampling_tutorial_2a.ipynb

Lines changed: 6 additions & 6 deletions
@@ -27,9 +27,9 @@
 "id": "8525c260-6699-4f90-b20c-85e0d948bb3a",
 "metadata": {},
 "source": [
-"In the paper, Wilcoxon describes a test to assess whether the two samples are drawn from the same population that is now commonly described as a *nonparametric* version of the independent sample t-test - that is, a version of the t-test that does not make the normality (or any particular distributional) assumption. \n",
+"In the paper, Wilcoxon describes a test to assess whether the two samples are drawn from the same population that is now commonly described as a *nonparametric* version of the independent sample t-test - that is, a version of the t-test that does not make a normality (or any particular distributional) assumption. \n",
 "\n",
-"Suppose we want to test that null hypothesis that the samples are drawn from the same distribution against the alternative that they are drawn from different distributions which tend to produce samples with a lower values of the statistic. Under certain assumptions, this can be argued as evidence that the location of the distribution underlying `x` is less than the location of the distribution underlying `y`. We pass the data into [`scipy.stats.mannwhitneyu`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html) with `alternative='less'`."
+"Given samples `x` and `y`, Wilcoxon introduces a statistic which is proportional to an empirical estimate of the probability that a random observation from the distribution underlying `x` will be greater than a random observation from the distribution underlying `y`. Suppose we want to test the null hypothesis that the samples are drawn from the same distribution against the alternative that they are drawn from different distributions which tend to produce samples that give lower values of the statistic. Under certain assumptions, this can be argued as evidence that the location of the distribution underlying `x` is less than the location of the distribution underlying `y`. To perform this test, we pass the data into [`scipy.stats.mannwhitneyu`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html) with `alternative='less'`."
 ]
 },
 {
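The link between the statistic and a probability can be checked directly. As we understand SciPy 1.7 and later, `mannwhitneyu` returns the U statistic for `x`: the number of pairs $(x_i, y_j)$ with $x_i > y_j$, with ties counted as one half. Dividing by the number of pairs gives an empirical estimate of $P(X > Y)$. The samples below are hypothetical, not the data from Wilcoxon's paper.

```python
import numpy as np
from scipy import stats

# Hypothetical samples without ties (NOT the data from Wilcoxon's paper).
x = np.array([1.1, 2.3, 0.7, 3.4, 1.9])
y = np.array([2.8, 4.1, 3.6, 1.5, 5.2, 2.5])

# Count pairs (x_i, y_j) with x_i > y_j, counting ties as one half.
diff = x[:, None] - y[None, :]
u_manual = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)

# Empirical estimate of P(X > Y); small values suggest X tends to be smaller.
prob_estimate = u_manual / (len(x) * len(y))

res = stats.mannwhitneyu(x, y, alternative='less', method='exact')
print(u_manual, res.statistic, prob_estimate, res.pvalue)
```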
@@ -60,7 +60,7 @@
 "id": "4b709400-3601-4ce5-9a36-538498abab45",
 "metadata": {},
 "source": [
-"Like the mean comparison test in Efron's example, this is an example of an \"independent sample\" test of the null hypothesis that group labels (`x`, `y`) are entirely random. In fact, because `mannwhitneyu` claims to produce an exact value of the statistic, we would expect `permutation_test` to return precisely the same $p$-value (using `mannwhitneyu` only to compute the statistic)."
+"Like the mean comparison test in Efron's example from the previous tutorial on [Permutation Tests](https://nbviewer.org/github/scipy/scipy-cookbook/blob/main/ipython/ResamplingAndMonteCarloMethods/resampling_tutorial_2.ipynb), this is an example of an \"independent sample\" test of the null hypothesis that group labels (`x`, `y`) are entirely random. In fact, because `mannwhitneyu` claims to produce an exact $p$-value, we would expect `permutation_test` to return precisely the same $p$-value (using `mannwhitneyu` only to compute the statistic)."
 ]
 },
 {
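A small sketch of that expectation, assuming SciPy 1.8 or later and hypothetical tie-free samples (not Wilcoxon's data): `mannwhitneyu` is used only to compute the statistic, and `n_resamples=np.inf` makes `permutation_test` enumerate all $\binom{11}{5} = 462$ relabelings. With no ties, the two $p$-values should agree; for these particular samples, both should equal $19/462$.

```python
import numpy as np
from scipy import stats

# Hypothetical samples without ties (NOT the data from Wilcoxon's paper).
x = np.array([1.1, 2.3, 0.7, 3.4, 1.9])
y = np.array([2.8, 4.1, 3.6, 1.5, 5.2, 2.5])

def statistic(x, y):
    # Use mannwhitneyu only to compute the U statistic for `x`.
    return stats.mannwhitneyu(x, y).statistic

# Exact permutation test over all relabelings of the pooled data.
res = stats.permutation_test((x, y), statistic, vectorized=False,
                             alternative='less', n_resamples=np.inf)
ref = stats.mannwhitneyu(x, y, alternative='less', method='exact')
print(res.pvalue, ref.pvalue)
```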
@@ -179,7 +179,7 @@
 "metadata": {},
 "source": [
 "#### Multi-sample Test\n",
-"`scipy.stats.kruskal` is a many-sample extension of the Mann-Whitney U test, but SciPy provides only an approximate (asymptotic) $p$-value. It is possible to perform an exact version of the test using `permutation_test` very small samples, and a randomized test using a subset of the possible permutations may yield more accurate results than the approximation implemented by `kruskal`, especially if there are ties or the sample size is small. Using the (artificial) data for milk cap production from [Kruskal and Wallis' original paper](https://www.tandfonline.com/doi/abs/10.1080/01621459.1952.10483441), we have:"
+"`scipy.stats.kruskal` is a many-sample extension of the Mann-Whitney U test, but SciPy provides only an approximate (asymptotic) $p$-value. It is possible to perform an exact version of the test using `permutation_test` for very small samples, and a randomized test using a subset of the possible permutations may yield more accurate results than the approximation implemented by `kruskal`, especially if there are ties or the sample size is small. Using the (artificial) data for milk cap production from [Kruskal and Wallis' original paper](https://www.tandfonline.com/doi/abs/10.1080/01621459.1952.10483441), we have:"
 ]
 },
 {
@@ -244,7 +244,7 @@
 "id": "fc877591-9f75-4990-a035-a6a48b4e4597",
 "metadata": {},
 "source": [
-"Note that we passed `alternative='greater'` into `permutation_test` but not into `kruskal`. This is because the `kruskal` statistic is inherently one-sided test: data generated under the null hypothesis tends to generate small positive values, and data with greater values are more exceptional. This raises the point that setting up a permutation test requires some study of the underlying statistic and SciPy's implementation. Another example of this is shown in the next section."
+"Note that we passed `alternative='greater'` into `permutation_test` but not into `kruskal`. This is because the `kruskal` test is inherently one-sided: data generated under the null hypothesis tends to generate small positive values of the statistic with greater values always being more exceptional. This raises the point that setting up a permutation test requires some study of both the underlying statistic and SciPy's implementation. Another example of this is shown in the next section."
 ]
 },
 {
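The pattern described in these two cells might look like this in code, assuming SciPy 1.8 or later. The three groups are hypothetical (with some ties), NOT the milk cap data from Kruskal and Wallis' paper. An exact test would have to visit $18!/(5!\,6!\,7!) = 14{,}702{,}688$ relabelings, so the sketch uses a randomized test with an illustrative `n_resamples=1999`, and `alternative='greater'` because only large values of the H statistic are extreme.

```python
import numpy as np
from scipy import stats

# Hypothetical data for three groups (NOT Kruskal and Wallis' milk cap data).
a = np.array([83, 91, 94, 89, 89])
b = np.array([91, 90, 81, 83, 84, 83])
c = np.array([101, 100, 91, 93, 96, 95, 94])

def statistic(*samples):
    # Kruskal-Wallis H statistic; large values are the extreme ones.
    return stats.kruskal(*samples).statistic

# Randomized permutation test over random relabelings of the pooled data.
res = stats.permutation_test((a, b, c), statistic, vectorized=False,
                             alternative='greater', n_resamples=1999)
asymptotic = stats.kruskal(a, b, c)
print(res.pvalue, asymptotic.pvalue)
```

Because the relabelings are random, `res.pvalue` varies slightly from run to run; passing a seeded random generator to `permutation_test` would make it reproducible.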
@@ -493,7 +493,7 @@
 "id": "c9cb3241-fea3-4b3d-90af-a60a7852ffbd",
 "metadata": {},
 "source": [
-"This was much faster, but something is still wrong. Either the approximate $p$-value is wildly inaccurate, or we have set up our test incorrectly. The latter turns out to be the case: the value of `alternative` passed into `ks_2samp` changes *the definition of the test statistic*, but a *greater* p-value is always considered more extreme. Therefore, even if we wish to perform a test equivalent to `ks_2samp` with `alternative='less'`, we actually need to pass `alternative='greater'` into `permutation_test`!"
+"This was much faster, but something is still wrong. Either the approximate $p$-value is wildly inaccurate, or we have set up our test incorrectly. The latter turns out to be the case: the value of `alternative` passed into `ks_2samp` changes *the definition of the test statistic*, but a *greater* statistic is always considered more extreme. Therefore, even if we wish to perform a test equivalent to `ks_2samp` with `alternative='less'`, we actually need to pass `alternative='greater'` into `permutation_test`!"
 ]
 },
 {
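In code, the fix amounts to defining the statistic with `ks_2samp(..., alternative='less')` and then asking `permutation_test` for `alternative='greater'`. This is a sketch assuming SciPy 1.8 or later, with hypothetical tie-free samples; `method='asymp'` is passed only to skip the exact $p$-value computation inside each call, since the statistic itself does not depend on `method`.

```python
import numpy as np
from scipy import stats

# Hypothetical continuous samples without ties (an assumption).
x = np.array([0.61, 1.92, 0.35, 1.48, 2.27, 0.84])
y = np.array([1.73, 2.56, 1.05, 3.12, 2.89, 1.37, 2.41])

def statistic(x, y):
    # One-sided KS statistic as defined by ks_2samp for alternative='less';
    # larger values of it are always the more extreme ones.
    return stats.ks_2samp(x, y, alternative='less', method='asymp').statistic

# Hence alternative='greater' here, even though the test being mimicked is
# ks_2samp with alternative='less'. Exact over all C(13, 6) = 1716 splits.
res = stats.permutation_test((x, y), statistic, vectorized=False,
                             alternative='greater', n_resamples=np.inf)
print(res.statistic, res.pvalue)
```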
