
Commit f4d8a93

Hugo Bowne-Anderson authored
Add Student NBS 1&2
* Change example of disease prevalence to CTR
* Add student NB 1
* Delete 01-Student-Probability_a_simulated_introduction.ipynb
* Add student NB1
* Rename 1.Probability_a_simulated_introduction.ipynb to 01-Instructor-Probability_a_simulated_introduction.ipynb
* Rename 2.Parameter_estimation_hypothesis_testing.ipynb to 02-Instructor-Parameter_estimation_hypothesis_testing.ipynb
* Clear outputs Student NB1
* Add Student NB2
1 parent 1bf4880 commit f4d8a93

4 files changed

+1473 -25 lines changed

notebooks/1.Probability_a_simulated_introduction.ipynb renamed to notebooks/01-Instructor-Probability_a_simulated_introduction.ipynb

Lines changed: 24 additions & 25 deletions
@@ -61,13 +61,10 @@
6161
"source": [
6262
"What type of random phenomena are we talking about here? One example is:\n",
6363
"\n",
64-
"- Knowing that a disease has a 70% prevalence in a population, we can calculate the probabilty of having 10 sick people, 9 sick people, 8 ... and so on, upon drawing 10 people randomly from the population;\n",
65-
"- But given the data of how many sick people we have in a population, how can we calculate the prevalence? And how certain can we be of this prevalence? Or how likely is a particular prevalence?\n",
66-
"\n",
67-
"Science mostly asks questions of the second form above & Bayesian Thinking provides a wondereful framework for answering such questions. Essentially Bayes' Theorem gives us a way of moving from the probability of the data given the model (written as $P(data|model)$) to the probability of the model given the data ($P(model|data)$).\n",
68-
"\n",
69-
"\n",
64+
"- Knowing that a website has a click-through rate (CTR) of 10%, we can calculate the probability of having 10 people, 9 people, 8 people ... and so on click through, upon drawing 10 people randomly from the population;\n",
65+
"- But given the data of how many people click through, how can we calculate the CTR? And how certain can we be of this CTR? Or how likely is a particular CTR?\n",
7066
"\n",
67+
"Science mostly asks questions of the second form above & Bayesian thinking provides a wonderful framework for answering such questions. Essentially Bayes' Theorem gives us a way of moving from the probability of the data given the model (written as $P(data|model)$) to the probability of the model given the data ($P(model|data)$).\n",
7168
"\n",
7269
"We'll first explore questions of the 1st type using simulation: knowing the model, what is the probability of seeing certain data?"
7370
]
@@ -76,14 +73,14 @@
7673
"cell_type": "markdown",
7774
"metadata": {},
7875
"source": [
79-
"## 2. Simulating probabilities (Frequentist beginnings)"
76+
"## 2. Simulating probabilities"
8077
]
8178
},
8279
{
8380
"cell_type": "markdown",
8481
"metadata": {},
8582
"source": [
86-
"* Let's say that a disease has a prevalence of 50%, i.e. that 50% of the population have it. If we picked 1000 people at random from this population, how likely would it be to find a certain number of people with the disease?\n",
83+
"* Let's say that a website has a CTR of 50%, i.e. that 50% of people click through. If we picked 1000 people at random from the population, how likely would it be to find that a certain number of people click?\n",
8784
"\n",
8885
"We can simulate this using `numpy`'s random number generator.\n",
8986
"\n",
@@ -105,7 +102,7 @@
105102
"cell_type": "markdown",
106103
"metadata": {},
107104
"source": [
108-
"To then simulate the sampling from the population, we check whether each float was greater or less than 0.5. If less than or equal to 0.5, we say the person is affected."
105+
"To then simulate the sampling from the population, we check whether each float was greater or less than 0.5. If less than or equal to 0.5, we say the person clicked."
109106
]
110107
},
111108
{
@@ -114,17 +111,17 @@
114111
"metadata": {},
115112
"outputs": [],
116113
"source": [
117-
"# Computed now many people are affected\n",
118-
"pop = x <= 0.5\n",
119-
"aff = sum(pop)\n",
120-
"f\"Number of affected people = {aff}\""
114+
"# Compute how many people click\n",
115+
"clicks = x <= 0.5\n",
116+
"n_clicks = sum(clicks)\n",
117+
"f\"Number of clicks = {n_clicks}\""
121118
]
122119
},
123120
{
124121
"cell_type": "markdown",
125122
"metadata": {},
126123
"source": [
127-
"The proportion of people affected can be calculated as the total number of affected over the population size:"
124+
"The proportion of people who clicked can be calculated as the total number of clicks over the number of people:"
128125
]
129126
},
130127
{
@@ -133,8 +130,8 @@
133130
"metadata": {},
134131
"outputs": [],
135132
"source": [
136-
"# Computed proportion of those affected\n",
137-
"f\"Proportion affected = {aff/len(pop)}\""
133+
"# Compute the proportion of people who clicked\n",
134+
"f\"Proportion who clicked = {n_clicks/len(clicks)}\""
138135
]
139136
},
140137
{
@@ -148,7 +145,7 @@
148145
"cell_type": "markdown",
149146
"metadata": {},
150147
"source": [
151-
"**Up for discussion:** Let's say that all you had was this data and you wanted to figure out the prevalence (probability). \n",
148+
"**Up for discussion:** Let's say that all you had was this data and you wanted to figure out the CTR (probability of clicking). \n",
152149
"\n",
153150
"* What would your estimate be?\n",
154151
"* Bonus points: how confident would you be of your estimate?"
@@ -165,14 +162,14 @@
165162
"cell_type": "markdown",
166163
"metadata": {},
167164
"source": [
168-
"### Hands-on: more prevalent disease"
165+
"### Hands-on: more clicking"
169166
]
170167
},
171168
{
172169
"cell_type": "markdown",
173170
"metadata": {},
174171
"source": [
175-
"Use random sampling to simulate how many people are affected when the prevalence is 0.7. How many are affected? What proportion?"
172+
"Use random sampling to simulate how many people click when the CTR is 0.7. How many click? What proportion?"
176173
]
177174
},
178175
{
@@ -182,17 +179,19 @@
182179
"outputs": [],
183180
"source": [
184181
"# Solution\n",
185-
"pop = x <= 0.7\n",
186-
"aff = sum(pop)\n",
187-
"print(f\"Number of affected people = {aff}\")\n",
188-
"print(f\"Proportion affected = {aff/len(pop)}\")"
182+
"clicks = x <= 0.7\n",
183+
"n_clicks = sum(clicks)\n",
184+
"print(f\"Number of clicks = {n_clicks}\")\n",
185+
"print(f\"Proportion who clicked = {n_clicks/len(clicks)}\")"
189186
]
190187
},
191188
{
192189
"cell_type": "markdown",
193190
"metadata": {},
194191
"source": [
195-
"_Discussion point_: This model is know as the bias coin flip. Can you see why?"
192+
"_Discussion point_: This model is known as the biased coin flip. \n",
193+
"- Can you see why?\n",
194+
"- Can it be used to model other phenomena?"
196195
]
197196
},
198197
{
@@ -830,7 +829,7 @@
830829
"name": "python",
831830
"nbconvert_exporter": "python",
832831
"pygments_lexer": "ipython3",
833-
"version": "3.6.6"
832+
"version": "3.6.1"
834833
}
835834
},
836835
"nbformat": 4,

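Reviewer sketch: the click simulation that the changed cells describe can be written as one self-contained function. This is a minimal illustration, not the notebook's own code. The diff does not show how `x` is created, so drawing it with `numpy`'s `default_rng` is an assumption, and the function name `simulate_clicks` and its `seed` parameter are hypothetical.

```python
import numpy as np


def simulate_clicks(ctr, n_people, seed=None):
    """Simulate n_people visitors who each click with probability ctr.

    Returns the number of clicks and the proportion who clicked.
    """
    # Assumption: uniform floats in [0, 1) stand in for the notebook's `x`.
    rng = np.random.default_rng(seed)
    x = rng.random(n_people)

    # A visitor "clicks" when their draw is at or below the CTR threshold.
    clicks = x <= ctr

    # Sum the same boolean array that was just built (not a stale variable).
    n_clicks = int(clicks.sum())
    return n_clicks, n_clicks / n_people


n_clicks, proportion = simulate_clicks(ctr=0.5, n_people=1000, seed=42)
print(f"Number of clicks = {n_clicks}")
print(f"Proportion who clicked = {proportion}")
```

Calling `simulate_clicks(ctr=0.7, n_people=1000)` corresponds to the "Hands-on: more clicking" exercise. Passing a fixed `seed` makes a run repeatable, which can help when comparing instructor and student notebooks.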