
Commit 19f92cc

Edit typos and prettify code in rand_resp.md (#228)
* Edit typos and prettify code in rand_resp.md
* Update rand_resp.md
* Update rand_resp.md
* edit typo
* consistent spelling of Monte Carlo
1 parent 132164a commit 19f92cc

File tree

1 file changed: +39 −33 lines


lectures/rand_resp.md

Lines changed: 39 additions & 33 deletions
@@ -22,13 +22,13 @@ These problems induce **selection** biases that present challenges to interpre

 To illustrate how social scientists have thought about estimating the prevalence of such embarrassing activities and opinions, this lecture describes a classic approach of S. L. Warner {cite}`warner1965randomized`.

-Warner used elementary probability to construct a way to protect the privacy of **individual** respondents to surveys while still estimating the fraction of a **collection** of individuals who have a socially stichmatized characteristic or who engage in a socially stimatized activity.
+Warner used elementary probability to construct a way to protect the privacy of **individual** respondents to surveys while still estimating the fraction of a **collection** of individuals who have a socially stigmatized characteristic or who engage in a socially stigmatized activity.

-Warner's idea was to add **noise** between the respondent's answer and the **signal** about that answer that the survey taker ultimately receives.
+Warner's idea was to add **noise** between the respondent's answer and the **signal** about that answer that the survey maker ultimately receives.

-Knowing about the structure of the noise assures the respondent that survey taker does not observe his answer.
+Knowing about the structure of the noise assures the respondent that the survey maker does not observe his answer.

-Statistical properties of the noise injection procedure provide the a respondent **plausible deniability**.
+Statistical properties of the noise injection procedure provide the respondent **plausible deniability**.

 Related ideas underlie modern **differential privacy** systems.

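An editorial aside, not part of the commit: the noise-injection idea in the passage above can be sketched directly. With probability $p$ a spinner tells the respondent to answer the sensitive question truthfully, and with probability $1 - p$ to answer its complement, so a "yes" is never conclusive about any individual. A minimal sketch under illustrative values $\pi = 0.3$, $p = 0.8$, $n = 100000$ (these parameters are my own, chosen for demonstration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def randomized_answers(truth, p, rng):
    """Warner's device: with probability p the respondent reports the
    true trait; with probability 1 - p, its complement."""
    spinner = rng.random(truth.shape) <= p
    return truth * spinner + (1 - truth) * (1 - spinner)

# Illustrative values (not from the lecture): true prevalence pi = 0.3,
# spinner probability p = 0.8, n = 100_000 respondents.
pi, p, n = 0.3, 0.8, 100_000
truth = (rng.random(n) <= pi).astype(int)
answers = randomized_answers(truth, p, rng)

# The observed "yes" rate estimates pi*p + (1 - pi)*(1 - p);
# inverting that linear map recovers an estimate of pi.
pi_hat = (answers.mean() + p - 1) / (2 * p - 1)
print(f"pi_hat = {pi_hat:.3f}")   # close to the true pi = 0.3
```

No individual answer reveals the trait, yet the aggregate "yes" rate pins down the prevalence.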
@@ -83,7 +83,7 @@ $$
 \log(L)= n_1 \log \left[\pi p + (1-\pi)(1-p)\right] + (n-n_{1}) \log \left[(1-\pi) p +\pi (1-p)\right]
 $$ (eq:two)

-The first-order necessary condition for maximimizng the log likelihood function with respect to $\pi$ is:
+The first-order necessary condition for maximizing the log likelihood function with respect to $\pi$ is:

 $$
 \frac{(n-n_1)(2p-1)}{(1-\pi) p +\pi (1-p)}=\frac{n_1 (2p-1)}{\pi p + (1-\pi)(1-p)}
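Solving this first-order condition for $\pi$ gives the closed-form estimator $\hat{\pi} = \frac{p-1}{2p-1} + \frac{n_1}{n(2p-1)}$, which is exactly the `pi_hat` formula used in the lecture's simulation code. As an editorial check (not part of the commit), the derivation can be verified symbolically:

```python
import sympy as sp

pi, p, n, n1 = sp.symbols('pi p n n1', positive=True)

# Log likelihood from equation (eq:two)
logL = n1 * sp.log(pi*p + (1 - pi)*(1 - p)) \
     + (n - n1) * sp.log((1 - pi)*p + pi*(1 - p))

# Solve the first-order condition d logL / d pi = 0
sol = sp.solve(sp.diff(logL, pi), pi)[0]

closed_form = (p - 1)/(2*p - 1) + n1/(n*(2*p - 1))
print(sp.simplify(sol - closed_form))   # 0 if the two expressions agree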
@@ -197,48 +197,55 @@ under different values of $\pi_A$ and $n$:

 ```{code-cell} ipython3
 class Comparison:
-    def __init__(self,A,n):
+    def __init__(self, A, n):
         self.A = A
         self.n = n
-        TaTb = np.array([[0.95,1],[0.9,1],[0.7,1],[0.5,1],[1,0.95],[1,0.9],[1,0.7],[1,0.5],[0.95,0.95],[0.9,0.9],[0.7,0.7],[0.5,0.5]])
-        self.p_arr = np.array([0.6,0.7,0.8,0.9])
-        self.p_map = dict(zip(self.p_arr,["MSE Ratio: p=" + str(x) for x in self.p_arr]))
-        self.template = pd.DataFrame(columns = self.p_arr)
+        TaTb = np.array([[0.95, 1], [0.9, 1], [0.7, 1],
+                         [0.5, 1], [1, 0.95], [1, 0.9],
+                         [1, 0.7], [1, 0.5], [0.95, 0.95],
+                         [0.9, 0.9], [0.7, 0.7], [0.5, 0.5]])
+        self.p_arr = np.array([0.6, 0.7, 0.8, 0.9])
+        self.p_map = dict(zip(self.p_arr, [f"MSE Ratio: p = {x}" for x in self.p_arr]))
+        self.template = pd.DataFrame(columns=self.p_arr)
         self.template[['T_a','T_b']] = TaTb
-        self.template['Bias']=None
+        self.template['Bias'] = None

     def theoretical(self):
+        A = self.A
+        n = self.n
         df = self.template.copy()
-        df['Bias']=self.A*(df['T_a']+df['T_b']-2)+(1-df['T_b'])
+        df['Bias'] = A * (df['T_a'] + df['T_b'] - 2) + (1 - df['T_b'])
         for p in self.p_arr:
-            df[p] = (1 / (16 * (p - 1/2)**2) - (self.A - 1/2)**2)/self.n / \
-                    (df['Bias']**2 + ((self.A * df['T_a'] + (1 - self.A)*(1 - df['T_b']))*(1 - self.A*df['T_a'] - (1 - self.A)*(1 - df['T_b'])) / self.n))
+            df[p] = (1 / (16 * (p - 1/2)**2) - (A - 1/2)**2) / n / \
+                    (df['Bias']**2 + ((A * df['T_a'] + (1 - A) * (1 - df['T_b'])) * (1 - A * df['T_a'] - (1 - A) * (1 - df['T_b'])) / n))
             df[p] = df[p].round(2)
-        df = df.set_index(["T_a", "T_b","Bias"]).rename(columns=self.p_map)
+        df = df.set_index(["T_a", "T_b", "Bias"]).rename(columns=self.p_map)
         return df

     def MCsimulation(self, size=1000, seed=123456):
+        A = self.A
+        n = self.n
         df = self.template.copy()
         np.random.seed(seed)
-        sample = np.random.rand(size, self.n) <= self.A
-        random_device = np.random.rand(size, self.n)
+        sample = np.random.rand(size, self.n) <= A
+        random_device = np.random.rand(size, n)
         mse_rd = {}
         for p in self.p_arr:
             spinner = random_device <= p
-            rd_answer = sample*spinner + (1-sample)*(1-spinner)
+            rd_answer = sample * spinner + (1 - sample) * (1 - spinner)
             n1 = rd_answer.sum(axis=1)
-            pi_hat = (p-1)/(2*p-1) + n1 / self.n / (2*p-1)
-            mse_rd[p] = np.sum((pi_hat - self.A)**2)
+            pi_hat = (p - 1) / (2 * p - 1) + n1 / n / (2 * p - 1)
+            mse_rd[p] = np.sum((pi_hat - A)**2)
         for inum, irow in df.iterrows():
             truth_a = np.random.rand(size, self.n) <= irow.T_a
             truth_b = np.random.rand(size, self.n) <= irow.T_b
-            trad_answer = sample * truth_a + (1-sample) * (1-truth_b)
-            pi_trad = trad_answer.sum(axis=1) / self.n
-            df.loc[inum,'Bias'] = pi_trad.mean() - self.A
-            mse_trad = np.sum((pi_trad - self.A)**2)
+            trad_answer = sample * truth_a + (1 - sample) * (1 - truth_b)
+            pi_trad = trad_answer.sum(axis=1) / n
+            df.loc[inum, 'Bias'] = pi_trad.mean() - A
+            mse_trad = np.sum((pi_trad - A)**2)
             for p in self.p_arr:
-                df.loc[inum,p] = (mse_rd[p] / mse_trad).round(2)
-        df = df.set_index(["T_a", "T_b","Bias"]).rename(columns=self.p_map)
+                df.loc[inum, p] = (mse_rd[p] / mse_trad).round(2)
+        df = df.set_index(["T_a", "T_b", "Bias"]).rename(columns=self.p_map)
         return df
 ```

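An editorial sanity check, not part of the commit: one cell of the theoretical table produced by `Comparison.theoretical` can be reproduced by hand. The sketch below assumes the illustrative values $A = 0.6$, $n = 1000$, row $T_a = 0.9$, $T_b = 1$, column $p = 0.7$, with the bias and MSE formulas transcribed from the class:

```python
# Hypothetical standalone check of one cell of the theoretical MSE-ratio table.
# Parameter choices (A = 0.6, n = 1000, T_a = 0.9, T_b = 1, p = 0.7) are for
# illustration only; the formulas are transcribed from Comparison.theoretical.
A, n = 0.6, 1000
T_a, T_b, p = 0.9, 1.0, 0.7

bias = A * (T_a + T_b - 2) + (1 - T_b)                  # bias of the truthful-survey estimator
mse_rd = (1 / (16 * (p - 1/2)**2) - (A - 1/2)**2) / n   # MSE of the randomized-response estimator
lam = A * T_a + (1 - A) * (1 - T_b)                     # "yes" probability under the truthful design
mse_trad = bias**2 + lam * (1 - lam) / n                # bias^2 + sampling variance
print(round(mse_rd / mse_trad, 2))                      # prints 0.4
```

A ratio below one means the randomized-response estimator has the smaller MSE despite its injected noise, because it avoids the squared bias term.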
@@ -249,10 +256,10 @@ Let's put the code to work for parameter values

 We can generate MSE Ratios theoretically using the above formulas.

-We can also perform a Monte-Carlo simulation of the MSE Ratio.
+We can also perform a Monte Carlo simulation of the MSE Ratio.

 ```{code-cell} ipython3
-cp1 = Comparison(0.6,1000)
+cp1 = Comparison(0.6, 1000)
 df1_theoretical = cp1.theoretical()
 df1_theoretical
 ```
@@ -264,7 +271,7 @@ df1_mc

 The theoretical calculations do a good job of predicting the Monte Carlo results.

-We see that in many situations, especially when the bias is not small, the MSE of the randomized-samplijng methods is smaller than that of the non-randomized sampling method.
+We see that in many situations, especially when the bias is not small, the MSE of the randomized-sampling methods is smaller than that of the non-randomized sampling method.

 These differences become larger as $p$ increases.

@@ -278,7 +285,7 @@ For example, for another situation described in Warner {cite}`warner1965randomiz
 we can use the code

 ```{code-cell} ipython3
-cp2=Comparison(0.5,1000)
+cp2 = Comparison(0.5, 1000)
 df2_theoretical = cp2.theoretical()
 df2_theoretical
 ```
@@ -296,7 +303,7 @@ We can also revisit a calculation in the concluding section of Warner {cite}`wa
 We use the code

 ```{code-cell} ipython3
-cp3=Comparison(0.6,2000)
+cp3 = Comparison(0.6, 2000)
 df3_theoretical = cp3.theoretical()
 df3_theoretical
 ```
@@ -310,8 +317,7 @@ Evidently, as $n$ increases, the randomized response method does better perform

 ## Concluding Remarks

-{doc}`This quantecon lecture <util_rand_resp>` describes some alternative randomized response surveys.
+{doc}`This QuantEcon lecture <util_rand_resp>` describes some alternative randomized response surveys.

 That lecture presents the utilitarian analysis of those alternatives conducted by Lars Ljungqvist
 {cite}`ljungqvist1993unified`.
-