Commit e64099d (1 parent: 7a5032e)

    run did sim with updated treatment assignment

25 files changed (+469, -541 lines)

doc/did/did_cs_multi.qmd (5 additions, 5 deletions)

@@ -24,7 +24,7 @@ init_notebook_mode(all_interactive=True)
 
 ## Coverage
 
-The simulations are based on the [make_did_cs_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_cs_CS2021.html)-DGP with $2000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
+The simulations are based on the [make_did_cs_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_cs_CS2021.html)-DGP with $1000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
 
 - Type 1: Linear outcome model and treatment assignment
 - Type 4: Nonlinear outcome model and treatment assignment
@@ -52,7 +52,7 @@ df = pd.read_csv("../../results/did/did_cs_multi_detailed.csv", index_col=None)
 assert df["repetition"].nunique() == 1
 n_rep = df["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 ### Observational Score
@@ -127,7 +127,7 @@ df_group = pd.read_csv("../../results/did/did_cs_multi_group.csv", index_col=Non
 assert df_group["repetition"].nunique() == 1
 n_rep_group = df_group["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 #### Observational Score
@@ -195,7 +195,7 @@ df_time = pd.read_csv("../../results/did/did_cs_multi_time.csv", index_col=None)
 assert df_time["repetition"].nunique() == 1
 n_rep_time = df_time["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 #### Observational Score
@@ -263,7 +263,7 @@ df_es = pd.read_csv("../../results/did/did_cs_multi_eventstudy.csv", index_col=N
 assert df_es["repetition"].nunique() == 1
 n_rep_es = df_es["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 #### Observational Score
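The `display_columns` edits above feed a simple column selection in the `.qmd` notebooks. A minimal sketch of that pattern, using a fabricated stand-in for the results CSV (the column subset and values here are hypothetical, not simulation output):

```python
import pandas as pd

# Hypothetical stand-in for results/did/did_cs_multi_detailed.csv
df = pd.DataFrame({
    "Learner g": ["LGBM Regr.", "Linear"],
    "Learner m": ["LGBM Clas.", "Logistic"],
    "DGP": [1, 4],
    "Coverage": [0.94, 0.91],
    "Loss m": [0.32, 0.35],
    "repetition": [1000, 1000],
})

# The notebooks guard against mixed runs: every row must come from the
# same number of Monte Carlo repetitions.
assert df["repetition"].nunique() == 1
n_rep = df["repetition"].unique()[0]

# Restrict the styled table to the reporting columns (subset shown here).
display_columns = ["Learner g", "Learner m", "DGP", "Coverage", "Loss m"]
table = df[display_columns]
```

Adding the new loss columns to `display_columns` is all it takes for them to appear in every rendered table, since the selection is reused across the detailed, group, time, and event-study sections.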

doc/did/did_pa_multi.qmd (6 additions, 10 deletions)

@@ -24,7 +24,7 @@ init_notebook_mode(all_interactive=True)
 
 ## Coverage
 
-The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $2000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
+The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $1000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
 
 - Type 1: Linear outcome model and treatment assignment
 - Type 4: Nonlinear outcome model and treatment assignment
@@ -52,7 +52,7 @@ df = pd.read_csv("../../results/did/did_pa_multi_detailed.csv", index_col=None)
 assert df["repetition"].nunique() == 1
 n_rep = df["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 ### Observational Score
@@ -127,7 +127,7 @@ df_group = pd.read_csv("../../results/did/did_pa_multi_group.csv", index_col=Non
 assert df_group["repetition"].nunique() == 1
 n_rep_group = df_group["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 #### Observational Score
@@ -195,7 +195,7 @@ df_time = pd.read_csv("../../results/did/did_pa_multi_time.csv", index_col=None)
 assert df_time["repetition"].nunique() == 1
 n_rep_time = df_time["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 #### Observational Score
@@ -263,7 +263,7 @@ df_es = pd.read_csv("../../results/did/did_pa_multi_eventstudy.csv", index_col=N
 assert df_es["repetition"].nunique() == 1
 n_rep_es = df_es["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 #### Observational Score
@@ -324,11 +324,9 @@ generate_and_show_styled_table(
 
 ## Tuning
 
-The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $2000$ observations. Due to time constraints we only consider one learner, use in-sample normalization and the following DGPs:
+The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $1000$ observations. Due to time constraints we only consider one learner, use in-sample normalization and the following DGPs:
 
 - Type 1: Linear outcome model and treatment assignment
-- Type 2: Nonlinear outcome model and linear treatment assignment
-- Type 3: Linear outcome model and nonlinear treatment assignment
 - Type 4: Nonlinear outcome model and treatment assignment
 
 The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidende intervals). This is only an example as the untuned version just relies on the default configuration.
@@ -389,8 +387,6 @@ These simulations test different types of aggregation, as described in [DiD User
 As before, we only consider one learner, use in-sample normalization and the following DGPs:
 
 - Type 1: Linear outcome model and treatment assignment
-- Type 2: Nonlinear outcome model and linear treatment assignment
-- Type 3: Linear outcome model and nonlinear treatment assignment
 - Type 4: Nonlinear outcome model and treatment assignment
 
 The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidende intervals). This is only an example as the untuned version just relies on the default configuration.
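The docs above distinguish point-wise CIs (averaged over all ATTs) from uniform confidence bands, which DoubleML obtains via a multiplier bootstrap (`dml_model.bootstrap(n_rep_boot=2000)` in the runner scripts). A hedged numpy sketch of the max-|t| construction behind such bands, using simulated stand-in bootstrap statistics rather than real DoubleML output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 6 ATT estimates with standard errors, plus bootstrap
# draws of the normalized statistics (stand-ins for multiplier-bootstrap output).
n_params, n_boot = 6, 2000
theta_hat = rng.normal(size=n_params)
se = np.full(n_params, 0.1)
boot_t = rng.normal(size=(n_boot, n_params))

level = 0.95
# Point-wise critical value: per-parameter quantile of the |t| draws.
crit_pw = np.quantile(np.abs(boot_t), level, axis=0)        # shape (n_params,)
# Uniform critical value: quantile of the max over all parameters.
crit_unif = np.quantile(np.abs(boot_t).max(axis=1), level)  # scalar

ci_pw = np.column_stack([theta_hat - crit_pw * se, theta_hat + crit_pw * se])
ci_unif = np.column_stack([theta_hat - crit_unif * se, theta_hat + crit_unif * se])

# The uniform band is at least as wide as every point-wise interval,
# which is why "Uniform CI Length" exceeds "CI Length" in the tables.
assert crit_unif >= crit_pw.max() - 1e-12
```

This also explains why the summary tables report both a per-ATT "Coverage"/"CI Length" pair and a separate "Uniform Coverage"/"Uniform CI Length" pair.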

monte-cover/src/montecover/did/did_cs_multi.py (11 additions, 0 deletions)

@@ -96,6 +96,7 @@ def run_single_rep(self, dml_data, dml_params) -> Dict[str, Any]:
         )
         dml_model.fit()
         dml_model.bootstrap(n_rep_boot=2000)
+        nuisance_loss = dml_model.nuisance_loss
 
         # Oracle values for this model
         oracle_thetas = np.full_like(dml_model.coef, np.nan)
@@ -143,6 +144,11 @@ def run_single_rep(self, dml_data, dml_params) -> Dict[str, Any]:
                     "Score": score,
                     "In-sample-norm.": in_sample_normalization,
                     "level": level,
+                    "Loss g_d0_t0": nuisance_loss["ml_g_d0_t0"].mean(),
+                    "Loss g_d1_t0": nuisance_loss["ml_g_d1_t0"].mean(),
+                    "Loss g_d0_t1": nuisance_loss["ml_g_d0_t1"].mean(),
+                    "Loss g_d1_t1": nuisance_loss["ml_g_d1_t1"].mean(),
+                    "Loss m": nuisance_loss["ml_m"].mean() if score == "observational" else np.nan,
                 }
             )
         for key, res in level_result.items():
@@ -168,6 +174,11 @@ def summarize_results(self):
             "Bias": "mean",
             "Uniform Coverage": "mean",
             "Uniform CI Length": "mean",
+            "Loss g_d0_t0": "mean",
+            "Loss g_d1_t0": "mean",
+            "Loss g_d0_t1": "mean",
+            "Loss g_d1_t1": "mean",
+            "Loss m": "mean",
             "repetition": "count",
         }
 
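The `summarize_results` additions fold the new per-repetition loss columns into the existing groupby aggregation. A sketch of that aggregation with fabricated rows (the column subset and values are illustrative, not simulation output):

```python
import numpy as np
import pandas as pd

# Hypothetical per-repetition results, mimicking the columns the commit adds.
results = pd.DataFrame({
    "Learner g": ["LGBM Regr."] * 4,
    "DGP": [1, 1, 4, 4],
    "Coverage": [1.0, 0.9, 0.8, 1.0],
    "Loss m": [0.30, 0.34, np.nan, np.nan],  # NaN when score != "observational"
    "repetition": [1, 2, 1, 2],
})

agg_dict = {
    "Coverage": "mean",
    "Loss m": "mean",       # new loss columns are averaged like the metrics
    "repetition": "count",  # number of completed repetitions per setting
}
summary = results.groupby(["Learner g", "DGP"]).agg(agg_dict).reset_index()
```

Because pandas skips NaN when averaging, the experimental-score rows (where "Loss m" is NaN by construction) simply yield NaN in the summary instead of distorting the observational-score averages.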

monte-cover/src/montecover/did/did_pa_multi.py (8 additions, 1 deletion)

@@ -94,6 +94,7 @@ def run_single_rep(self, dml_data, dml_params) -> Dict[str, Any]:
         )
         dml_model.fit()
         dml_model.bootstrap(n_rep_boot=2000)
+        nuisance_loss = dml_model.nuisance_loss
 
         # Oracle values for this model
         oracle_thetas = np.full_like(dml_model.coef, np.nan)
@@ -141,6 +142,9 @@ def run_single_rep(self, dml_data, dml_params) -> Dict[str, Any]:
                     "Score": score,
                     "In-sample-norm.": in_sample_normalization,
                     "level": level,
+                    "Loss g_control": nuisance_loss["ml_g0"].mean(),
+                    "Loss g_treated": nuisance_loss["ml_g1"].mean(),
+                    "Loss m": nuisance_loss["ml_m"].mean() if score == "observational" else np.nan,
                 }
             )
         for key, res in level_result.items():
@@ -166,6 +170,9 @@ def summarize_results(self):
             "Bias": "mean",
             "Uniform Coverage": "mean",
             "Uniform CI Length": "mean",
+            "Loss g_control": "mean",
+            "Loss g_treated": "mean",
+            "Loss m": "mean",
             "repetition": "count",
         }
 
@@ -180,7 +187,7 @@ def summarize_results(self):
 
     def _generate_dml_data(self, dgp_params) -> dml.data.DoubleMLPanelData:
         """Generate data for the simulation."""
-        data = make_did_CS2021(n_obs=dgp_params["n_obs"], dgp_type=dgp_params["DGP"], xi=dgp_params["xi"])
+        data = make_did_CS2021(n_obs=dgp_params["n_obs"], dgp_type=dgp_params["DGP"])
         dml_data = dml.data.DoubleMLPanelData(
             data,
             y_col="y",

results/did/did_cs_multi_config.yml (1 addition, 23 deletions)

@@ -9,7 +9,7 @@ dgp_parameters:
   - 4
   - 6
   n_obs:
-  - 2000
+  - 1000
   lambda_t:
   - 0.5
 learner_definitions:
@@ -19,30 +19,8 @@ learner_definitions:
     name: Logistic
   lgbmr: &id003
     name: LGBM Regr.
-    params:
-      n_estimators: 300
-      learning_rate: 0.03
-      num_leaves: 7
-      max_depth: 3
-      min_child_samples: 20
-      subsample: 0.8
-      colsample_bytree: 0.8
-      reg_alpha: 0.1
-      reg_lambda: 1.0
-      random_state: 42
   lgbmc: &id004
     name: LGBM Clas.
-    params:
-      n_estimators: 300
-      learning_rate: 0.03
-      num_leaves: 7
-      max_depth: 3
-      min_child_samples: 20
-      subsample: 0.8
-      colsample_bytree: 0.8
-      reg_alpha: 0.1
-      reg_lambda: 1.0
-      random_state: 42
 dml_parameters:
   learners:
     - ml_g: *id001
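The config relies on YAML anchors (`&id003`, `*id001`, etc.) to share learner definitions, so deleting the tuned `params:` block in one place reverts every entry that aliases it to the LightGBM defaults. A small PyYAML sketch of that mechanism (illustrative keys only, and it assumes PyYAML is available):

```python
import yaml

# Minimal config sketch mirroring the anchor/alias structure of
# did_cs_multi_config.yml; keys are illustrative, not the full file.
config_text = """
learner_definitions:
  lgbmr: &id003
    name: LGBM Regr.
  lgbmc: &id004
    name: LGBM Clas.
dml_parameters:
  learners:
    - ml_g: *id003
      ml_m: *id004
"""
config = yaml.safe_load(config_text)

# The alias *id003 resolves to the very same mapping object as the anchor,
# so editing the anchored definition edits every learner entry at once.
learner = config["dml_parameters"]["learners"][0]
```

With no `params:` key present, a runner that instantiates learners from these mappings falls back to the library's default hyperparameters.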
