Commit e925e0c

Merge pull request #32 from DoubleML/sk-optuna
Add tuning simulations
2 parents 0a2dd00 + 4f425d3

File tree

101 files changed: +3355 additions, -1214 deletions


.github/workflows/apo_sim.yml

Lines changed: 1 addition & 1 deletion

````diff
@@ -52,7 +52,7 @@ jobs:
       uses: astral-sh/setup-uv@v5
       with:
         version: "0.7.8"
-
+
     - name: Set up Python
       uses: actions/setup-python@v5
       with:
````

.github/workflows/pliv_sim.yml

Lines changed: 1 addition & 1 deletion

````diff
@@ -62,7 +62,7 @@ jobs:
       cd monte-cover
       uv venv
       uv sync
-
+
     - name: Install DoubleML from correct branch
       run: |
         source monte-cover/.venv/bin/activate
````

doc/did/did_cs.qmd

Lines changed: 2 additions & 2 deletions

````diff
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
 init_notebook_mode(all_interactive=True)
 ```
 
-## ATTE Coverage
+## Coverage
 
-The simulations are based on the the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).
+The simulations are based on the the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).
 
 ::: {.callout-note title="Metadata" collapse="true"}
 
````
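The renamed "Coverage" sections above all report empirical coverage of confidence intervals over Monte Carlo repetitions. As a reminder of what the "Coverage" and "CI Length" columns measure, here is a minimal sketch that is not part of the commit: it uses a plain normal mean instead of the make_did_SZ2020 DGP, and records the two metrics the way the result tables do.

```python
import numpy as np

# Toy Monte Carlo coverage check (illustrative; not the DoubleML DGP).
rng = np.random.default_rng(42)
n_rep, n_obs, theta = 500, 1000, 1.0
z = 1.959963984540054  # 97.5% standard-normal quantile -> 95% two-sided CI

covered = np.empty(n_rep, dtype=bool)
ci_length = np.empty(n_rep)
for r in range(n_rep):
    sample = rng.normal(loc=theta, scale=2.0, size=n_obs)
    est = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n_obs)
    lo, hi = est - z * se, est + z * se
    covered[r] = lo <= theta <= hi   # contributes to the "Coverage" column
    ci_length[r] = hi - lo           # contributes to the "CI Length" column

print(f"coverage: {covered.mean():.3f}, mean CI length: {ci_length.mean():.3f}")
```

A well-calibrated 95% interval should show coverage near 0.95; the styled tables highlight cells that deviate from the nominal level.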

doc/did/did_cs_multi.qmd

Lines changed: 7 additions & 7 deletions

````diff
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
 init_notebook_mode(all_interactive=True)
 ```
 
-## ATTE Coverage
+## Coverage
 
-The simulations are based on the [make_did_cs_CS2021](https://docs.doubleml.org/dev/api/generated/doubleml.did.datasets.make_did_cs_CS2021.html)-DGP with $2000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
+The simulations are based on the [make_did_cs_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_cs_CS2021.html)-DGP with $1000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
 
 - Type 1: Linear outcome model and treatment assignment
 - Type 4: Nonlinear outcome model and treatment assignment
@@ -52,7 +52,7 @@ df = pd.read_csv("../../results/did/did_cs_multi_detailed.csv", index_col=None)
 assert df["repetition"].nunique() == 1
 n_rep = df["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 ### Observational Score
@@ -112,7 +112,7 @@ generate_and_show_styled_table(
 
 ## Aggregated Effects
 
-These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/dev/guide/models.html#difference-in-differences-models-did).
+These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did).
 
 The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidence intervals).
 
@@ -127,7 +127,7 @@ df_group = pd.read_csv("../../results/did/did_cs_multi_group.csv", index_col=Non
 assert df_group["repetition"].nunique() == 1
 n_rep_group = df_group["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 #### Observational Score
@@ -195,7 +195,7 @@ df_time = pd.read_csv("../../results/did/did_cs_multi_time.csv", index_col=None)
 assert df_time["repetition"].nunique() == 1
 n_rep_time = df_time["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 #### Observational Score
@@ -263,7 +263,7 @@ df_es = pd.read_csv("../../results/did/did_cs_multi_eventstudy.csv", index_col=N
 assert df_es["repetition"].nunique() == 1
 n_rep_es = df_es["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
 ```
 
 #### Observational Score
````
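`generate_and_show_styled_table` (imported from the repo's `utils.style_tables` in the hunks above) takes a `filters` dict and a `display_cols` list; this commit extends `display_columns` with the new nuisance-loss columns. The helper itself is not shown in the diff, so the sketch below only imitates its filtering step with plain pandas on a toy frame; every value is invented for illustration.

```python
import pandas as pd

# Toy stand-in for a results CSV such as did_cs_multi_detailed.csv (values invented).
df = pd.DataFrame({
    "Learner g": ["LGBM", "LGBM", "Logistic"],
    "Learner m": ["LGBM", "LGBM", "Logistic"],
    "DGP": [1, 4, 1],
    "level": [0.95, 0.90, 0.95],
    "Score": ["observational", "observational", "experimental"],
    "Coverage": [0.94, 0.89, 0.96],
    "Loss m": [0.31, 0.42, 0.29],
    "repetition": [500, 500, 500],
})

# Same guard the .qmd files use: all rows must come from a single simulation run.
assert df["repetition"].nunique() == 1
n_rep = df["repetition"].unique()[0]

# Filtering step analogous to filters={"level": 0.95, "Score": "observational"}.
filters = {"level": 0.95, "Score": "observational"}
mask = pd.Series(True, index=df.index)
for col, val in filters.items():
    mask &= df[col] == val

display_columns = ["Learner g", "Learner m", "DGP", "Coverage", "Loss m"]
print(df.loc[mask, display_columns])
```

The real helper additionally styles the table and highlights the coverage columns; only the row selection is reproduced here.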

doc/did/did_pa.qmd

Lines changed: 2 additions & 2 deletions

````diff
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
 init_notebook_mode(all_interactive=True)
 ```
 
-## ATTE Coverage
+## Coverage
 
-The simulations are based on the the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).
+The simulations are based on the the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).
 
 ::: {.callout-note title="Metadata" collapse="true"}
 
````

doc/did/did_pa_multi.qmd

Lines changed: 198 additions & 7 deletions

````diff
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
 init_notebook_mode(all_interactive=True)
 ```
 
-## ATTE Coverage
+## Coverage
 
-The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/dev/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $2000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
+The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $1000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
 
 - Type 1: Linear outcome model and treatment assignment
 - Type 4: Nonlinear outcome model and treatment assignment
@@ -52,7 +52,7 @@ df = pd.read_csv("../../results/did/did_pa_multi_detailed.csv", index_col=None)
 assert df["repetition"].nunique() == 1
 n_rep = df["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 ### Observational Score
@@ -112,7 +112,7 @@ generate_and_show_styled_table(
 
 ## Aggregated Effects
 
-These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/dev/guide/models.html#difference-in-differences-models-did).
+These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did).
 
 The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidende intervals).
 
@@ -127,7 +127,7 @@ df_group = pd.read_csv("../../results/did/did_pa_multi_group.csv", index_col=Non
 assert df_group["repetition"].nunique() == 1
 n_rep_group = df_group["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 #### Observational Score
@@ -195,7 +195,7 @@ df_time = pd.read_csv("../../results/did/did_pa_multi_time.csv", index_col=None)
 assert df_time["repetition"].nunique() == 1
 n_rep_time = df_time["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 #### Observational Score
@@ -263,7 +263,7 @@ df_es = pd.read_csv("../../results/did/did_pa_multi_eventstudy.csv", index_col=N
 assert df_es["repetition"].nunique() == 1
 n_rep_es = df_es["repetition"].unique()[0]
 
-display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
 ```
 
 #### Observational Score
@@ -320,3 +320,194 @@ generate_and_show_styled_table(
     coverage_highlight_cols=["Coverage", "Uniform Coverage"]
 )
 ```
+
+
+## Tuning
+
+The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $1000$ observations. Due to time constraints we only consider one learner, use in-sample normalization and the following DGPs:
+
+- Type 1: Linear outcome model and treatment assignment
+- Type 4: Nonlinear outcome model and treatment assignment
+
+The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidende intervals). This is only an example as the untuned version just relies on the default configuration.
+
+::: {.callout-note title="Metadata" collapse="true"}
+
+```{python}
+#| echo: false
+metadata_file = '../../results/did/did_pa_multi_tune_metadata.csv'
+metadata_df = pd.read_csv(metadata_file)
+print(metadata_df.T.to_string(header=False))
+```
+
+:::
+
+```{python}
+#| echo: false
+
+# set up data
+df = pd.read_csv("../../results/did/did_pa_multi_tune_detailed.csv", index_col=None)
+
+assert df["repetition"].nunique() == 1
+n_rep = df["repetition"].unique()[0]
+
+display_columns = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
+```
+
+### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+## Tuning Aggregated Effects
+
+These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did).
+
+As before, we only consider one learner, use in-sample normalization and the following DGPs:
+
+- Type 1: Linear outcome model and treatment assignment
+- Type 4: Nonlinear outcome model and treatment assignment
+
+The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidende intervals). This is only an example as the untuned version just relies on the default configuration.
+
+### Group Effects
+
+```{python}
+#| echo: false
+
+# set up data
+df_group_tune = pd.read_csv("../../results/did/did_pa_multi_tune_group.csv", index_col=None)
+
+assert df_group_tune["repetition"].nunique() == 1
+n_rep_group_tune = df_group_tune["repetition"].unique()[0]
+
+display_columns_tune = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
+```
+
+#### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_group_tune,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns_tune,
+    n_rep=n_rep_group_tune,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_group_tune,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns_tune,
+    n_rep=n_rep_group_tune,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+
+### Time Effects
+
+```{python}
+#| echo: false
+
+# set up data
+df_time_tune = pd.read_csv("../../results/did/did_pa_multi_tune_time.csv", index_col=None)
+
+assert df_time_tune["repetition"].nunique() == 1
+n_rep_time_tune = df_time_tune["repetition"].unique()[0]
+
+display_columns_tune = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
+```
+
+#### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_time_tune,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns_tune,
+    n_rep=n_rep_time_tune,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_time_tune,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns_tune,
+    n_rep=n_rep_time_tune,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+### Event Study Aggregation
+
+```{python}
+#| echo: false
+
+# set up data
+df_es_tune = pd.read_csv("../../results/did/did_pa_multi_tune_eventstudy.csv", index_col=None)
+
+assert df_es_tune["repetition"].nunique() == 1
+n_rep_es_tune = df_es_tune["repetition"].unique()[0]
+
+display_columns_tune = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
+```
+
+#### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_es_tune,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns_tune,
+    n_rep=n_rep_es_tune,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_es_tune,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns_tune,
+    n_rep=n_rep_es_tune,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
````
