
Conversation

@jpaillard (Collaborator) commented Dec 1, 2025

Changes in the core API

  • Remove the attribute one_minus_pval from BaseVariableImportance; it is now only computed as an intermediate step of fdr_selection when two_tailed_test=True.
  • one_minus_pval was used in some examples for FWER control, so I also added a fwer_selection method, which currently only supports the Bonferroni procedure (a rough sketch is shown after this list).
  • The output of both methods is a mask: 0 if non-selected, 1 if selected, and, when two_tailed_test=True, -1 for selected features with negative effects.
  • By default, for BaseVariableImportance, two_tailed_test=False. For DL, CluDL, and EnCluDL, this default is overridden to True.
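
A minimal sketch of the selection mask described above. It assumes, for the two-tailed case, that pvalues holds the smaller one-sided p-value of each feature (hence the fwer / (2 * n_tests) threshold) and that importances carries the sign of the effect; the function name and signature are illustrative, not the actual hidimstat API.

import numpy as np

def fwer_selection_sketch(pvalues, importances, fwer=0.05, n_tests=None, two_tailed_test=False):
    """Return a mask: 1 if selected, 0 if not, -1 if selected with a negative effect."""
    pvalues = np.asarray(pvalues)
    if n_tests is None:
        n_tests = len(pvalues)
    # Bonferroni: split the FWER budget across all tests (both tails count as tests).
    threshold = fwer / (2 * n_tests) if two_tailed_test else fwer / n_tests
    selected = (pvalues < threshold).astype(int)
    if two_tailed_test:
        # Mark selected features whose estimated effect is negative with -1.
        selected[(selected == 1) & (np.asarray(importances) < 0)] = -1
    return selected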

Other changes (most of the diff)

  • Update the examples that used one_minus_pval; replace it with fwer_selection.
  • Add tests covering the new methods.
  • Modify existing tests. To test fdr_selection, repeat the selection over multiple draws and check that the empirical FDR stays below the target (sketched below).
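
A hypothetical sketch of that empirical check, using a Benjamini-Hochberg step-up procedure as a stand-in for fdr_selection; the data generation and tolerance are assumptions, not the actual test fixture.

import numpy as np

rng = np.random.default_rng(0)
n_repeats, n_features, n_important, target_fdr = 10, 100, 10, 0.1
fdps = []
for _ in range(n_repeats):
    # Uniform p-values under the null, near-zero p-values for the important features.
    pvalues = rng.uniform(size=n_features)
    pvalues[:n_important] = 1e-6
    # Benjamini-Hochberg selection at the target FDR (stand-in for fdr_selection).
    order = np.argsort(pvalues)
    passed = pvalues[order] <= target_fdr * np.arange(1, n_features + 1) / n_features
    n_selected = passed.nonzero()[0].max() + 1 if passed.any() else 0
    selected = order[:n_selected]
    false_positives = np.sum(selected >= n_important)
    fdps.append(false_positives / max(n_selected, 1))
# The empirical FDR (mean FDP over the repeated draws) should stay below the target.
assert np.mean(fdps) <= target_fdr + 0.1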

@jpaillard linked an issue on Dec 1, 2025 that may be closed by this pull request
@jpaillard changed the title from "[CORE] remove one_minus_pval" to "[CORE] remove one_minus_pval attribute, add fwer_selection" on Dec 2, 2025
@jpaillard (Collaborator, Author) commented:

I have been encountering several failures in the tests for DL, CluDL, and EnCluDL. I replaced the single FDP < target-FDR check with a check that the mean FDP over 10 repeats stays below the target.

@jpaillard marked this pull request as ready for review on December 2, 2025 at 09:04
@jpaillard requested a review from bthirion on December 2, 2025 at 09:04

codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.25%. Comparing base (3edd06b) to head (23cf646).

Files with missing lines                    Patch %   Lines
src/hidimstat/base_variable_importance.py   92.59%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #541      +/-   ##
==========================================
- Coverage   98.36%   98.25%   -0.12%     
==========================================
  Files          23       23              
  Lines        1591     1604      +13     
==========================================
+ Hits         1565     1576      +11     
- Misses         26       28       +2     

☔ View full report in Codecov by Sentry.

@bthirion (Collaborator) left a comment

Thx! This looks good. I just provided a few suggestions.

selected_dl = (dl.pvalues_ < (fwer_target / 2) / n_features) | (
    dl.one_minus_pvalues_ < (fwer_target / 2) / n_features
)
selected_dl = dl.fwer_selection(fwer=fwer_target / 2, n_tests=n_features)
@bthirion (Collaborator):

In the examples, you don't want to trick the thresholds. Rather, pass an explicit two_sided=True and let the method deal with the threshold adaptation.
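
A hypothetical version of the suggested usage, assuming the keyword ends up being called two_tailed_test as in the PR description (the reviewer writes two_sided) and that dl is the fitted estimator from the example:

selected_dl = dl.fwer_selection(fwer=fwer_target, two_tailed_test=True)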

)


pval_cludl, _, one_minus_pval_cludl, _ = pval_from_two_sided_pval_and_sign(
@bthirion (Collaborator):

Can't we avoid the one_minus_pval_cludl here?

@jpaillard (Collaborator, Author):

done

if alternative_hypothesis is None or alternative_hypothesis:
    threshold_one_minus_pvalues = fdr_threshold(
        self.one_minus_pvalues_,
        one_minus_pval,
@bthirion (Collaborator):

Can't we get rid of it?

@jpaillard (Collaborator, Author):

done

procedure : {'bonferroni'}, default='bonferroni'
    The FWER control method to use:
    - 'bonferroni': Bonferroni correction
n_tests : int or None, default=None
@bthirion (Collaborator):

Why would n_tests differ from len(self.pvalues_)?

@jpaillard (Collaborator, Author):

For CluDL, for instance, n_tests is the number of clusters, even though the p-values have the same shape as the input.

@bthirion (Collaborator):

Indeed, you're right.
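
A hypothetical illustration of this point: with clustered inference, the per-feature p-values are broadcast from cluster-level tests, so the Bonferroni correction should divide by the number of clusters rather than by len(pvalues_). The names below are illustrative, not the hidimstat API.

import numpy as np

rng = np.random.default_rng(0)
n_features, n_clusters, fwer = 100, 10, 0.05
cluster_pvalues = rng.uniform(size=n_clusters)          # one test per cluster
labels = rng.integers(0, n_clusters, size=n_features)   # feature -> cluster map
# Broadcast to feature level: pvalues_ has the same shape as the input features ...
pvalues_ = cluster_pvalues[labels]
# ... but only n_clusters tests were actually performed, so use n_tests=n_clusters.
selected = (pvalues_ < fwer / n_clusters).astype(int)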

n_tests = self.importances_.shape[0]
threshold_pvalue = fwer / n_tests
selected = (self.pvalues_ < threshold_pvalue).astype(int)
if two_tailed_test:
@bthirion (Collaborator):

Here we should set threshold_pvalue to fwer / (2 * n_tests).

@jpaillard (Collaborator, Author):

done
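
A small sanity check of the factor-2 correction, under the assumption that the two-tailed selection tests both tails of every feature, so the Bonferroni family contains 2 * n_tests comparisons; this simulation is illustrative only and not part of the PR.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_tests, fwer, n_sims = 50, 0.05, 2000
false_rejections = 0
for _ in range(n_sims):
    z = rng.standard_normal(n_tests)   # null statistics: no real effect anywhere
    p_pos = norm.sf(z)                 # one-sided p-value, positive tail
    p_neg = norm.cdf(z)                # one-sided p-value, negative tail
    threshold = fwer / (2 * n_tests)   # split the FWER budget over both tails
    if (p_pos < threshold).any() or (p_neg < threshold).any():
        false_rejections += 1
print(false_rejections / n_sims)       # empirical FWER, should stay at or below ~0.05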

target_fdr = 0.1
test_tol = 0.1

for _ in range(500):
@bthirion (Collaborator):

Oops, that's a lot!

def test_fwer_selection():
    """
    Test that the FWER selection procedure achieves the desired guarantees.
    For 500 draws of p-values with 10 important ones the rest drawn from a uniform
@bthirion (Collaborator):

Let's limit it to a small number, say 50?


assert_almost_equal(importances, beta, decimal=1)
selected = desparsified_lasso.fwer_selection(fwer=alpha, two_tailed_test=False)
fdp, power = fdp_power(np.where(selected)[0], np.where(important)[0])
@bthirion (Collaborator):

When you control the FWER, the FDP should be 0 with high probability.
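
A hypothetical assertion matching this remark: under Bonferroni FWER control at level alpha, any false discovery at all happens with probability at most alpha, so the FDP should be exactly 0 in most repeats. The data generation below is a stand-in for the actual test fixture.

import numpy as np

rng = np.random.default_rng(0)
alpha, n_features, n_important, n_repeats = 0.05, 100, 10, 50
fdp_is_zero = []
for _ in range(n_repeats):
    pvalues = rng.uniform(size=n_features)
    pvalues[:n_important] = 1e-8                           # truly important features
    selected = np.where(pvalues < alpha / n_features)[0]   # Bonferroni selection
    false_positives = np.sum(selected >= n_important)
    fdp_is_zero.append(false_positives == 0)
# No false discovery in roughly at least (1 - alpha) of the repeats, up to a small tolerance.
assert np.mean(fdp_is_zero) >= 1 - alpha - 0.1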



Development

Successfully merging this pull request may close these issues.

Get rid of one_minus_pvalues
