@man-shu man-shu commented Sep 15, 2025

Relates to #306. With @AngelReyero.

For section 1 of the user guide, which contains the definition of all basic concepts.

@man-shu man-shu marked this pull request as draft September 15, 2025 10:53

codecov bot commented Sep 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.37%. Comparing base (324d31c) to head (c78aff9).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #408   +/-   ##
=======================================
  Coverage   98.37%   98.37%           
=======================================
  Files          23       23           
  Lines        1602     1602           
=======================================
  Hits         1576     1576           
  Misses         26       26           

@man-shu man-shu changed the title from "Section 1 of user guide/definition of concepts" to "[DOC] Section 1 of user guide/definition of concepts" Sep 15, 2025

Collaborator

@bthirion bthirion left a comment

I think it could be useful to have in this section a typology of all VI methods.

Collaborator Author

@man-shu man-shu left a comment

Looks good overall.

Just wondering whether we should introduce the Total Sobol Index in the "Types of VI methods" section or some other place. The original issue #306 mentions it...

There are two main types of VI methods implemented in HiDimStat:

1. Marginal methods: these methods provide importance to all the features
that are related to the output, even if it is caused by spurius correlation. They

Collaborator Author

Suggested change
that are related to the output, even if it is caused by spurius correlation. They
that are related to the output, even if it is caused by spurious correlation. They

1. Marginal methods: these methods provide importance to all the features
that are related to the output, even if it is caused by spurius correlation. They
are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
Example of such methods is LOCI.

Collaborator Author

I think it would be useful to provide a reference for LOCI, or at least expand the abbreviation.

Collaborator

Yes, I would also suggest the reference but I think they are not yet available.

Collaborator

For LOCI, I found this reference: Ewald, Fiona Katharina, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar König. "A guide to feature importance methods for scientific inference." In World Conference on Explainable Artificial Intelligence, pp. 440-464. Cham: Springer Nature Switzerland, 2024.

Collaborator

What I meant was a reference to the implemented class, not a bibliographic reference.

Collaborator Author

I think the biblio ref should be good enough for now

Collaborator

The reference for the implementation should be only in the docstring of the class. In this case, we can keep a more general bibliography.

1. Marginal methods: these methods provide importance to all the features
that are related to the output, even if it is caused by spurius correlation. They
are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
Example of such methods is LOCI.

Collaborator Author

Suggested change
Example of such methods is LOCI.
An example of such a method is LOCI.

Comment on lines 73 to 76
i.e., they contribute unique knowledge. They are related with Conditional
Independence Testing, which consist in testing if
:math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
:class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.

Collaborator Author

Suggested change
i.e., they contribute unique knowledge. They are related with Conditional
Independence Testing, which consist in testing if
:math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
:class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.
i.e., they contribute unique knowledge. They are related to Conditional
Independence Testing, which consists of testing whether
:math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
:class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.

soon).

Variable Selection
-------------------------------

Collaborator Author

Suggested change
-------------------------------
------------------



High-dimension and correlation
-----------------------------------

Collaborator Author

Suggested change
-----------------------------------
------------------------------

Comment on lines 67 to 68
that are related to the output, even if it is caused by spurius correlation. They
are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.

Collaborator Author

Suggested change
that are related to the output, even if it is caused by spurius correlation. They
are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
that are related to the output, even if it is caused by spurius correlation. They
consist of testing whether :math:`X^j\perp\!\!\!\!\perp Y`.

Collaborator Author

Maybe that sounds better?

Collaborator

They do not directly test whether X is independent of Y: they are variable importance measures, not just selection procedures. That is why I would say that they are implicitly related to this test, but they do not consist of it.

Collaborator Author

Ok makes sense!
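
The distinction discussed in this thread, marginal versus conditional importance, can be illustrated with a small simulation. The sketch below is only an illustration built on NumPy and scikit-learn; it does not use or reproduce the HiDimStat classes. The helper `r2_of_subset`, the synthetic data, and the choice of test-set R² as the importance scale are all assumptions made for the example. A feature that is merely correlated with a truly relevant one should receive a clearly positive marginal (leave-one-covariate-in style) score, but a conditional (leave-one-covariate-out style) score close to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 2000

# Synthetic data (hypothetical): only x0 drives y.
x0 = rng.normal(size=n_samples)
x1 = x0 + rng.normal(size=n_samples)  # correlated with x0, no direct effect on y
x2 = rng.normal(size=n_samples)       # independent of everything
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)


def r2_of_subset(columns):
    """Test-set R^2 of a linear model fitted on the given feature columns."""
    model = LinearRegression().fit(X_train[:, columns], y_train)
    return r2_score(y_test, model.predict(X_test[:, columns]))


all_columns = [0, 1, 2]
full_score = r2_of_subset(all_columns)

# Marginal score: performance of a model that sees only feature j.
# (A constant predictor has R^2 close to 0, so this is roughly the gain over it.)
marginal = [r2_of_subset([j]) for j in all_columns]

# Conditional score: performance lost when feature j is removed from the full model.
conditional = [
    full_score - r2_of_subset([k for k in all_columns if k != j])
    for j in all_columns
]

for j in all_columns:
    print(f"feature {j}: marginal={marginal[j]:.3f}, conditional={conditional[j]:.3f}")
```

In this setup, feature 1 only carries information about `y` through its correlation with feature 0, which is the situation the "spurious correlation" wording in the guide refers to. Note that the scores are importance measures, not tests: nothing here decides whether a score is significantly different from zero.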

statistical control to the discoveries made. Simply selecting the most important
features without such control is not valid. Different forms of guarantees can
be employed, such as controlling the type-I error or the False Discovery Rate.
This step is directly related to the task of Variable Selection.

Collaborator Author

I might be very wrong, but isn't this section somewhat redundant with the Variable Selection section? Could it be merged into the Variable Selection section?

Collaborator

Yes, but I am not sure how. It is important to make explicit that a strength of the library is that it also provides statistical guarantees.

Collaborator

Simply add a cross-link?

Collaborator

We will add sections to describe variable importance concepts (TSI) and variable selection concepts (FWER, FDR, etc.) in the Definition of concepts of the API, see #549
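
The point made in the quoted passage, that selecting the most important features without statistical control is not valid, can be illustrated with the Benjamini-Hochberg procedure for False Discovery Rate control. The sketch below is a plain NumPy illustration and not HiDimStat's API; the function name `benjamini_hochberg` and the p-values are hypothetical, and the procedure's guarantee relies on its usual assumptions (independent or positively dependent p-values).

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds alpha * k / m for the k-th smallest p-value.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    selected = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest k that passes its threshold.
        k_max = np.max(np.nonzero(below)[0])
        selected[order[: k_max + 1]] = True
    return selected


# Hypothetical per-feature p-values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

selected = benjamini_hochberg(p_values, alpha=0.05)
naive = np.asarray(p_values) < 0.05  # uncorrected thresholding, no FDR control

print("selected with FDR control:", int(selected.sum()))
print("selected without correction:", int(naive.sum()))
```

With these p-values the corrected procedure keeps fewer features than uncorrected thresholding, which is the gap between ranking variables by importance and selecting them with a guarantee.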

It allow us to rank the variables from more to less important.

Here, ``VI`` can be a variable importance method implemented in HiDimStat,
such as :class:`hidimstat.LOCO` (other methods will support the same API

Collaborator

It would be better to give the full name of the model before introducing its acronym.

@jpaillard jpaillard marked this pull request as ready for review December 8, 2025 10:18

There are two main types of VI methods implemented in HiDimStat:

1. Marginal methods: these methods provide importance to all the features

Collaborator

Suggested change
1. Marginal methods: these methods provide importance to all the features
1. **Marginal methods**: these methods provide importance to all the features

An example of such methods is Leave One Covariate In (LOCI,
:footcite:p:`ewald_2024`).

2. Conditional methods: these methods assign importance only to features that

Collaborator

Suggested change
2. Conditional methods: these methods assign importance only to features that
2. **Conditional methods**: these methods assign importance only to features that

Collaborator

@bthirion bthirion left a comment

I have minor comments. LGTM overall.

-------------------

Global Variable Importance (VI) aims to assign a measure of
relevance to each feature :math:`X^j` with respect to a target :math:`Y` in the

Collaborator

Suggested change
relevance to each feature :math:`X^j` with respect to a target :math:`Y` in the
relevance to each feature :math:`X^j` with respect to a target :math:`y` in the

Collaborator

We will explain that in #546

@jpaillard jpaillard merged commit 1882f60 into mind-inria:main Dec 8, 2025
6 of 7 checks passed
lionelkusch added a commit to lionelkusch/hidimstat that referenced this pull request Dec 8, 2025
* init definition of concepts

* point to specific VI classes

* only d0crt works rn

* section on types of variable imp methods

* minor

* definition

* Statistical Inference and concept description

* Add explicit information about the gap between variable importance and selection

* add ewald bibliography

* Typo Knockoff

* Notation and small corrections on selection

* y as sklearn

* Variable selection title

---------

Co-authored-by: angelReyero <angelreyerolobo@gmail.com>
Co-authored-by: lionel kusch <lionel.a.kusch@inria.fr>
Co-authored-by: jpaillard <joseph.paillard@inria.fr>
lionelkusch added a commit that referenced this pull request Dec 8, 2025
* add a changelog file

* add file for contributors

* add changelogtemplate

* a what is new for listing the change

* update pyproject

* add a swithcer of version

* include lastest modification

* fix a bug

* documenation on how to make a release

* move build packages

Need to be check f this script is still usefull

* move file

* add contributors

* update version

* Fix codespell

* remove build from isort

* fix docstring

* update version number

* update license declaration

* Fix readme file

* fix readme for release

* update how to make release

* rename version

* Update release

* avoid test\n[skip tests]

* fix documentation

[skip tests]

* update release info

* fix management of lastest version

* Update pyproject.toml

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

* Update tools/release/How_to_release.md

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

* Update tools/release/How_to_release.md

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

* Update tools/release/How_to_release.md

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

* Update tools/release/How_to_release.md

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

* fix

* Update tools/release/How_to_release.md

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

* [skip tests]

* fix documentation

* update realse notes

* fix name of branches for release

* Update CHANGELOG.rst

Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr>

* Update CHANGELOG.rst

Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr>

* Update CHANGELOG.rst

Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr>

* update contributor file

* [DOC] Section 1 of user guide/definition of concepts (#408)

* release 0.3.0

---------

Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr>
Co-authored-by: bthirion <bertrand.thirion@inria.fr>
Co-authored-by: Himanshu Aggarwal <himanshuaggarwal1997@gmail.com>
Co-authored-by: angelReyero <angelreyerolobo@gmail.com>

5 participants