[DOC] Section 1 of user guide/definition of concepts #408
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@           Coverage Diff           @@
##             main     #408   +/-   ##
=======================================
  Coverage   98.37%   98.37%
=======================================
  Files          23       23
  Lines        1602     1602
=======================================
  Hits         1576     1576
  Misses         26       26
bthirion
left a comment
I think it would be useful to include a typology of all VI methods in this section.
man-shu
left a comment
Looks good overall.
Just wondering whether we should introduce the Total Sobol Index in the "Types of VI methods" section or some other place. The original issue #306 mentions it...
docs/src/concepts.rst
Outdated
| There are two main types of VI methods implemented in HiDimStat:
|
| 1. Marginal methods: these methods provide importance to all the features
| that are related to the output, even if it is caused by spurius correlation. They
Suggested change:
| - that are related to the output, even if it is caused by spurius correlation. They
| + that are related to the output, even if it is caused by spurious correlation. They
docs/src/concepts.rst
Outdated
| 1. Marginal methods: these methods provide importance to all the features
| that are related to the output, even if it is caused by spurius correlation. They
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
| Example of such methods is LOCI.
I think it would be useful to provide a reference for LOCI, or at least expand the abbreviation.
Yes, I would also suggest the reference but I think they are not yet available.
For LOCI, I found this reference: Ewald, Fiona Katharina, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar König. "A guide to feature importance methods for scientific inference." In World Conference on Explainable Artificial Intelligence, pp. 440-464. Cham: Springer Nature Switzerland, 2024.
What I meant was a reference to the implemented class, not a bibliographic reference.
I think the biblio ref should be good enough for now
The reference for the implementation should be only in the docstring of the class. In this case, we can keep a more general bibliography.
docs/src/concepts.rst
Outdated
| 1. Marginal methods: these methods provide importance to all the features
| that are related to the output, even if it is caused by spurius correlation. They
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
| Example of such methods is LOCI.
Suggested change:
| - Example of such methods is LOCI.
| + An example of such a method is LOCI.
docs/src/concepts.rst
Outdated
| i.e., they contribute unique knowledge. They are related with Conditional
| Independence Testing, which consist in testing if
| :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
| :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.
Suggested change:
| - i.e., they contribute unique knowledge. They are related with Conditional
| - Independence Testing, which consist in testing if
| - :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
| - :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.
| + i.e., they contribute unique knowledge. They are related to Conditional
| + Independence Testing, which consists of testing whether
| + :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
| + :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.
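As a concrete illustration of the marginal versus conditional distinction discussed in these threads, here is a minimal pure-Python sketch. It is not HiDimStat code and does not use LOCI, LOCO or CFI. The simulated data, the helper names `pearson` and `partial_corr`, and the use of (partial) correlation as a stand-in for (conditional) independence testing are all assumptions made for illustration, and that stand-in is only reasonable in a linear Gaussian setting like this one.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, given):
    """Correlation between a and b after linearly adjusting both for `given`."""
    r_ab = pearson(a, b)
    r_ag = pearson(a, given)
    r_bg = pearson(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag**2) * (1 - r_bg**2))


# Hypothetical data: y depends on x1 only; x2 is merely correlated with x1.
rng = random.Random(0)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [v + 0.5 * rng.gauss(0, 1) for v in x1]
y = [v + 0.5 * rng.gauss(0, 1) for v in x1]

# Marginal view: is x2 associated with y at all?
marginal_x2 = pearson(x2, y)
# Conditional view: is x2 still associated with y once x1 is accounted for?
conditional_x2 = partial_corr(x2, y, x1)

print(f"marginal association of x2 with y:    {marginal_x2:.3f}")
print(f"conditional association given x1:     {conditional_x2:.3f}")
```

In this toy setup `x2` has no effect of its own on `y`, yet its marginal association is strong because it is correlated with `x1`. Once `x1` is accounted for, the association should be close to zero, which is the behaviour the "conditional methods" item describes.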
docs/src/concepts.rst
Outdated
| soon).
|
| Variable Selection
| -------------------------------
Suggested change:
| - -------------------------------
| + ------------------
docs/src/concepts.rst
Outdated
|
|
| High-dimension and correlation
| -----------------------------------
Suggested change:
| - -----------------------------------
| + ------------------------------
docs/src/concepts.rst
Outdated
| that are related to the output, even if it is caused by spurius correlation. They
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
Suggested change:
| - that are related to the output, even if it is caused by spurius correlation. They
| - are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
| + that are related to the output, even if it is caused by spurius correlation. They
| + consist of testing whether :math:`X^j\perp\!\!\!\!\perp Y`.
Maybe that sounds better?
That is because they do not directly test whether X is independent of Y: they are variable importance measures, not just selection tools. That is why I would say they are implicitly related to this test, but they do not consist of it.
Ok makes sense!
| statistical control to the discoveries made. Simply selecting the most important
| features without such control is not valid. Different forms of guarantees can
| be employed, such as controlling the type-I error or the False Discovery Rate.
| This step is directly related to the task of Variable Selection.
I might be very wrong, but isn't this section somewhat redundant with the Variable Selection section? Could it be merged into the Variable Selection section?
Yes, but I am not sure how. Indeed it is important to make explicit that the power of the library is to provide statistical guarantees too.
Simply add a cross-link?
We will add sections to describe variable importance concepts (TSI) and variable selection concepts (FWER, FDR, etc.) in the Definition of concepts of the API, see #549
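The point about statistical guarantees in the passage under discussion can be sketched with the Benjamini-Hochberg procedure, a standard way to control the False Discovery Rate (under independence or positive dependence of the p-values). The p-values below are made up for illustration and the function name `benjamini_hochberg` is hypothetical. This is not the HiDimStat API.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices selected by Benjamini-Hochberg at FDR level q."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Select every feature whose p-value ranks at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values, one per feature.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

# Naive selection: keep every feature with p <= 0.05, no multiplicity correction.
naive = [i for i, p in enumerate(p_values) if p <= 0.05]
# Selection with FDR control at level 0.05.
selected = benjamini_hochberg(p_values, q=0.05)

print("naive selection:", naive)
print("BH selection:   ", selected)
```

On these numbers the uncorrected threshold keeps more features than the FDR-controlled procedure, which is the gap between "selecting the most important features" and selecting them with a guarantee.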
docs/src/concepts.rst
Outdated
| It allow us to rank the variables from more to less important.
|
| Here, ``VI`` can be a variable importance method implemented in HiDimStat,
| such as :class:`hidimstat.LOCO` (other methods will support the same API
It would be better to give the full name of the model before introducing its acronym.
…stat into userguide_section1
docs/src/concepts.rst
Outdated
|
| There are two main types of VI methods implemented in HiDimStat:
|
| 1. Marginal methods: these methods provide importance to all the features
Suggested change:
| - 1. Marginal methods: these methods provide importance to all the features
| + 1. **Marginal methods**: these methods provide importance to all the features
docs/src/concepts.rst
Outdated
| An example of such methods is Leave One Covariate In (LOCI,
| :footcite:p:`ewald_2024`).
|
| 2. Conditional methods: these methods assign importance only to features that
Suggested change:
| - 2. Conditional methods: these methods assign importance only to features that
| + 2. **Conditional methods**: these methods assign importance only to features that
bthirion
left a comment
I have minor comments. LGTM overall.
| -------------------
|
| Global Variable Importance (VI) aims to assign a measure of
| relevance to each feature :math:`X^j` with respect to a target :math:`Y` in the
Suggested change:
| - relevance to each feature :math:`X^j` with respect to a target :math:`Y` in the
| + relevance to each feature :math:`X^j` with respect to a target :math:`y` in the
We will explain that in #546
* init definition of concepts
* point to specific VI classes
* only d0crt works rn
* section on types of variable imp methods
* minor
* definition
* Statistical Inference and concept description
* Add explicit information about the gap between variable importance and selection
* add ewald bibliography
* Typo Knockoff
* Notation and small corrections on selection
* y as sklearn
* Variable selection title
---------
Co-authored-by: angelReyero <angelreyerolobo@gmail.com>
Co-authored-by: lionel kusch <lionel.a.kusch@inria.fr>
Co-authored-by: jpaillard <joseph.paillard@inria.fr>
* add a changelog file * add file for contributors * add changelogtemplate * a what is new for listing the change * update pyproject * add a swithcer of version * include lastest modification * fix a bug * documenation on how to make a release * move build packages Need to be check f this script is still usefull * move file * add contributors * update version * Fix codespell * remove build from isort * fix docstring * update version number * update license declaration * Fix readme file * fix readme for release * update how to make release * rename version * Update release * avoid test\n[skip tests] * fix documentation [skip tests] * update release info * fix management of lastest version * Update pyproject.toml Co-authored-by: bthirion <bertrand.thirion@inria.fr> * Update tools/release/How_to_release.md Co-authored-by: bthirion <bertrand.thirion@inria.fr> * Update tools/release/How_to_release.md Co-authored-by: bthirion <bertrand.thirion@inria.fr> * Update tools/release/How_to_release.md Co-authored-by: bthirion <bertrand.thirion@inria.fr> * Update tools/release/How_to_release.md Co-authored-by: bthirion <bertrand.thirion@inria.fr> * fix * Update tools/release/How_to_release.md Co-authored-by: bthirion <bertrand.thirion@inria.fr> * [skip tests] * fix documentation * update realse notes * fix name of branches for release * Update CHANGELOG.rst Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr> * Update CHANGELOG.rst Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr> * Update CHANGELOG.rst Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr> * update contributor file * [DOC] Section 1 of user guide/definition of concepts (#408) * init definition of concepts * point to specific VI classes * only d0crt works rn * section on types of variable imp methods * minor * definition * Statistical Inference and concept description * Add explicit information about the gap between variable importance and selection * add ewald bibliography * Typo Knockoff * 
Notation and small corrections on selection * y as sklearn * Variable selection title --------- Co-authored-by: angelReyero <angelreyerolobo@gmail.com> Co-authored-by: lionel kusch <lionel.a.kusch@inria.fr> Co-authored-by: jpaillard <joseph.paillard@inria.fr> * release 0.3.0 --------- Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr> Co-authored-by: bthirion <bertrand.thirion@inria.fr> Co-authored-by: Himanshu Aggarwal <himanshuaggarwal1997@gmail.com> Co-authored-by: angelReyero <angelreyerolobo@gmail.com>
Relates to #306. With @AngelReyero.
For section 1 of the user guide, which contains the definition of all basic concepts.