Fix variance calculation for complex numbers by preserving dtype #62555
base: main
05e6f08 to f42afd3
ser2 = Series([1 + 2j, 2 + 3j, 3 + 4j], dtype=np.complex128)
expected_var = 2.0
tm.assert_almost_equal(ser2.var(ddof=1), expected_var)
Do we need assert_almost_equal or can we use assert_series_equal?
We need assert_almost_equal because .var() returns a scalar, not a Series.
Can you instead adjust the expected value to be the right type of output? The point of assert_almost_equal is to allow for differences in precision, but not necessarily in types
Series.var() returns a scalar (not a Series), so tm.assert_almost_equal() is the appropriate assertion function here.
The expected value is already the correct type - it's a scalar float (2.0 or 4/3), which matches the scalar output from .var().
Is there something that I am missing?
What error are you getting with assert_series_equal?
When I use assert_series_equal, I get this error:
AssertionError: Series Expected type <class 'pandas.core.series.Series'>, found <class 'numpy.float64'> instead
I am still unable to figure out how assert_series_equal can be used here. Do you want me to convert the expected variance value and the Series.var() output (both scalar values) to a Series and then use assert_series_equal?
If so, that approach would look like:
result = Series([ser.var(ddof=ddof)])
expected_series = Series([expected])
tm.assert_series_equal(result, expected_series, rtol=1e-5, atol=1e-8)
However, I'd need to explicitly pass rtol and atol because assert_series_equal doesn't have default tolerance parameters, which means it would fail on floating-point precision differences across different configurations. In contrast, tm.assert_almost_equal() has built-in tolerance.
Oh OK - so it is just returning a scalar? In that case, just use the equality semantics of the type of the scalar - no need for the series comparison functions
I believe it's necessary here due to floating-point precision issues.
During local testing, I encountered failures where the computed variance was 2.0...001 instead of exactly 2.0 due to floating-point rounding errors. This depends on the architecture and optimization level. Using direct equality (assert result == expected) would make the test brittle across different CPU architectures.
tm.assert_almost_equal() provides the tolerance needed to handle these unavoidable precision differences while still validating correctness. This is consistent with how pandas tests other floating-point operations.
Would it be acceptable to keep tm.assert_almost_equal() for this reason?
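For reference, the expected values discussed in this thread (2.0 for ddof=1 and 4/3 for ddof=0) follow from the usual definition of variance for complex data, where the squared magnitude of each deviation from the complex mean is averaged. Below is a minimal pure-Python sketch of that definition. The helper name `complex_var` is hypothetical and is not part of pandas or NumPy; it only illustrates why the result is a real scalar and why a tolerance-based comparison is a reasonable choice.

```python
import math


def complex_var(values, ddof=1):
    """Variance of complex values (hypothetical helper, not a pandas API).

    Uses the squared magnitude of each deviation from the complex mean,
    so the result is a real, non-negative float.
    """
    n = len(values)
    mean = sum(values) / n  # complex mean; the imaginary part is kept
    return sum(abs(v - mean) ** 2 for v in values) / (n - ddof)


values = [1 + 2j, 2 + 3j, 3 + 4j]
result = complex_var(values, ddof=1)

# The result is a plain real scalar, so a scalar comparison is enough.
assert isinstance(result, float)

# Compare with a tolerance rather than ==, since rounding in abs() and the
# squaring step can leave the result a few ulps away from the exact value.
assert math.isclose(result, 2.0, rel_tol=1e-9)
assert math.isclose(complex_var(values, ddof=0), 4 / 3, rel_tol=1e-9)
```

The same idea carries over to the test under review: the compared objects are scalars, and the tolerance guards against platform-dependent rounding.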
ser2.var(ddof=1), np.var([1 + 2j, 2 + 3j, 3 + 4j], ddof=1)
)
# Test with NaN
Rather than creating multiple variables it would be better to parametrize the inputs to this test
added the inputs as parameters
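As an illustration of the parametrization suggested above, here is a rough sketch of how the ddof cases quoted in this thread could be expressed as a single parametrized test. The test name and the `VALUES` constant are hypothetical, and `np.var` stands in for `Series.var` so the snippet does not depend on the fix in this PR; the actual test in the PR may look different.

```python
import numpy as np
import pytest

# Input taken from the diff under review.
VALUES = [1 + 2j, 2 + 3j, 3 + 4j]


@pytest.mark.parametrize(
    "ddof, expected",
    [
        (0, 4 / 3),  # population variance
        (1, 2.0),    # sample variance
    ],
)
def test_complex_var(ddof, expected):
    # np.var is used here as a stand-in for Series.var (assumption).
    result = np.var(np.array(VALUES, dtype=np.complex128), ddof=ddof)
    # pytest.approx allows for small floating-point differences.
    assert result == pytest.approx(expected)
```

Each tuple in the list becomes its own test case, so a failure report names the exact ddof that broke instead of pointing into a long test body with several local variables.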
# Test other ddof values
tm.assert_almost_equal(ser2.var(ddof=0), 4 / 3)
# Test that imaginary part is preserved in mean calculation
This looks like it should be a separate test
made this a separate test
890d79b to 0af9ccd
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
116c154 to 908e8d5
908e8d5 to 5615f38
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.