Fix variance calculation for complex numbers by preserving dtype #62555

shaurya5 · 2025-10-03T10:31:05Z

closes BUG: Incorrect results for pandas.Series.var #62421
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

WillAyd · 2025-10-03T16:27:50Z

pandas/tests/reductions/test_reductions.py


+        ser2 = Series([1 + 2j, 2 + 3j, 3 + 4j], dtype=np.complex128)
+        expected_var = 2.0
+        tm.assert_almost_equal(ser2.var(ddof=1), expected_var)


Do we need assert_almost_equal or can we use assert_series_equal?

We need assert_almost_equal because .var() will return a scalar not a series

Can you instead adjust the expected value to be the right type of output? The point of assert_almost_equal is to allow for differences in precision, but not necessarily in types

Series.var() returns a scalar (not a Series), so tm.assert_almost_equal() is the appropriate assertion function here.
The expected value is already the correct type - it's a scalar float (2.0 or 4/3), which matches the scalar output from .var().

Is there something that I am missing?
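
For reference, the two expected values quoted in this thread (2.0 for ddof=1 and 4/3 for ddof=0) can be reproduced by hand. The sketch below is plain Python and does not call pandas. It assumes the usual definition of variance for complex data: the sum of squared absolute deviations from the mean, divided by n - ddof. The helper name `complex_var` is hypothetical and is not part of the PR.

```python
def complex_var(values, ddof=1):
    """Variance of complex values: sum(|x - mean|^2) / (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    # abs() of a complex deviation is its modulus, so each term is a real float.
    squared_devs = [abs(x - mean) ** 2 for x in values]
    return sum(squared_devs) / (n - ddof)


data = [1 + 2j, 2 + 3j, 3 + 4j]
sample_var = complex_var(data, ddof=1)      # approximately 2.0
population_var = complex_var(data, ddof=0)  # approximately 4/3
```

The mean is 2 + 3j, the deviations are -1 - 1j, 0 and 1 + 1j, and their squared moduli are 2, 0 and 2, so the sum of 4 is divided by 2 (ddof=1) or by 3 (ddof=0). The result is a real scalar even though the inputs are complex.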

What error are you getting with assert_series_equal?

When I use assert_series_equal, I get this error:
AssertionError: Series Expected type <class 'pandas.core.series.Series'>, found <class 'numpy.float64'> instead

I am still unable to figure out how assert_series_equal can be used here. Do you want me to convert the expected variance value and the Series.var() output (both scalar values) to a Series and then use assert_series_equal?

If so, that approach would look like:
result = Series([ser.var(ddof=ddof)])
expected_series = Series([expected])
tm.assert_series_equal(result, expected_series, rtol=1e-5, atol=1e-8)

However, I'd need to explicitly pass rtol and atol because assert_series_equal doesn't have default tolerance parameters, which means it would fail on floating-point precision differences across different configurations. In contrast, tm.assert_almost_equal() has built-in tolerance.

Oh OK - so it is just returning a scalar? In that case, just use the equality semantics of the type of the scalar - no need for the series comparison functions

I believe it's necessary here due to floating-point precision issues.

During local testing, I encountered failures where the computed variance was 2.0...001 (slightly above 2.0) instead of exactly 2.0, due to floating-point rounding errors. This depends on the architecture and optimization level, so using direct equality (assert result == expected) would make the test brittle across CPU architectures.

tm.assert_almost_equal() provides the tolerance needed to handle these unavoidable precision differences while still validating correctness. This is consistent with how pandas tests other floating-point operations.

Would it be acceptable to keep tm.assert_almost_equal() for this reason?
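
The trade-off being discussed (exact equality versus a tolerance-based check on a float scalar) can be illustrated without pandas. This is a minimal sketch that uses the standard library's `math.isclose` as a stand-in for `tm.assert_almost_equal`; the tolerance value is an assumption chosen for illustration, not the one pandas uses.

```python
import math

# abs(1 + 1j) is sqrt(2) rounded to a float; squaring it typically does not
# give exactly 2.0 in binary floating point, so an exact comparison is fragile.
squared_modulus = abs(1 + 1j) ** 2

# May be False depending on rounding, which is the brittleness described above.
exact_match = squared_modulus == 2.0

# A relative tolerance absorbs rounding error of this size.
close_match = math.isclose(squared_modulus, 2.0, rel_tol=1e-9)
```

A squared modulus like this one is the kind of term that enters the complex variance, which is why a result slightly off from 2.0 is plausible for the test data in this PR.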

WillAyd · 2025-10-03T16:28:35Z

pandas/tests/reductions/test_reductions.py

+            ser2.var(ddof=1), np.var([1 + 2j, 2 + 3j, 3 + 4j], ddof=1)
+        )
+
+        # Test with NaN


Rather than creating multiple variables it would be better to parametrize the inputs to this test

added the inputs as parameters
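
A parametrized version of such a test could look roughly like the sketch below. It is not the code from the PR: the helper `complex_var` and the test name are hypothetical, and the helper stands in for `Series.var` so that the example runs with only pytest installed.

```python
import pytest


def complex_var(values, ddof):
    """Hypothetical stand-in for Series.var on complex data."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) ** 2 for x in values) / (len(values) - ddof)


@pytest.mark.parametrize(
    "values, ddof, expected",
    [
        ([1 + 2j, 2 + 3j, 3 + 4j], 1, 2.0),
        ([1 + 2j, 2 + 3j, 3 + 4j], 0, 4 / 3),
    ],
)
def test_complex_var(values, ddof, expected):
    # pytest.approx compares with a tolerance rather than exact equality.
    assert complex_var(values, ddof) == pytest.approx(expected)
```

Each tuple in the list becomes its own test case, so a failure report names the specific input and ddof combination instead of a line inside one long test body.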

WillAyd · 2025-10-03T16:29:26Z

pandas/tests/reductions/test_reductions.py

+        # Test other ddof values
+        tm.assert_almost_equal(ser2.var(ddof=0), 4 / 3)
+
+        # Test that imaginary part is preserved in mean calculation


This looks like it should be a separate test

made this a separate test

github-actions · 2025-11-28T00:08:19Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

shaurya5 force-pushed the sg/issue-62421 branch from 05e6f08 to f42afd3 Compare October 3, 2025 14:10

WillAyd requested changes Oct 3, 2025


mroeschke added the Reduction Operations sum, mean, min, max, etc. label Oct 3, 2025

shaurya5 force-pushed the sg/issue-62421 branch 2 times, most recently from 890d79b to 0af9ccd Compare October 4, 2025 10:09

github-actions bot added the Stale label Nov 28, 2025

shaurya5 force-pushed the sg/issue-62421 branch from 116c154 to 908e8d5 Compare November 28, 2025 15:04

Fix variance calculation for complex numbers by preserving dtype

5615f38

shaurya5 force-pushed the sg/issue-62421 branch from 908e8d5 to 5615f38 Compare November 28, 2025 16:52
