@shaurya5 shaurya5 commented Oct 3, 2025


ser2 = Series([1 + 2j, 2 + 3j, 3 + 4j], dtype=np.complex128)
expected_var = 2.0
tm.assert_almost_equal(ser2.var(ddof=1), expected_var)
Member

Do we need assert_almost_equal or can we use assert_series_equal?

Author

We need assert_almost_equal because .var() returns a scalar, not a Series.

Member

Can you instead adjust the expected value to be the right type of output? The point of assert_almost_equal is to allow for differences in precision, but not necessarily in types

Author

Series.var() returns a scalar (not a Series), so tm.assert_almost_equal() is the appropriate assertion function here.
The expected value is already the correct type - it's a scalar float (2.0 or 4/3), which matches the scalar output from .var().

Is there something that I am missing?
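
For reference, the two expected values quoted above (2.0 and 4/3) can be reproduced by hand. The sketch below is a plain-Python illustration, not the pandas implementation. It assumes the NumPy convention that the variance of complex data uses the squared absolute deviation from the mean, and `complex_var` is a hypothetical helper name.

```python
import math


def complex_var(values, ddof=0):
    """Variance of complex values using |x - mean|**2 (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    # abs() of a complex number is its modulus, so every term is a real float.
    squared_devs = [abs(v - mean) ** 2 for v in values]
    return sum(squared_devs) / (n - ddof)


data = [1 + 2j, 2 + 3j, 3 + 4j]
sample_var = complex_var(data, ddof=1)      # divides by n - 1
population_var = complex_var(data, ddof=0)  # divides by n

# Compare with a tolerance, since the float arithmetic may not be exact.
print(math.isclose(sample_var, 2.0), math.isclose(population_var, 4 / 3))
```

Both results are plain real scalars, which is why a scalar comparison (rather than a Series comparison) is the natural fit.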

Member

What error are you getting with assert_series_equal?

Author

@shaurya5 shaurya5 Oct 21, 2025

When I use assert_series_equal, I get this error:
AssertionError: Series Expected type <class 'pandas.core.series.Series'>, found <class 'numpy.float64'> instead

I am still unable to figure out how assert_series_equal can be used here. Do you want me to convert the expected variance value and the Series.var() output (both scalar values) to a Series and then use assert_series_equal?

If so, that approach would look like:
result = Series([ser.var(ddof=ddof)])
expected_series = Series([expected])
tm.assert_series_equal(result, expected_series, rtol=1e-5, atol=1e-8)

However, I'd need to explicitly pass rtol and atol because assert_series_equal doesn't have default tolerance parameters, which means it would fail on floating-point precision differences across different configurations. In contrast, tm.assert_almost_equal() has built-in tolerance.

Member

Oh OK - so it is just returning a scalar? In that case, just use the equality semantics of the type of the scalar - no need for the series comparison functions

Author

I believe it's necessary here due to floating-point precision issues.

During local testing, I encountered failures where the computed variance was 2.000...001 instead of exactly 2.0 due to floating-point rounding errors. This depends on the architecture and the optimization level. Using direct equality (assert result == expected) would make the test brittle across different CPU architectures.

tm.assert_almost_equal() provides the tolerance needed to handle these unavoidable precision differences while still validating correctness. This is consistent with how pandas tests other floating-point operations.

Would it be acceptable to keep tm.assert_almost_equal() for this reason?
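
The brittleness described above can be illustrated without pandas. This is a minimal sketch, assuming tolerances of rtol=1e-5 and atol=1e-8 (the values proposed earlier in this thread). `math.isclose` is used only as a stand-in for a tolerance-based assertion and is not a claim about what `tm.assert_almost_equal` does internally.

```python
import math

# Squaring sqrt(2) is mathematically 2, but the float result is slightly off.
root = math.sqrt(2.0)
computed = root * root
expected = 2.0

# Exact equality fails on the rounding error...
exact_match = computed == expected
# ...while a tolerance-based comparison accepts it.
close_match = math.isclose(computed, expected, rel_tol=1e-5, abs_tol=1e-8)

print(exact_match, close_match)
```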

ser2.var(ddof=1), np.var([1 + 2j, 2 + 3j, 3 + 4j], ddof=1)
)

# Test with NaN
Member

Rather than creating multiple variables it would be better to parametrize the inputs to this test

Author

added the inputs as parameters
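
A parametrized version along these lines might look like the sketch below. The test name, the parameter layout, and the use of `np.var` as a stand-in for `Series.var` are assumptions for illustration; the actual test in the PR is not shown in this thread.

```python
import numpy as np
import pytest


@pytest.mark.parametrize(
    "ddof, expected",
    [
        (1, 2.0),    # sample variance: divide by n - 1
        (0, 4 / 3),  # population variance: divide by n
    ],
)
def test_var_complex(ddof, expected):
    values = np.array([1 + 2j, 2 + 3j, 3 + 4j], dtype=np.complex128)
    # np.var stands in for Series.var here (assumption for this sketch).
    result = np.var(values, ddof=ddof)
    # pytest.approx compares with a tolerance instead of exact equality.
    assert result == pytest.approx(expected)
```

Each tuple in the list becomes its own test case, so no extra variables are needed per input.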

# Test other ddof values
tm.assert_almost_equal(ser2.var(ddof=0), 4 / 3)

# Test that imaginary part is preserved in mean calculation
Member

This looks like it should be a separate test

Author

made this a separate test

@mroeschke mroeschke added the Reduction Operations sum, mean, min, max, etc. label Oct 3, 2025
@shaurya5 shaurya5 force-pushed the sg/issue-62421 branch 2 times, most recently from 890d79b to 0af9ccd on October 4, 2025 10:09
@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

Labels

Reduction Operations (sum, mean, min, max, etc.), Stale

Development

Successfully merging this pull request may close these issues.

BUG: Incorrect results for pandas.Series.var

3 participants