Skip to content

Commit 2ee9f93

Browse files
Merge branch 'main' into tst/groupby-large-int-sum
2 parents 679fd45 + d4bac86 commit 2ee9f93

File tree

91 files changed

+791
-459
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

91 files changed

+791
-459
lines changed

.github/workflows/unit-tests.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -313,7 +313,7 @@ jobs:
313313
# To freeze this file, uncomment out the ``if: false`` condition, and migrate the jobs
314314
# to the corresponding posix/windows-macos/sdist etc. workflows.
315315
# Feel free to modify this comment as necessary.
316-
if: false
316+
# if: false
317317
defaults:
318318
run:
319319
shell: bash -eou pipefail {0}
@@ -345,7 +345,7 @@ jobs:
345345
- name: Set up Python Dev Version
346346
uses: actions/setup-python@v6
347347
with:
348-
python-version: '3.13-dev'
348+
python-version: '3.14-dev'
349349

350350
- name: Build Environment
351351
run: |

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
| --- | --- |
1212
| Testing | [![CI - Test](https://github.com/pandas-dev/pandas/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/pandas-dev/pandas/actions/workflows/unit-tests.yml) [![Coverage](https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=main)](https://codecov.io/gh/pandas-dev/pandas) |
1313
| Package | [![PyPI Latest Release](https://img.shields.io/pypi/v/pandas.svg)](https://pypi.org/project/pandas/) [![PyPI Downloads](https://img.shields.io/pypi/dm/pandas.svg?label=PyPI%20downloads)](https://pypi.org/project/pandas/) [![Conda Latest Release](https://anaconda.org/conda-forge/pandas/badges/version.svg)](https://anaconda.org/conda-forge/pandas) [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandas.svg?label=Conda%20downloads)](https://anaconda.org/conda-forge/pandas) |
14-
| Meta | [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3509134.svg)](https://doi.org/10.5281/zenodo.3509134) [![License - BSD 3-Clause](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/main/LICENSE) [![Slack](https://img.shields.io/badge/join_Slack-information-brightgreen.svg?logo=slack)](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack) |
14+
| Meta | [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3509134.svg)](https://doi.org/10.5281/zenodo.3509134) [![License - BSD 3-Clause](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/main/LICENSE) [![Slack](https://img.shields.io/badge/join_Slack-information-brightgreen.svg?logo=slack)](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack) [![LFX Health Score](https://insights.linuxfoundation.org/api/badge/health-score?project=pandas-dev-pandas)](https://insights.linuxfoundation.org/project/pandas-dev-pandas) |
1515

1616

1717
## What is it?

ci/code_checks.sh

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,7 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
7373
-i "pandas.Period.freq GL08" \
7474
-i "pandas.Period.ordinal GL08" \
7575
-i "pandas.errors.IncompatibleFrequency SA01,SS06,EX01" \
76+
-i "pandas.api.extensions.ExtensionArray.value_counts EX01,RT03,SA01" \
7677
-i "pandas.core.groupby.DataFrameGroupBy.plot PR02" \
7778
-i "pandas.core.groupby.SeriesGroupBy.plot PR02" \
7879
-i "pandas.core.resample.Resampler.quantile PR01,PR07" \

doc/source/getting_started/intro_tutorials/01_table_oriented.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,7 @@ the values in each list as columns of the ``DataFrame``.
6565
A :class:`DataFrame` is a 2-dimensional data structure that can store data of
6666
different types (including characters, integers, floating point values,
6767
categorical data and more) in columns. It is similar to a spreadsheet, a
68-
SQL table or the ``data.frame`` in R.
68+
SQL table or the ``data.frame`` in `R <https://www.r-project.org/>`__.
6969

7070
- The table has 3 columns, each of them with a column label. The column
7171
labels are respectively ``Name``, ``Age`` and ``Sex``.

doc/source/whatsnew/v2.3.3.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ become the default string dtype in pandas 3.0. See
2222

2323
Bug fixes
2424
^^^^^^^^^
25+
- Fix bug in :meth:`Series.str.replace` using named capture groups (e.g., ``\g<name>``) with the Arrow-backed dtype would raise an error (:issue:`57636`)
2526
- Fix regression in ``~Series.str.contains``, ``~Series.str.match`` and ``~Series.str.fullmatch``
2627
with a compiled regex and custom flags (:issue:`62240`)
2728

doc/source/whatsnew/v3.0.0.rst

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -603,6 +603,7 @@ Other API changes
603603
an empty ``RangeIndex`` or empty ``Index`` with object dtype when determining
604604
the dtype of the resulting Index (:issue:`60797`)
605605
- :class:`IncompatibleFrequency` now subclasses ``TypeError`` instead of ``ValueError``. As a result, joins with mismatched frequencies now cast to object like other non-comparable joins, and arithmetic with indexes with mismatched frequencies align (:issue:`55782`)
606+
- :meth:`CategoricalIndex.append` no longer attempts to cast different-dtype indexes to the caller's dtype (:issue:`41626`)
606607
- :meth:`ExtensionDtype.construct_array_type` is now a regular method instead of a ``classmethod`` (:issue:`58663`)
607608
- Comparison operations between :class:`Index` and :class:`Series` now consistently return :class:`Series` regardless of which object is on the left or right (:issue:`36759`)
608609
- Numpy functions like ``np.isinf`` that return a bool dtype when called on a :class:`Index` object now return a bool-dtype :class:`Index` instead of ``np.ndarray`` (:issue:`52676`)
@@ -649,6 +650,7 @@ Other Deprecations
649650
- Deprecated ``pd.core.internals.api.maybe_infer_ndim`` (:issue:`40226`)
650651
- Deprecated allowing constructing or casting to :class:`Categorical` with non-NA values that are not present in specified ``dtype.categories`` (:issue:`40996`)
651652
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.all`, :meth:`DataFrame.min`, :meth:`DataFrame.max`, :meth:`DataFrame.sum`, :meth:`DataFrame.prod`, :meth:`DataFrame.mean`, :meth:`DataFrame.median`, :meth:`DataFrame.sem`, :meth:`DataFrame.var`, :meth:`DataFrame.std`, :meth:`DataFrame.skew`, :meth:`DataFrame.kurt`, :meth:`Series.all`, :meth:`Series.min`, :meth:`Series.max`, :meth:`Series.sum`, :meth:`Series.prod`, :meth:`Series.mean`, :meth:`Series.median`, :meth:`Series.sem`, :meth:`Series.var`, :meth:`Series.std`, :meth:`Series.skew`, and :meth:`Series.kurt`. (:issue:`57087`)
653+
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.groupby` and :meth:`Series.groupby` except ``by`` and ``level``. (:issue:`62102`)
652654
- Deprecated allowing non-keyword arguments in :meth:`Series.to_markdown` except ``buf``. (:issue:`57280`)
653655
- Deprecated allowing non-keyword arguments in :meth:`Series.to_string` except ``buf``. (:issue:`57280`)
654656
- Deprecated behavior of :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupBy.groups`, in a future version ``groups`` by one element list will return tuple instead of scalar. (:issue:`58858`)
@@ -902,6 +904,7 @@ Datetimelike
902904
- Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
903905
- Bug in :meth:`Index.union` with a ``pyarrow`` timestamp dtype incorrectly returning ``object`` dtype (:issue:`58421`)
904906
- Bug in :meth:`Series.dt.microsecond` producing incorrect results for pyarrow backed :class:`Series`. (:issue:`59154`)
907+
- Bug in :meth:`Timestamp.normalize` and :meth:`DatetimeArray.normalize` returning incorrect results instead of raising on integer overflow for very small (distant past) values (:issue:`60583`)
905908
- Bug in :meth:`Timestamp.replace` failing to update ``unit`` attribute when replacement introduces non-zero ``nanosecond`` or ``microsecond`` (:issue:`57749`)
906909
- Bug in :meth:`to_datetime` not respecting dayfirst if an uncommon date string was passed. (:issue:`58859`)
907910
- Bug in :meth:`to_datetime` on float array with missing values throwing ``FloatingPointError`` (:issue:`58419`)
@@ -972,8 +975,8 @@ Indexing
972975
- Bug in reindexing of :class:`DataFrame` with :class:`PeriodDtype` columns in case of consolidated block (:issue:`60980`, :issue:`60273`)
973976
- Bug in :meth:`DataFrame.loc.__getitem__` and :meth:`DataFrame.iloc.__getitem__` with a :class:`CategoricalDtype` column with integer categories raising when trying to index a row containing a ``NaN`` entry (:issue:`58954`)
974977
- Bug in :meth:`Index.__getitem__` incorrectly raising with a 0-dim ``np.ndarray`` key (:issue:`55601`)
978+
- Bug in adding new rows with :meth:`DataFrame.loc.__setitem__` or :class:`Series.loc.__setitem__` which failed to retain dtype on the object's index in some cases (:issue:`41626`)
975979
- Bug in indexing on a :class:`DatetimeIndex` with a ``timestamp[pyarrow]`` dtype or on a :class:`TimedeltaIndex` with a ``duration[pyarrow]`` dtype (:issue:`62277`)
976-
-
977980

978981
Missing
979982
^^^^^^^
@@ -1081,6 +1084,7 @@ Reshaping
10811084
- Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
10821085
- Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
10831086
- Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
1087+
- Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
10841088
- Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)
10851089
- Bug in :meth:`DataFrame.pivot_table` incorrectly subaggregating results when called without an ``index`` argument (:issue:`58722`)
10861090
- Bug in :meth:`DataFrame.pivot_table` incorrectly ignoring the ``values`` argument when also supplied to the ``index`` or ``columns`` parameters (:issue:`57876`, :issue:`61292`)
@@ -1091,7 +1095,7 @@ Reshaping
10911095
- Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
10921096
- Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`)
10931097
- Bug in :meth:`DataFrame.merge` with :class:`CategoricalDtype` columns incorrectly raising ``RecursionError`` (:issue:`56376`)
1094-
-
1098+
- Bug in :meth:`DataFrame.merge` with a ``float32`` index incorrectly casting the index to ``float64`` (:issue:`41626`)
10951099

10961100
Sparse
10971101
^^^^^^
@@ -1156,6 +1160,7 @@ Other
11561160
- Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
11571161
- Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
11581162
- Deprecated the keyword ``check_datetimelike_compat`` in :meth:`testing.assert_frame_equal` and :meth:`testing.assert_series_equal` (:issue:`55638`)
1163+
- Fixed bug in the :meth:`Series.rank` with object dtype and extremely small float values (:issue:`62036`)
11591164
- Fixed bug where the :class:`DataFrame` constructor misclassified array-like objects with a ``.name`` attribute as :class:`Series` or :class:`Index` (:issue:`61443`)
11601165
- Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
11611166
-

environment.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,9 @@ dependencies:
9191
- sphinx
9292
- sphinx-design
9393
- sphinx-copybutton
94+
95+
# static typing
96+
- scipy-stubs
9497
- types-python-dateutil
9598
- types-PyMySQL
9699
- types-pytz

pandas/_libs/algos.pyx

Lines changed: 2 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
cimport cython
22
from cython cimport Py_ssize_t
33
from libc.math cimport (
4-
fabs,
54
sqrt,
65
)
76
from libc.stdlib cimport (
@@ -72,13 +71,6 @@ tiebreakers = {
7271
}
7372

7473

75-
cdef bint are_diff(object left, object right):
76-
try:
77-
return fabs(left - right) > FP_ERR
78-
except TypeError:
79-
return left != right
80-
81-
8274
class Infinity:
8375
"""
8476
Provide a positive Infinity comparison method for ranking.
@@ -1135,12 +1127,8 @@ cdef void rank_sorted_1d(
11351127
dups += 1
11361128
sum_ranks += i - grp_start + 1
11371129

1138-
if numeric_object_t is object:
1139-
next_val_diff = at_end or are_diff(masked_vals[sort_indexer[i]],
1140-
masked_vals[sort_indexer[i+1]])
1141-
else:
1142-
next_val_diff = at_end or (masked_vals[sort_indexer[i]]
1143-
!= masked_vals[sort_indexer[i+1]])
1130+
next_val_diff = at_end or (masked_vals[sort_indexer[i]]
1131+
!= masked_vals[sort_indexer[i+1]])
11441132

11451133
# We'll need this check later anyway to determine group size, so just
11461134
# compute it here since shortcircuiting won't help

pandas/_libs/sparse.pyi

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
1-
from collections.abc import Sequence
21
from typing import Self
32

43
import numpy as np
54

6-
from pandas._typing import npt
5+
from pandas._typing import (
6+
TakeIndexer,
7+
npt,
8+
)
79

810
class SparseIndex:
911
length: int
@@ -26,7 +28,7 @@ class SparseIndex:
2628
class IntIndex(SparseIndex):
2729
indices: npt.NDArray[np.int32]
2830
def __init__(
29-
self, length: int, indices: Sequence[int], check_integrity: bool = ...
31+
self, length: int, indices: TakeIndexer, check_integrity: bool = ...
3032
) -> None: ...
3133

3234
class BlockIndex(SparseIndex):

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1339,7 +1339,12 @@ cdef class _Timestamp(ABCTimestamp):
13391339
int64_t ppd = periods_per_day(self._creso)
13401340
_Timestamp ts
13411341
1342-
normalized = normalize_i8_stamp(local_val, ppd)
1342+
try:
1343+
normalized = normalize_i8_stamp(local_val, ppd)
1344+
except OverflowError as err:
1345+
raise ValueError(
1346+
"Cannot normalize Timestamp without integer overflow"
1347+
) from err
13431348
ts = type(self)._from_value_and_reso(normalized, reso=self._creso, tz=None)
13441349
return ts.tz_localize(self.tzinfo)
13451350
@@ -3547,7 +3552,7 @@ Timestamp.daysinmonth = Timestamp.days_in_month
35473552
35483553
35493554
@cython.cdivision(False)
3550-
cdef int64_t normalize_i8_stamp(int64_t local_val, int64_t ppd) noexcept nogil:
3555+
cdef int64_t normalize_i8_stamp(int64_t local_val, int64_t ppd):
35513556
"""
35523557
Round the localized nanosecond timestamp down to the previous midnight.
35533558

@@ -3561,4 +3566,6 @@ cdef int64_t normalize_i8_stamp(int64_t local_val, int64_t ppd) noexcept nogil:
35613566
-------
35623567
int64_t
35633568
"""
3564-
return local_val - (local_val % ppd)
3569+
with cython.overflowcheck(True):
3570+
# GH#60583
3571+
return local_val - (local_val % ppd)

0 commit comments

Comments
 (0)