Commit c174833

Merge remote-tracking branch 'upstream/main' into aijams-take-function-invalid-dtype
2 parents f0886de + f918172 commit c174833

49 files changed: +1187 -306 lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ ci:
     skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.13.3
+  rev: v0.14.3
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
@@ -46,7 +46,7 @@ repos:
   - id: codespell
     types_or: [python, rst, markdown, cython, c]
 - repo: https://github.com/MarcoGorelli/cython-lint
-  rev: v0.17.0
+  rev: v0.18.1
   hooks:
   - id: cython-lint
   - id: double-quote-cython-strings
@@ -67,11 +67,11 @@ repos:
   - id: trailing-whitespace
     args: [--markdown-linebreak-ext=md]
 - repo: https://github.com/PyCQA/isort
-  rev: 6.1.0
+  rev: 7.0.0
   hooks:
   - id: isort
 - repo: https://github.com/asottile/pyupgrade
-  rev: v3.20.0
+  rev: v3.21.0
   hooks:
   - id: pyupgrade
     args: [--py311-plus]
@@ -87,7 +87,7 @@ repos:
     types: [text] # overwrite types: [rst]
     types_or: [python, rst]
 - repo: https://github.com/sphinx-contrib/sphinx-lint
-  rev: v1.0.0
+  rev: v1.0.1
   hooks:
   - id: sphinx-lint
     args: ["--enable", "all", "--disable", "line-too-long"]

README.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ If you are simply looking to start working with the pandas codebase, navigate to

 You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).

-Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!
+Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’... you can do something about it!

 Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Slack](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack).

doc/source/user_guide/groupby.rst

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:

 ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.

-The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
+DataFrame groupby always operates along axis 0 (rows). To split by columns, first do
 a transpose:

 .. ipython::

doc/source/whatsnew/v3.0.0.rst

Lines changed: 16 additions & 0 deletions
@@ -156,6 +156,8 @@ All warnings for upcoming changes in pandas will have the base class :class:`pan

 Other enhancements
 ^^^^^^^^^^^^^^^^^^
+- :class:`pandas.NamedAgg` now supports passing ``*args`` and ``**kwargs``
+  to calls of ``aggfunc`` (:issue:`58283`)
 - :func:`pandas.merge` propagates the ``attrs`` attribute to the result if all
   inputs have identical ``attrs``, as has so far already been the case for
   :func:`pandas.concat`.
@@ -737,7 +739,9 @@ Other Deprecations
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
 - Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
+- Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
+- Deprecated support for the Dataframe Interchange Protocol (:issue:`56732`)
 - Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)

 .. ---------------------------------------------------------------------------
@@ -960,6 +964,7 @@ Categorical
 ^^^^^^^^^^^
 - Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :func:`bdate_range` raising ``ValueError`` with frequency ``freq="cbh"`` (:issue:`62849`)
 - Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
 - Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
 - Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
@@ -974,12 +979,15 @@ Datetimelike
 - Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+- Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
 - Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
 - Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
 - Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
+- Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
+- Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -998,6 +1006,7 @@ Datetimelike
 - Bug in comparison between objects with pyarrow date dtype and ``timestamp[pyarrow]`` or ``np.datetime64`` dtype failing to consider these as non-comparable (:issue:`62157`)
 - Bug in constructing arrays with :class:`ArrowDtype` with ``timestamp`` type incorrectly allowing ``Decimal("NaN")`` (:issue:`61773`)
 - Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
+- Bug in retaining frequency in :meth:`value_counts` specifically for :meth:`DatetimeIndex` and :meth:`TimedeltaIndex` (:issue:`33830`)
 - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)

 Timedelta
@@ -1049,6 +1058,7 @@ Interval
 - Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
 - Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
 - Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
+- Construction of :class:`IntervalArray` and :class:`IntervalIndex` from arrays with mismatched signed/unsigned integer dtypes (e.g., ``int64`` and ``uint64``) now raises a :exc:`TypeError` instead of proceeding silently. (:issue:`55715`)

 Indexing
 ^^^^^^^^
@@ -1114,6 +1124,7 @@ I/O
 - Bug in :meth:`read_csv` for the ``c`` and ``python`` engines where parsing numbers with large exponents caused overflows. Now, numbers with large positive exponents are parsed as ``inf`` or ``-inf`` depending on the sign of the mantissa, while those with large negative exponents are parsed as ``0.0`` (:issue:`62617`, :issue:`38794`, :issue:`62740`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
+- Bug in :meth:`read_csv` where it did not appropriately skip a line when instructed, causing Empty Data Error (:issue:`62739`)
 - Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
 - Bug in :meth:`read_csv` with ``c`` and ``python`` engines reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="c"`` reading large float numbers with preceding integers as strings. Now reads them as floats. (:issue:`51295`)
@@ -1174,16 +1185,20 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
+- Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)

 Reshaping
 ^^^^^^^^^
 - Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+- Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
+- Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
 - Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
 - Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
+- Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
 - Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
 - Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)
@@ -1264,6 +1279,7 @@ Other
 - Bug in ``divmod`` and ``rdivmod`` with :class:`DataFrame`, :class:`Series`, and :class:`Index` with ``bool`` dtypes failing to raise, which was inconsistent with ``__floordiv__`` behavior (:issue:`46043`)
 - Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
 - Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
+- Bug when calling :py:func:`copy.copy` on a :class:`DataFrame` or :class:`Series` which would return a deep copy instead of a shallow copy (:issue:`62971`)
 - Deprecated the keyword ``check_datetimelike_compat`` in :meth:`testing.assert_frame_equal` and :meth:`testing.assert_series_equal` (:issue:`55638`)
 - Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`NA` values in a :class:`Float64Dtype` object with ``np.nan``; this now works with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`55127`)
 - Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`np.nan` values in a :class:`Int64Dtype` object with :class:`NA`; this is now a no-op with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`51237`)

pandas/_libs/tslibs/offsets.pyx

Lines changed: 16 additions & 4 deletions
@@ -692,6 +692,9 @@ cdef class BaseOffset:
             Rolled timestamp if not on offset, otherwise unchanged timestamp.
         """
         dt = Timestamp(dt)
+        if self.normalize and (dt - dt.normalize())._value != 0:
+            # GH#32616
+            dt = dt.normalize()
         if not self.is_on_offset(dt):
             dt = dt - type(self)(1, normalize=self.normalize, **self.kwds)
         return dt
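
Reviewer note: the hunk above normalizes the timestamp before the on-offset check. A minimal pure-Python sketch of why the order matters, assuming a toy month-begin offset; the helpers `normalize`, `is_on_month_begin`, `previous_month_begin`, and the `normalize_first` flag are hypothetical and only mimic the shape of the change, they are not pandas APIs.

```python
from datetime import datetime, timedelta


def normalize(dt: datetime) -> datetime:
    """Drop the time-of-day component (midnight of the same day)."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def is_on_month_begin(dt: datetime) -> bool:
    """Toy on-offset check for a normalizing offset: day 1 at midnight."""
    return dt.day == 1 and dt == normalize(dt)


def previous_month_begin(dt: datetime) -> datetime:
    """Toy equivalent of subtracting one normalizing month-begin offset."""
    if dt.day > 1:
        return normalize(dt).replace(day=1)
    # Already on day 1: step into the previous month, then go to its day 1.
    return (normalize(dt) - timedelta(days=1)).replace(day=1)


def rollback(dt: datetime, *, normalize_first: bool) -> datetime:
    """Roll back to the nearest month start; optionally normalize up front."""
    if normalize_first and dt != normalize(dt):
        dt = normalize(dt)
    if not is_on_month_begin(dt):
        dt = previous_month_begin(dt)
    return dt


stamp = datetime(2020, 1, 1, 10, 0)
old = rollback(stamp, normalize_first=False)  # 2019-12-01: one month too far
new = rollback(stamp, normalize_first=True)   # 2020-01-01
```

Without the up-front normalization, a timestamp on the right day but not at midnight fails the on-offset check and gets pushed back a full extra period, which matches the symptom described for GH#32616.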
@@ -5685,18 +5688,27 @@ def shift_month(stamp: datetime, months: int, day_opt: object = None) -> datetim
     cdef:
         int year, month, day
         int days_in_month, dy
+        npy_datetimestruct dts
+
+    if isinstance(stamp, _Timestamp):
+        creso = (<_Timestamp>stamp)._creso
+        val = (<_Timestamp>stamp)._value
+        pandas_datetime_to_datetimestruct(val, creso, &dts)
+    else:
+        # Plain datetime/date
+        pydate_to_dtstruct(stamp, &dts)

-    dy = (stamp.month + months) // 12
-    month = (stamp.month + months) % 12
+    dy = (dts.month + months) // 12
+    month = (dts.month + months) % 12

     if month == 0:
         month = 12
         dy -= 1
-    year = stamp.year + dy
+    year = dts.year + dy

     if day_opt is None:
         days_in_month = get_days_in_month(year, month)
-        day = min(stamp.day, days_in_month)
+        day = min(dts.day, days_in_month)
     elif day_opt == "start":
         day = 1
     elif day_opt == "end":
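
Reviewer note: the month arithmetic in this hunk is unchanged by the commit; only the source of `year`, `month`, and `day` moves to the `dts` struct. A plain-Python sketch of that arithmetic, using `calendar.monthrange` in place of `get_days_in_month`. The hunk is cut off after the `"end"` branch, so the body of that branch and the final `ValueError` here are assumptions, not the Cython source.

```python
import calendar
from datetime import datetime
from typing import Optional


def shift_month(stamp: datetime, months: int, day_opt: Optional[str] = None) -> datetime:
    """Shift stamp by a number of months, clamping the day if needed."""
    # Floor division and modulo keep the math correct for negative shifts.
    dy = (stamp.month + months) // 12
    month = (stamp.month + months) % 12

    # A remainder of 0 means December of the previous year.
    if month == 0:
        month = 12
        dy -= 1
    year = stamp.year + dy

    days_in_month = calendar.monthrange(year, month)[1]
    if day_opt is None:
        day = min(stamp.day, days_in_month)
    elif day_opt == "start":
        day = 1
    elif day_opt == "end":
        day = days_in_month  # assumption: not visible in the truncated hunk
    else:
        raise ValueError(day_opt)  # assumption: error handling is illustrative
    return stamp.replace(year=year, month=month, day=day)


leap_clamp = shift_month(datetime(2020, 1, 31), 1)         # day clamped to Feb 29
year_wrap = shift_month(datetime(2020, 11, 15), 1)         # remainder 0 -> December
backwards = shift_month(datetime(2020, 3, 31), -13)        # negative shift
month_end = shift_month(datetime(2021, 1, 10), 1, "end")
```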

pandas/conftest.py

Lines changed: 2 additions & 0 deletions
@@ -135,12 +135,14 @@ def pytest_collection_modifyitems(items, config) -> None:
     # Warnings from doctests that can be ignored; place reason in comment above.
     # Each entry specifies (path, message) - see the ignore_doctest_warning function
     ignored_doctest_warnings = [
+        ("api.interchange.from_dataframe", ".*Interchange Protocol is deprecated"),
         ("is_int64_dtype", "is_int64_dtype is deprecated"),
         ("is_interval_dtype", "is_interval_dtype is deprecated"),
         ("is_period_dtype", "is_period_dtype is deprecated"),
         ("is_datetime64tz_dtype", "is_datetime64tz_dtype is deprecated"),
         ("is_categorical_dtype", "is_categorical_dtype is deprecated"),
         ("is_sparse", "is_sparse is deprecated"),
+        ("DataFrame.__dataframe__", "Interchange Protocol is deprecated"),
         ("DataFrameGroupBy.fillna", "DataFrameGroupBy.fillna is deprecated"),
         ("DataFrameGroupBy.corrwith", "DataFrameGroupBy.corrwith is deprecated"),
         ("NDFrame.replace", "Series.replace without 'value'"),

pandas/core/algorithms.py

Lines changed: 13 additions & 0 deletions
@@ -868,8 +868,10 @@ def value_counts_internal(
     dropna: bool = True,
 ) -> Series:
     from pandas import (
+        DatetimeIndex,
         Index,
         Series,
+        TimedeltaIndex,
     )

     index_name = getattr(values, "name", None)
@@ -934,6 +936,17 @@ def value_counts_internal(
             # Starting in 3.0, we no longer perform dtype inference on the
             # Index object we construct here, xref GH#56161
             idx = Index(keys, dtype=keys.dtype, name=index_name)
+
+            if (
+                bins is None
+                and not sort
+                and isinstance(values, (DatetimeIndex, TimedeltaIndex))
+                and idx.equals(values)
+                and values.inferred_freq is not None
+            ):
+                # Preserve freq of original index
+                idx.freq = values.inferred_freq  # type: ignore[attr-defined]
+
             result = Series(counts, index=idx, name=name, copy=False)

     if sort:

pandas/core/apply.py

Lines changed: 21 additions & 2 deletions
@@ -1745,7 +1745,13 @@ def reconstruct_func(
     >>> reconstruct_func("min")
     (False, 'min', None, None)
     """
-    relabeling = func is None and is_multi_agg_with_relabel(**kwargs)
+    from pandas.core.groupby.generic import NamedAgg
+
+    relabeling = func is None and (
+        is_multi_agg_with_relabel(**kwargs)
+        or any(isinstance(v, NamedAgg) for v in kwargs.values())
+    )
+
     columns: tuple[str, ...] | None = None
     order: npt.NDArray[np.intp] | None = None

@@ -1766,9 +1772,22 @@ def reconstruct_func(
         # "Callable[..., Any] | str | list[Callable[..., Any] | str] |
         # MutableMapping[Hashable, Callable[..., Any] | str | list[Callable[..., Any] |
         # str]] | None")
+        converted_kwargs = {}
+        for key, val in kwargs.items():
+            if isinstance(val, NamedAgg):
+                aggfunc = val.aggfunc
+                if val.args or val.kwargs:
+                    aggfunc = lambda x, func=aggfunc, a=val.args, kw=val.kwargs: func(
+                        x, *a, **kw
+                    )
+                converted_kwargs[key] = (val.column, aggfunc)
+            else:
+                converted_kwargs[key] = val
+
         func, columns, order = normalize_keyword_aggregation(  # type: ignore[assignment]
-            kwargs
+            converted_kwargs
         )
+
     assert func is not None

     return relabeling, func, columns, order
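
Reviewer note: the lambda in this hunk binds `func`, `a`, and `kw` as default arguments, so each wrapper keeps the values from its own loop iteration instead of sharing whatever the loop variables hold at the end. A pandas-free sketch of the same conversion step follows; `ToyNamedAgg`, `convert`, and `scaled_sum` are hypothetical stand-ins written for illustration, not the pandas classes or helpers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ToyNamedAgg:
    """Hypothetical stand-in for a named aggregation with extra arguments."""
    column: str
    aggfunc: Callable[..., Any]
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


def convert(specs: dict) -> dict:
    """Turn each ToyNamedAgg into a (column, callable) pair."""
    converted = {}
    for key, val in specs.items():
        if isinstance(val, ToyNamedAgg):
            aggfunc = val.aggfunc
            if val.args or val.kwargs:
                # Default arguments freeze the current iteration's values.
                aggfunc = lambda x, func=aggfunc, a=val.args, kw=val.kwargs: func(
                    x, *a, **kw
                )
            converted[key] = (val.column, aggfunc)
        else:
            converted[key] = val
    return converted


def scaled_sum(values, factor, offset=0):
    """Example aggregation that needs extra positional/keyword arguments."""
    return sum(values) * factor + offset


specs = {
    "double": ToyNamedAgg("a", scaled_sum, args=(2,)),
    "triple_plus_one": ToyNamedAgg("a", scaled_sum, args=(3,), kwargs={"offset": 1}),
    "plain": ("a", sum),  # plain tuples pass through untouched
}
converted = convert(specs)
data = [1, 2, 3]
results = {name: func(data) for name, (column, func) in converted.items()}
```

If the lambda referenced `aggfunc` and `val` directly instead of through defaults, every wrapper created in the loop would see the last iteration's values when finally called.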

pandas/core/arrays/arrow/array.py

Lines changed: 8 additions & 0 deletions
@@ -829,6 +829,14 @@ def __arrow_array__(self, type=None):
         """Convert myself to a pyarrow ChunkedArray."""
         return self._pa_array

+    def __array_ufunc__(self, ufunc: np.ufunc, method: str, *inputs, **kwargs):
+        # Need to wrap np.array results GH#62800
+        result = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
+        if type(self) is ArrowExtensionArray:
+            # Exclude ArrowStringArray
+            return type(self)._from_sequence(result)
+        return result
+
     def __array__(
         self, dtype: NpDtype | None = None, copy: bool | None = None
     ) -> np.ndarray:
