Commit fcbdee4

Merge branch 'main' into aggregate
2 parents 93d51c1 + ead37b2 commit fcbdee4

64 files changed: +851 -330 lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ ci:
     skip: [pyright, mypy]
 repos:
 -   repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.13.3
+    rev: v0.14.3
     hooks:
     -   id: ruff
         args: [--exit-non-zero-on-fix]
@@ -46,7 +46,7 @@ repos:
     -   id: codespell
         types_or: [python, rst, markdown, cython, c]
 -   repo: https://github.com/MarcoGorelli/cython-lint
-    rev: v0.17.0
+    rev: v0.18.1
     hooks:
     -   id: cython-lint
     -   id: double-quote-cython-strings
@@ -67,11 +67,11 @@ repos:
     -   id: trailing-whitespace
         args: [--markdown-linebreak-ext=md]
 -   repo: https://github.com/PyCQA/isort
-    rev: 6.1.0
+    rev: 7.0.0
     hooks:
     -   id: isort
 -   repo: https://github.com/asottile/pyupgrade
-    rev: v3.20.0
+    rev: v3.21.0
     hooks:
     -   id: pyupgrade
         args: [--py311-plus]
@@ -87,7 +87,7 @@ repos:
         types: [text]  # overwrite types: [rst]
         types_or: [python, rst]
 -   repo: https://github.com/sphinx-contrib/sphinx-lint
-    rev: v1.0.0
+    rev: v1.0.1
     hooks:
     -   id: sphinx-lint
         args: ["--enable", "all", "--disable", "line-too-long"]

README.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ If you are simply looking to start working with the pandas codebase, navigate to
 
 You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).
 
-Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!
+Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’... you can do something about it!
 
 Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Slack](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack).
 

doc/make.py

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ def _process_single_doc(self, single_doc):
     @staticmethod
     def _run_os(*args) -> None:
         """
-        Execute a command as a OS terminal.
+        Execute a command as an OS terminal.
 
         Parameters
         ----------

doc/source/user_guide/groupby.rst

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:
 
 ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.
 
-The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
+DataFrame groupby always operates along axis 0 (rows). To split by columns, first do
 a transpose:
 
 .. ipython::

doc/source/whatsnew/v3.0.0.rst

Lines changed: 13 additions & 0 deletions
@@ -739,7 +739,9 @@ Other Deprecations
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
 - Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
+- Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
+- Deprecated support for the Dataframe Interchange Protocol (:issue:`56732`)
 - Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)
 
 .. ---------------------------------------------------------------------------
@@ -962,6 +964,7 @@ Categorical
 ^^^^^^^^^^^
 - Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :func:`bdate_range` raising ``ValueError`` with frequency ``freq="cbh"`` (:issue:`62849`)
 - Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
 - Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
 - Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
@@ -976,12 +979,15 @@ Datetimelike
 - Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+- Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
 - Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
 - Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
 - Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
+- Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
+- Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -1000,6 +1006,7 @@ Datetimelike
 - Bug in comparison between objects with pyarrow date dtype and ``timestamp[pyarrow]`` or ``np.datetime64`` dtype failing to consider these as non-comparable (:issue:`62157`)
 - Bug in constructing arrays with :class:`ArrowDtype` with ``timestamp`` type incorrectly allowing ``Decimal("NaN")`` (:issue:`61773`)
 - Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
+- Bug in retaining frequency in :meth:`value_counts` specifically for :meth:`DatetimeIndex` and :meth:`TimedeltaIndex` (:issue:`33830`)
 - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)
 
 Timedelta
@@ -1051,6 +1058,7 @@ Interval
 - Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
 - Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
 - Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
+- Construction of :class:`IntervalArray` and :class:`IntervalIndex` from arrays with mismatched signed/unsigned integer dtypes (e.g., ``int64`` and ``uint64``) now raises a :exc:`TypeError` instead of proceeding silently. (:issue:`55715`)
 
 Indexing
 ^^^^^^^^
@@ -1176,16 +1184,20 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
+- Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 
 Reshaping
 ^^^^^^^^^
 - Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+- Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
+- Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
 - Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
 - Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
+- Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
 - Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
 - Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)
@@ -1238,6 +1250,7 @@ Other
 - Bug in :meth:`DataFrame.query` where using duplicate column names led to a ``TypeError``. (:issue:`59950`)
 - Bug in :meth:`DataFrame.query` which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character ``#``, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (:issue:`59285`) (:issue:`49633`)
 - Bug in :meth:`DataFrame.query` which raised an exception when querying integer column names using backticks. (:issue:`60494`)
+- Bug in :meth:`DataFrame.rename` and :meth:`Series.rename` when passed a ``mapper``, ``index``, or ``columns`` argument that is a :class:`Series` with non-unique ``ser.index`` producing a corrupted result instead of raising ``ValueError`` (:issue:`58621`)
 - Bug in :meth:`DataFrame.sample` with ``replace=False`` and ``(n * max(weights) / sum(weights)) > 1``, the method would return biased results. Now raises ``ValueError``. (:issue:`61516`)
 - Bug in :meth:`DataFrame.shift` where passing a ``freq`` on a DataFrame with no columns did not shift the index correctly. (:issue:`60102`)
 - Bug in :meth:`DataFrame.sort_index` when passing ``axis="columns"`` and ``ignore_index=True`` and ``ascending=False`` not returning a :class:`RangeIndex` columns (:issue:`57293`)

pandas/_config/localization.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,7 @@ def can_set_locale(lc: str, lc_var: int = locale.LC_ALL) -> bool:
7979
with set_locale(lc, lc_var=lc_var):
8080
pass
8181
except (ValueError, locale.Error):
82-
# horrible name for a Exception subclass
82+
# horrible name for an Exception subclass
8383
return False
8484
else:
8585
return True

pandas/_libs/tslibs/offsets.pyx

Lines changed: 16 additions & 4 deletions
@@ -692,6 +692,9 @@ cdef class BaseOffset:
             Rolled timestamp if not on offset, otherwise unchanged timestamp.
         """
         dt = Timestamp(dt)
+        if self.normalize and (dt - dt.normalize())._value != 0:
+            # GH#32616
+            dt = dt.normalize()
         if not self.is_on_offset(dt):
             dt = dt - type(self)(1, normalize=self.normalize, **self.kwds)
         return dt
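
Reviewer note on the `rollback` hunk above: a minimal pure-Python sketch of why normalizing before the on-offset check matters (GH#32616). It does not use pandas. `is_on_offset`, `minus_one_offset`, and `month_begin_rollback` (with its `fix` flag) are hypothetical stand-ins that model a month-begin offset with `normalize=True`, under the assumption that a timestamp with a non-midnight time is never treated as on-offset in that mode.

```python
from datetime import datetime


def is_on_offset(dt: datetime) -> bool:
    # Hypothetical month-begin offset with normalize=True: only midnight
    # on the first day of a month counts as "on offset".
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return dt.day == 1 and dt == midnight


def minus_one_offset(dt: datetime) -> datetime:
    # Subtract one month-begin offset and normalize the result to midnight.
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if midnight.day > 1:
        return midnight.replace(day=1)
    # Already on the first of the month: step to the previous month start.
    if midnight.month == 1:
        return midnight.replace(year=midnight.year - 1, month=12)
    return midnight.replace(month=midnight.month - 1)


def month_begin_rollback(dt: datetime, fix: bool) -> datetime:
    if fix:
        # Mirrors the added lines: drop the time of day before the check.
        dt = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if not is_on_offset(dt):
        dt = minus_one_offset(dt)
    return dt


noon_on_first = datetime(2020, 1, 1, 12, 0)
before = month_begin_rollback(noon_on_first, fix=False)  # 2019-12-01: one offset too far
after = month_begin_rollback(noon_on_first, fix=True)    # 2020-01-01: stays on the offset
```

Without the early normalization, the time-of-day alone makes the on-offset check fail, so a date that is already on the offset gets rolled back a full period.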
@@ -5685,18 +5688,27 @@ def shift_month(stamp: datetime, months: int, day_opt: object = None) -> datetim
     cdef:
         int year, month, day
         int days_in_month, dy
+        npy_datetimestruct dts
+
+    if isinstance(stamp, _Timestamp):
+        creso = (<_Timestamp>stamp)._creso
+        val = (<_Timestamp>stamp)._value
+        pandas_datetime_to_datetimestruct(val, creso, &dts)
+    else:
+        # Plain datetime/date
+        pydate_to_dtstruct(stamp, &dts)
 
-    dy = (stamp.month + months) // 12
-    month = (stamp.month + months) % 12
+    dy = (dts.month + months) // 12
+    month = (dts.month + months) % 12
 
     if month == 0:
         month = 12
         dy -= 1
-    year = stamp.year + dy
+    year = dts.year + dy
 
     if day_opt is None:
         days_in_month = get_days_in_month(year, month)
-        day = min(stamp.day, days_in_month)
+        day = min(dts.day, days_in_month)
     elif day_opt == "start":
         day = 1
     elif day_opt == "end":
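
Reviewer note on the `shift_month` hunk above: the year/month arithmetic is unchanged by this commit (only the source of the fields moves from `stamp` to `dts`). A small pure-Python sketch of that arithmetic for the `day_opt is None` branch follows. `shift_month_sketch` is a hypothetical name, and `calendar.monthrange` stands in for `get_days_in_month`; this is an illustration, not the Cython implementation.

```python
import calendar
from datetime import date


def shift_month_sketch(stamp: date, months: int) -> date:
    # Same arithmetic as the hunk, for the day_opt=None branch only.
    dy = (stamp.month + months) // 12
    month = (stamp.month + months) % 12
    if month == 0:
        # A remainder of 0 means December of the previous year.
        month = 12
        dy -= 1
    year = stamp.year + dy
    days_in_month = calendar.monthrange(year, month)[1]
    day = min(stamp.day, days_in_month)  # clamp to the end of a shorter month
    return date(year, month, day)


print(shift_month_sketch(date(2020, 1, 31), 1))    # 2020-02-29
print(shift_month_sketch(date(2020, 12, 15), 12))  # 2021-12-15
print(shift_month_sketch(date(2020, 3, 31), -4))   # 2019-11-30
```

The `month == 0` branch is the part that is easy to get wrong: floor division has already counted one year too many whenever the target month is December.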

pandas/conftest.py

Lines changed: 2 additions & 0 deletions
@@ -135,12 +135,14 @@ def pytest_collection_modifyitems(items, config) -> None:
     # Warnings from doctests that can be ignored; place reason in comment above.
     # Each entry specifies (path, message) - see the ignore_doctest_warning function
     ignored_doctest_warnings = [
+        ("api.interchange.from_dataframe", ".*Interchange Protocol is deprecated"),
         ("is_int64_dtype", "is_int64_dtype is deprecated"),
         ("is_interval_dtype", "is_interval_dtype is deprecated"),
         ("is_period_dtype", "is_period_dtype is deprecated"),
         ("is_datetime64tz_dtype", "is_datetime64tz_dtype is deprecated"),
         ("is_categorical_dtype", "is_categorical_dtype is deprecated"),
         ("is_sparse", "is_sparse is deprecated"),
+        ("DataFrame.__dataframe__", "Interchange Protocol is deprecated"),
         ("DataFrameGroupBy.fillna", "DataFrameGroupBy.fillna is deprecated"),
         ("DataFrameGroupBy.corrwith", "DataFrameGroupBy.corrwith is deprecated"),
         ("NDFrame.replace", "Series.replace without 'value'"),
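
Reviewer note on the `(path, message)` entries above: in Python's standard `warnings` filters, `message` is a regular expression that must match from the start of the warning text, which is presumably why the first new entry begins with `.*`. The sketch below uses only the standard library. `is_ignored` is a hypothetical helper and the warning text is made up for illustration; it is not the message pandas emits, and it does not reproduce `ignore_doctest_warning`.

```python
import warnings


def is_ignored(message_pattern: str, warning_text: str) -> bool:
    # Return True if an "ignore" filter with this pattern suppresses the warning.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted ahead of the "always" filter, so it is consulted first.
        warnings.filterwarnings("ignore", message=message_pattern)
        warnings.warn(warning_text, DeprecationWarning)
    return len(caught) == 0


# Made-up warning text, used only to show the matching rule.
text = "The Dataframe Interchange Protocol is deprecated."

anchored = is_ignored("Interchange Protocol is deprecated", text)    # False
prefixed = is_ignored(".*Interchange Protocol is deprecated", text)  # True
```

A pattern without the `.*` prefix only suppresses warnings whose text starts with that phrase.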

pandas/core/algorithms.py

Lines changed: 13 additions & 0 deletions
@@ -868,8 +868,10 @@ def value_counts_internal(
     dropna: bool = True,
 ) -> Series:
     from pandas import (
+        DatetimeIndex,
         Index,
         Series,
+        TimedeltaIndex,
     )
 
     index_name = getattr(values, "name", None)
@@ -934,6 +936,17 @@ def value_counts_internal(
             # Starting in 3.0, we no longer perform dtype inference on the
             # Index object we construct here, xref GH#56161
             idx = Index(keys, dtype=keys.dtype, name=index_name)
+
+            if (
+                bins is None
+                and not sort
+                and isinstance(values, (DatetimeIndex, TimedeltaIndex))
+                and idx.equals(values)
+                and values.inferred_freq is not None
+            ):
+                # Preserve freq of original index
+                idx.freq = values.inferred_freq  # type: ignore[attr-defined]
+
             result = Series(counts, index=idx, name=name, copy=False)
 
     if sort:

pandas/core/arrays/arrow/array.py

Lines changed: 8 additions & 0 deletions
@@ -829,6 +829,14 @@ def __arrow_array__(self, type=None):
         """Convert myself to a pyarrow ChunkedArray."""
         return self._pa_array
 
+    def __array_ufunc__(self, ufunc: np.ufunc, method: str, *inputs, **kwargs):
+        # Need to wrap np.array results GH#62800
+        result = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
+        if type(self) is ArrowExtensionArray:
+            # Exclude ArrowStringArray
+            return type(self)._from_sequence(result)
+        return result
+
     def __array__(
         self, dtype: NpDtype | None = None, copy: bool | None = None
     ) -> np.ndarray:
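
Reviewer note on the `__array_ufunc__` hunk above: the `type(self) is ArrowExtensionArray` check wraps results only for the exact class, so subclasses (the comment names `ArrowStringArray`) keep the parent's raw result. The sketch below shows that pattern with plain Python classes. `Base`, `Wrapped`, and `Special` are hypothetical and have nothing to do with the pandas or NumPy APIs.

```python
class Base:
    def compute(self, values):
        # Stand-in for a parent hook that returns a plain container.
        return [v * 2 for v in values]


class Wrapped(Base):
    def __init__(self, data):
        self.data = list(data)

    def compute(self, values):
        result = super().compute(values)
        if type(self) is Wrapped:
            # Exact type only: subclasses keep the parent's raw result.
            return type(self)(result)
        return result


class Special(Wrapped):
    # Plays the role of a subclass that is excluded from wrapping.
    pass


wrapped_out = Wrapped([1]).compute([1, 2])  # a Wrapped holding [2, 4]
special_out = Special([1]).compute([1, 2])  # a plain list [2, 4]
```

An `isinstance(self, Wrapped)` check would also be true for `Special`, so the identity check on the type is what keeps subclasses on the old behavior.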
