
Commit c128f9c

Merge branch 'main' into gh-62717
2 parents: 1d97b3a + ead37b2

31 files changed (+675, -257 lines)

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ ci:
 skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.13.3
+rev: v0.14.3
 hooks:
 - id: ruff
 args: [--exit-non-zero-on-fix]
@@ -46,7 +46,7 @@ repos:
 - id: codespell
 types_or: [python, rst, markdown, cython, c]
 - repo: https://github.com/MarcoGorelli/cython-lint
-rev: v0.17.0
+rev: v0.18.1
 hooks:
 - id: cython-lint
 - id: double-quote-cython-strings
@@ -67,11 +67,11 @@ repos:
 - id: trailing-whitespace
 args: [--markdown-linebreak-ext=md]
 - repo: https://github.com/PyCQA/isort
-rev: 6.1.0
+rev: 7.0.0
 hooks:
 - id: isort
 - repo: https://github.com/asottile/pyupgrade
-rev: v3.20.0
+rev: v3.21.0
 hooks:
 - id: pyupgrade
 args: [--py311-plus]
@@ -87,7 +87,7 @@ repos:
 types: [text] # overwrite types: [rst]
 types_or: [python, rst]
 - repo: https://github.com/sphinx-contrib/sphinx-lint
-rev: v1.0.0
+rev: v1.0.1
 hooks:
 - id: sphinx-lint
 args: ["--enable", "all", "--disable", "line-too-long"]

README.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ If you are simply looking to start working with the pandas codebase, navigate to

 You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).

-Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!
+Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’... you can do something about it!

 Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Slack](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack).

doc/source/user_guide/groupby.rst

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:

 ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.

-The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
+DataFrame groupby always operates along axis 0 (rows). To split by columns, first do
 a transpose:

 .. ipython::
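The revised user-guide sentence says DataFrame groupby splits along rows, so grouping columns goes through a transpose. A minimal sketch of that pattern follows; the column names and the first-letter grouping key are illustrative assumptions, not taken from the user guide.

```python
import pandas as pd

# Illustrative frame: columns share a one-letter prefix we want to group on.
df = pd.DataFrame({"a1": [1, 2], "a2": [3, 4], "b1": [5, 6]})

# Transpose so the column labels become the index, group those labels by
# their first character, aggregate each group, then transpose back.
by_prefix = df.T.groupby(lambda label: label[0]).sum().T

print(by_prefix)
```

The result has one column per prefix (`a`, `b`) and keeps the original row index.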

doc/source/whatsnew/v3.0.0.rst

Lines changed: 10 additions & 0 deletions
@@ -737,7 +737,9 @@ Other Deprecations
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
 - Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
+- Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
+- Deprecated support for the Dataframe Interchange Protocol (:issue:`56732`)
 - Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)

 .. ---------------------------------------------------------------------------
@@ -960,6 +962,7 @@ Categorical
 ^^^^^^^^^^^
 - Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :func:`bdate_range` raising ``ValueError`` with frequency ``freq="cbh"`` (:issue:`62849`)
 - Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
 - Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
 - Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
@@ -974,13 +977,15 @@ Datetimelike
 - Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+- Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
 - Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
 - Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
 - Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
 - Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
+- Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -999,6 +1004,7 @@ Datetimelike
 - Bug in comparison between objects with pyarrow date dtype and ``timestamp[pyarrow]`` or ``np.datetime64`` dtype failing to consider these as non-comparable (:issue:`62157`)
 - Bug in constructing arrays with :class:`ArrowDtype` with ``timestamp`` type incorrectly allowing ``Decimal("NaN")`` (:issue:`61773`)
 - Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
+- Bug in retaining frequency in :meth:`value_counts` specifically for :meth:`DatetimeIndex` and :meth:`TimedeltaIndex` (:issue:`33830`)
 - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)

 Timedelta
@@ -1176,16 +1182,20 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
+- Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)

 Reshaping
 ^^^^^^^^^
 - Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+- Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
+- Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
 - Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
 - Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
+- Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
 - Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
 - Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)

pandas/_libs/tslibs/offsets.pyx

Lines changed: 13 additions & 4 deletions
@@ -5688,18 +5688,27 @@ def shift_month(stamp: datetime, months: int, day_opt: object = None) -> datetim
 cdef:
 int year, month, day
 int days_in_month, dy
+npy_datetimestruct dts
+
+if isinstance(stamp, _Timestamp):
+creso = (<_Timestamp>stamp)._creso
+val = (<_Timestamp>stamp)._value
+pandas_datetime_to_datetimestruct(val, creso, &dts)
+else:
+# Plain datetime/date
+pydate_to_dtstruct(stamp, &dts)

-dy = (stamp.month + months) // 12
-month = (stamp.month + months) % 12
+dy = (dts.month + months) // 12
+month = (dts.month + months) % 12

 if month == 0:
 month = 12
 dy -= 1
-year = stamp.year + dy
+year = dts.year + dy

 if day_opt is None:
 days_in_month = get_days_in_month(year, month)
-day = min(stamp.day, days_in_month)
+day = min(dts.day, days_in_month)
 elif day_opt == "start":
 day = 1
 elif day_opt == "end":
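The hunk above now reads year, month and day from a datetime struct instead of from the `stamp` attributes, while the month arithmetic itself is unchanged. The pure-Python sketch below mirrors that arithmetic on a plain `datetime` so it can run in isolation. `shift_month_sketch` is a hypothetical name, `calendar.monthrange` stands in for `get_days_in_month`, only the three `day_opt` branches visible in the hunk are modelled, and the Timestamp resolution handling the commit adds is not reproduced.

```python
import calendar
from datetime import datetime
from typing import Optional


def shift_month_sketch(stamp: datetime, months: int, day_opt: Optional[str] = None) -> datetime:
    # Carry whole years out of the month offset; floor division keeps this
    # correct for negative offsets too.
    dy = (stamp.month + months) // 12
    month = (stamp.month + months) % 12
    if month == 0:
        # A remainder of 0 means December of the previous year.
        month = 12
        dy -= 1
    year = stamp.year + dy

    # Number of days in the target month (stand-in for get_days_in_month).
    days_in_month = calendar.monthrange(year, month)[1]
    if day_opt is None:
        # Keep the day of month, clamped to the end of a shorter month.
        day = min(stamp.day, days_in_month)
    elif day_opt == "start":
        day = 1
    elif day_opt == "end":
        day = days_in_month
    else:
        raise ValueError(f"unsupported day_opt in this sketch: {day_opt!r}")
    return stamp.replace(year=year, month=month, day=day)


print(shift_month_sketch(datetime(2024, 1, 31), 1))  # clamped to Feb 29
```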

pandas/conftest.py

Lines changed: 2 additions & 0 deletions
@@ -135,12 +135,14 @@ def pytest_collection_modifyitems(items, config) -> None:
 # Warnings from doctests that can be ignored; place reason in comment above.
 # Each entry specifies (path, message) - see the ignore_doctest_warning function
 ignored_doctest_warnings = [
+("api.interchange.from_dataframe", ".*Interchange Protocol is deprecated"),
 ("is_int64_dtype", "is_int64_dtype is deprecated"),
 ("is_interval_dtype", "is_interval_dtype is deprecated"),
 ("is_period_dtype", "is_period_dtype is deprecated"),
 ("is_datetime64tz_dtype", "is_datetime64tz_dtype is deprecated"),
 ("is_categorical_dtype", "is_categorical_dtype is deprecated"),
 ("is_sparse", "is_sparse is deprecated"),
+("DataFrame.__dataframe__", "Interchange Protocol is deprecated"),
 ("DataFrameGroupBy.fillna", "DataFrameGroupBy.fillna is deprecated"),
 ("DataFrameGroupBy.corrwith", "DataFrameGroupBy.corrwith is deprecated"),
 ("NDFrame.replace", "Series.replace without 'value'"),
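Each entry in `ignored_doctest_warnings` pairs a doctest path with a message pattern. Assuming the `ignore_doctest_warning` helper ultimately hands that pattern to a standard `warnings` filter (an assumption, since the helper's body is not part of this diff), the pattern is a regular expression matched case-insensitively against the start of the warning text, so a leading `.*` matters when the text begins with other words. The sketch uses only the standard library; `is_suppressed` is a hypothetical helper and `DeprecationWarning` stands in for the pandas-specific warning class.

```python
import warnings

# Text taken from the first line of the new __dataframe__ warning.
MESSAGE = "The Dataframe Interchange Protocol is deprecated."


def is_suppressed(pattern: str) -> bool:
    """Return True if an 'ignore' filter with this message pattern hides MESSAGE."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default...
        warnings.filterwarnings("ignore", message=pattern)  # ...unless this matches first
        warnings.warn(MESSAGE, DeprecationWarning)
    return not caught


print(is_suppressed(".*Interchange Protocol is deprecated"))  # True
print(is_suppressed("Interchange Protocol is deprecated"))  # False: must match from the start
```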

pandas/core/algorithms.py

Lines changed: 13 additions & 0 deletions
@@ -868,8 +868,10 @@ def value_counts_internal(
 dropna: bool = True,
 ) -> Series:
 from pandas import (
+DatetimeIndex,
 Index,
 Series,
+TimedeltaIndex,
 )

 index_name = getattr(values, "name", None)
@@ -934,6 +936,17 @@ def value_counts_internal(
 # Starting in 3.0, we no longer perform dtype inference on the
 # Index object we construct here, xref GH#56161
 idx = Index(keys, dtype=keys.dtype, name=index_name)
+
+if (
+bins is None
+and not sort
+and isinstance(values, (DatetimeIndex, TimedeltaIndex))
+and idx.equals(values)
+and values.inferred_freq is not None
+):
+# Preserve freq of original index
+idx.freq = values.inferred_freq # type: ignore[attr-defined]
+
 result = Series(counts, index=idx, name=name, copy=False)

 if sort:
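The new block in `value_counts_internal` attaches a frequency only when no binning or sorting happened, the input is a `DatetimeIndex` or `TimedeltaIndex`, the counted keys equal the input (so every value is unique and the order is unchanged), and a frequency can be inferred. The sketch below restates those conditions over a plain list of `datetime` values. `value_counts_keep_step` is hypothetical, a constant `timedelta` step stands in for pandas' frequency inference (which handles far more than fixed steps), and the three-value minimum is an assumption about when inference is attempted.

```python
from collections import Counter
from datetime import datetime, timedelta


def value_counts_keep_step(values, sort=True):
    """Count values; report a constant step only in the unsorted, all-unique case."""
    counts = Counter(values)  # Counter keeps first-seen order
    keys = list(counts)
    if sort:
        # Most frequent first; Python's sort stays stable for ties.
        keys.sort(key=lambda k: counts[k], reverse=True)

    step = None
    # Mirrors "not sort and idx.equals(values)": keys must reproduce the input.
    if not sort and keys == list(values) and len(keys) >= 3:
        diffs = {b - a for a, b in zip(keys, keys[1:])}
        if len(diffs) == 1:  # evenly spaced, so a step can be "inferred"
            step = diffs.pop()
    return keys, [counts[k] for k in keys], step


days = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(4)]
print(value_counts_keep_step(days, sort=False)[2])  # 1 day, 0:00:00
```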

pandas/core/frame.py

Lines changed: 36 additions & 23 deletions
@@ -916,6 +916,14 @@ def __dataframe__(
 """
 Return the dataframe interchange object implementing the interchange protocol.

+.. deprecated:: 3.0.0
+
+The Dataframe Interchange Protocol is deprecated.
+For dataframe-agnostic code, you may want to look into:
+
+- `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_
+- `Narwhals <https://github.com/narwhals-dev/narwhals>`_
+
 .. note::

 For new development, we highly recommend using the Arrow C Data Interface
@@ -970,7 +978,14 @@ def __dataframe__(
 These methods (``column_names``, ``select_columns_by_name``) should work
 for any dataframe library which implements the interchange protocol.
 """
-
+warnings.warn(
+"The Dataframe Interchange Protocol is deprecated.\n"
+"For dataframe-agnostic code, you may want to look into:\n"
+"- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n"
+"- Narwhals: https://github.com/narwhals-dev/narwhals\n",
+Pandas4Warning,
+stacklevel=find_stack_level(),
+)
 from pandas.core.interchange.dataframe import PandasDataFrameXchg

 return PandasDataFrameXchg(self, allow_copy=allow_copy)
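The `__dataframe__` change follows the usual deprecation pattern: emit a warning attributed to the caller, then keep doing the old work. A reduced sketch of that pattern is below. `LegacyFrame` and its placeholder return value are hypothetical, the standard `DeprecationWarning` stands in for `Pandas4Warning`, and a fixed `stacklevel=2` approximates what `find_stack_level()` works out dynamically inside pandas.

```python
import warnings


class LegacyFrame:
    """Hypothetical stand-in for an object whose protocol method is deprecated."""

    def __init__(self, data: dict):
        self._data = data

    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        warnings.warn(
            "The Dataframe Interchange Protocol is deprecated.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this method
        )
        # The old behaviour still runs after the warning is emitted.
        return {"columns": list(self._data), "allow_copy": allow_copy}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    exported = LegacyFrame({"a": [1, 2]}).__dataframe__()

print(exported, [str(w.message) for w in caught])
```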
@@ -9038,16 +9053,6 @@ def combine(
 0 0 -5.0
 1 0 4.0

-However, if the same element in both dataframes is None, that None
-is preserved
-
->>> df1 = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
->>> df2 = pd.DataFrame({"A": [1, 1], "B": [None, 3]})
->>> df1.combine(df2, take_smaller, fill_value=-5)
-A B
-0 0 -5.0
-1 0 3.0
-
 Example that demonstrates the use of `overwrite` and behavior when
 the axis differ between the dataframes.
@@ -9106,11 +9111,14 @@ def combine(

 # preserve column order
 new_columns = self.columns.union(other_columns, sort=False)
+this = this.reindex(new_columns, axis=1)
+other = other.reindex(new_columns, axis=1)
+
 do_fill = fill_value is not None
 result = {}
-for col in new_columns:
-series = this[col]
-other_series = other[col]
+for i in range(this.shape[1]):
+series = this.iloc[:, i]
+other_series = other.iloc[:, i]

 this_dtype = series.dtype
 other_dtype = other_series.dtype
@@ -9121,7 +9129,7 @@ def combine(
 # don't overwrite columns unnecessarily
 # DO propagate if this column is not in the intersection
 if not overwrite and other_mask.all():
-result[col] = this[col].copy()
+result[i] = series.copy()
 continue

 if do_fill:
@@ -9130,7 +9138,7 @@ def combine(
 series[this_mask] = fill_value
 other_series[other_mask] = fill_value

-if col not in self.columns:
+if new_columns[i] not in self.columns:
 # If self DataFrame does not have col in other DataFrame,
 # try to promote series, which is all NaN, as other_dtype.
 new_dtype = other_dtype
@@ -9155,10 +9163,10 @@ def combine(
 arr, new_dtype
 )

-result[col] = arr
+result[i] = arr

-# convert_objects just in case
-frame_result = self._constructor(result, index=new_index, columns=new_columns)
+frame_result = self._constructor(result, index=new_index)
+frame_result.columns = new_columns
 return frame_result.__finalize__(self, method="combine")

 def combine_first(self, other: DataFrame) -> DataFrame:
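The reworked `combine` loop aligns both frames to `new_columns` once and then walks the columns by position, so duplicate labels are no longer looked up by name (a lookup that returns several columns at once). The list-based sketch below shows the same idea without pandas. `combine_positional`, the column-list layout and the element-wise `take_smaller` are illustrative assumptions, and both inputs are assumed to be already aligned to `labels`.

```python
def combine_positional(labels, this_cols, other_cols, func):
    """Combine two column lists aligned to `labels`, one column position at a time."""
    result = {}
    for i in range(len(labels)):
        # Keyed by position, so two columns labelled "A" stay distinct.
        result[i] = func(this_cols[i], other_cols[i])
    # Reattach labels afterwards, as the diff does with frame_result.columns.
    return [(labels[i], result[i]) for i in range(len(labels))]


def take_smaller(s1, s2):
    # Element-wise minimum of two equal-length columns.
    return [min(x, y) for x, y in zip(s1, s2)]


labels = ["A", "A", "B"]
this_cols = [[1, 5], [2, 2], [0, 0]]
other_cols = [[3, 4], [1, 9], [7, -1]]
combined = combine_positional(labels, this_cols, other_cols, take_smaller)
print(combined)  # [('A', [1, 4]), ('A', [1, 2]), ('B', [0, -1])]
```

Keying the intermediate dict by label instead would keep only one of the two `A` columns, which is the collapse the positional loop avoids.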
@@ -9222,9 +9230,14 @@ def combiner(x: Series, y: Series):
 combined = self.combine(other, combiner, overwrite=False)

 dtypes = {
+# Check for isinstance(..., (np.dtype, ExtensionDtype))
+# to prevent raising on non-unique columns see GH#29135.
+# Note we will just not-cast in these cases.
 col: find_common_type([self.dtypes[col], other.dtypes[col]])
 for col in self.columns.intersection(other.columns)
-if combined.dtypes[col] != self.dtypes[col]
+if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))
+and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))
+and combined.dtypes[col] != self.dtypes[col]
 }

 if dtypes:
@@ -9432,7 +9445,7 @@ def groupby(
 index. If a dict or Series is passed, the Series or dict VALUES
 will be used to determine the groups (the Series' values are first
 aligned; see ``.align()`` method). If a list or ndarray of length
-equal to the selected axis is passed (see the `groupby user guide
+equal to the number of rows is passed (see the `groupby user guide
 <https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
 the values are used as-is to determine the groups. A label or list
 of labels may be passed to group by the columns in ``self``.
@@ -13820,8 +13833,8 @@ def quantile(
 0.1 1 1
 0.5 3 100

-Specifying `numeric_only=False` will also compute the quantile of
-datetime and timedelta data.
+Specifying `numeric_only=False` will compute the quantiles for all
+columns.

 >>> df = pd.DataFrame(
 ... {

pandas/core/indexes/base.py

Lines changed: 2 additions & 2 deletions
@@ -4168,7 +4168,7 @@ def reindex(
 limit : int, optional
 Maximum number of consecutive labels in ``target`` to match for
 inexact matches.
-tolerance : int or float, optional
+tolerance : int, float, or list-like, optional
 Maximum distance between original and new labels for inexact
 matches. The values of the index at the matching locations must
 satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
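The docstring now allows a list-like `tolerance`, which is checked per target element through `abs(index[indexer] - target) <= tolerance`. The pure-Python sketch below applies that check with one tolerance per target and uses -1 for targets without a close-enough match. `nearest_within_tolerance` is hypothetical; it scans the whole index rather than using pandas' sorted lookup, and it does not reproduce pandas' tie-breaking between equidistant labels.

```python
def nearest_within_tolerance(index, target, tolerance):
    """Positions of the nearest index label for each target, or -1 if too far."""
    if isinstance(tolerance, (int, float)):
        tolerance = [tolerance] * len(target)  # broadcast a scalar tolerance
    if len(tolerance) != len(target):
        raise ValueError("list-like tolerance must be the same size as target")

    indexer = []
    for value, tol in zip(target, tolerance):
        # Nearest label by absolute distance (first one wins on ties here).
        pos = min(range(len(index)), key=lambda i: abs(index[i] - value))
        # Per-element form of abs(index[indexer] - target) <= tolerance.
        indexer.append(pos if abs(index[pos] - value) <= tol else -1)
    return indexer


print(nearest_within_tolerance([0, 10, 20], [2, 14, 19], [1, 5, 0.5]))  # [-1, 1, -1]
```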
@@ -5675,7 +5675,7 @@ def asof(self, label):
 return self._na_value
 else:
 if isinstance(loc, slice):
-loc = loc.indices(len(self))[-1]
+return self[loc][-1]

 return self[loc]
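The `asof` fix replaces `loc.indices(len(self))[-1]` with `self[loc][-1]`. `slice.indices` returns a `(start, stop, step)` tuple, so its last element is the step rather than the last position the slice covers. The list-based sketch below contrasts the two expressions. `last_match_old` and `last_match_new` are hypothetical helpers, and the slice stands in for the one a location lookup can return when a key matches several consecutive labels.

```python
def last_match_old(values, loc):
    if isinstance(loc, slice):
        # slice.indices returns (start, stop, step); [-1] picks the step.
        loc = loc.indices(len(values))[-1]
    return values[loc]


def last_match_new(values, loc):
    if isinstance(loc, slice):
        # Take the last element actually covered by the slice.
        return values[loc][-1]
    return values[loc]


values = [10, 20, 30, 40, 50]
matched = slice(2, 4)  # positions 2 and 3 match the key
print(last_match_old(values, matched), last_match_new(values, matched))  # 20 40
```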

pandas/core/indexes/datetimes.py

Lines changed: 4 additions & 2 deletions
@@ -1133,12 +1133,14 @@ def bdate_range(
 msg = "freq must be specified for bdate_range; use date_range instead"
 raise TypeError(msg)

-if isinstance(freq, str) and freq.startswith("C"):
+if isinstance(freq, str) and freq.upper().startswith("C"):
+msg = f"invalid custom frequency string: {freq}"
+if freq == "CBH":
+raise ValueError(f"{msg}, did you mean cbh?")
 try:
 weekmask = weekmask or "Mon Tue Wed Thu Fri"
 freq = prefix_mapping[freq](holidays=holidays, weekmask=weekmask)
 except (KeyError, TypeError) as err:
-msg = f"invalid custom frequency string: {freq}"
 raise ValueError(msg) from err
 elif holidays or weekmask:
 msg = (
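Before this change, a lowercase custom alias such as `cbh` skipped the custom-frequency branch because the prefix test was case-sensitive; the diff compares on `freq.upper()` and adds a targeted hint for the uppercase spelling `CBH`. The sketch below reproduces only that string handling. `resolve_custom_freq`, `error_for` and the `CUSTOM_PREFIXES` table are hypothetical stand-ins for the real `prefix_mapping` lookup and offset construction, and the alias spellings in the table are assumed.

```python
CUSTOM_PREFIXES = {
    # Hypothetical alias table; the real code looks offsets up in prefix_mapping.
    "C": "CustomBusinessDay",
    "cbh": "CustomBusinessHour",
    "CBMS": "CustomBusinessMonthBegin",
    "CBME": "CustomBusinessMonthEnd",
}


def resolve_custom_freq(freq):
    if isinstance(freq, str) and freq.upper().startswith("C"):
        msg = f"invalid custom frequency string: {freq}"
        if freq == "CBH":
            # Uppercase spelling: point the user at the accepted alias.
            raise ValueError(f"{msg}, did you mean cbh?")
        try:
            return CUSTOM_PREFIXES[freq]
        except KeyError as err:
            raise ValueError(msg) from err
    return freq  # non-custom frequencies pass through unchanged


def error_for(freq):
    """Return the ValueError message raised for freq, or None if it resolves."""
    try:
        resolve_custom_freq(freq)
    except ValueError as err:
        return str(err)
    return None


print(resolve_custom_freq("cbh"))  # CustomBusinessHour
print(error_for("CBH"))  # invalid custom frequency string: CBH, did you mean cbh?
```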
