Commit 9c128d9

Merge main
2 parents f713c4f + c3bace8 commit 9c128d9

11 files changed

+124
-31
lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ If you are simply looking to start working with the pandas codebase, navigate to

 You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).

-Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!
+Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’... you can do something about it!

 Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Slack](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack).

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
@@ -794,6 +794,7 @@ Other Deprecations
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
 - Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
+- Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
 - Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)

@@ -1031,13 +1032,15 @@ Datetimelike
 - Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+- Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
 - Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
 - Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
 - Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
 - Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
+- Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -1241,10 +1244,13 @@ Reshaping
 - Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+- Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
+- Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
 - Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
 - Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
+- Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
 - Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
 - Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)

pandas/core/frame.py

Lines changed: 19 additions & 21 deletions
@@ -9038,16 +9038,6 @@ def combine(
         0  0 -5.0
         1  0  4.0

-        However, if the same element in both dataframes is None, that None
-        is preserved
-
-        >>> df1 = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
-        >>> df2 = pd.DataFrame({"A": [1, 1], "B": [None, 3]})
-        >>> df1.combine(df2, take_smaller, fill_value=-5)
-           A    B
-        0  0 -5.0
-        1  0  3.0
-
         Example that demonstrates the use of `overwrite` and behavior when
         the axis differ between the dataframes.

@@ -9106,11 +9096,14 @@ def combine(

         # preserve column order
         new_columns = self.columns.union(other_columns, sort=False)
+        this = this.reindex(new_columns, axis=1)
+        other = other.reindex(new_columns, axis=1)
+
         do_fill = fill_value is not None
         result = {}
-        for col in new_columns:
-            series = this[col]
-            other_series = other[col]
+        for i in range(this.shape[1]):
+            series = this.iloc[:, i]
+            other_series = other.iloc[:, i]

             this_dtype = series.dtype
             other_dtype = other_series.dtype
@@ -9121,7 +9114,7 @@ def combine(
             # don't overwrite columns unnecessarily
             # DO propagate if this column is not in the intersection
             if not overwrite and other_mask.all():
-                result[col] = this[col].copy()
+                result[i] = series.copy()
                 continue

             if do_fill:
@@ -9130,7 +9123,7 @@ def combine(
                 series[this_mask] = fill_value
                 other_series[other_mask] = fill_value

-            if col not in self.columns:
+            if new_columns[i] not in self.columns:
                 # If self DataFrame does not have col in other DataFrame,
                 # try to promote series, which is all NaN, as other_dtype.
                 new_dtype = other_dtype
@@ -9155,10 +9148,10 @@ def combine(
                     arr, new_dtype
                 )

-            result[col] = arr
+            result[i] = arr

-        # convert_objects just in case
-        frame_result = self._constructor(result, index=new_index, columns=new_columns)
+        frame_result = self._constructor(result, index=new_index)
+        frame_result.columns = new_columns
         return frame_result.__finalize__(self, method="combine")

     def combine_first(self, other: DataFrame) -> DataFrame:
@@ -9222,9 +9215,14 @@ def combiner(x: Series, y: Series):
         combined = self.combine(other, combiner, overwrite=False)

         dtypes = {
+            # Check for isinstance(..., (np.dtype, ExtensionDtype))
+            # to prevent raising on non-unique columns see GH#29135.
+            # Note we will just not-cast in these cases.
             col: find_common_type([self.dtypes[col], other.dtypes[col]])
             for col in self.columns.intersection(other.columns)
-            if combined.dtypes[col] != self.dtypes[col]
+            if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))
+            and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))
+            and combined.dtypes[col] != self.dtypes[col]
         }

         if dtypes:
@@ -13822,8 +13820,8 @@ def quantile(
         0.1  1    1
         0.5  3  100

-        Specifying `numeric_only=False` will also compute the quantile of
-        datetime and timedelta data.
+        Specifying `numeric_only=False` will compute the quantiles for all
+        columns.

         >>> df = pd.DataFrame(
         ...     {
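
The `combine` hunks above switch the loop from label-based keys (`result[col]`) to positional keys (`result[i]`) and assign the column labels only after the result is built. The following is a minimal pure-Python sketch of why that matters when labels repeat. It does not use pandas: the `labels` and `columns` lists and both helper functions are hypothetical stand-ins for a frame with two columns named "A". In pandas itself, selecting a duplicated label returns a DataFrame rather than a Series, which is a separate failure mode that this sketch does not reproduce.

```python
# Hypothetical stand-in for a frame with duplicate column labels:
# a list of labels plus a parallel list of column values.
labels = ["A", "A"]
columns = [[0, 1, 2], [10, 11, 12]]


def combine_by_label(labels, columns):
    """Key the result by label, as the removed loop did."""
    result = {}
    for label, values in zip(labels, columns):
        # A repeated label overwrites the earlier entry.
        result[label] = list(values)
    return result


def combine_by_position(labels, columns):
    """Key the result by position, then attach the labels afterwards."""
    result = {}
    for i in range(len(columns)):
        result[i] = list(columns[i])
    # Labels are reattached in order, so duplicates survive.
    return [(labels[i], result[i]) for i in range(len(labels))]


by_label = combine_by_label(labels, columns)
by_position = combine_by_position(labels, columns)
```

Keying by position keeps one entry per column regardless of how the labels look, which appears to be the reason the diff sets `frame_result.columns = new_columns` as a final step instead of passing `columns=` to the constructor.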

pandas/core/indexes/base.py

Lines changed: 2 additions & 2 deletions
@@ -4168,7 +4168,7 @@ def reindex(
         limit : int, optional
             Maximum number of consecutive labels in ``target`` to match for
             inexact matches.
-        tolerance : int or float, optional
+        tolerance : int, float, or list-like, optional
             Maximum distance between original and new labels for inexact
             matches. The values of the index at the matching locations must
             satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
@@ -5675,7 +5675,7 @@ def asof(self, label):
                 return self._na_value
         else:
             if isinstance(loc, slice):
-                loc = loc.indices(len(self))[-1]
+                return self[loc][-1]

         return self[loc]
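
One reading of the `asof` fix above: Python's built-in `slice.indices(length)` returns a `(start, stop, step)` tuple, so indexing it with `[-1]` yields the step, not the last matching position. The sketch below illustrates that with a plain list; the `labels` list and the `slice(4, 5)` value are assumptions chosen to mirror the shape of the new test, not values taken from pandas internals.

```python
labels = ["d0", "d1", "d2", "d3", "d4", "d5"]
loc = slice(4, 5)  # assumed: a slice that matches a single label

# slice.indices normalizes the slice to (start, stop, step).
start, stop, step = loc.indices(len(labels))

# Removed line: the last element of the tuple is the step.
old_position = loc.indices(len(labels))[-1]
old_result = labels[old_position]

# Added line: slice first, then take the last matching element.
new_result = labels[loc][-1]
```

With the removed line, the lookup lands on position 1 whenever the step is 1, whatever the slice bounds are. Slicing first and taking the last element returns the final entry inside the matched range.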

pandas/core/series.py

Lines changed: 12 additions & 6 deletions
@@ -87,7 +87,6 @@
 )
 from pandas.core.dtypes.dtypes import (
     ExtensionDtype,
-    SparseDtype,
 )
 from pandas.core.dtypes.generic import (
     ABCDataFrame,
@@ -3112,8 +3111,8 @@ def combine(

         Combine the Series and `other` using `func` to perform elementwise
         selection for combined Series.
-        `fill_value` is assumed when value is missing at some index
-        from one of the two objects being combined.
+        `fill_value` is assumed when value is not present at some index
+        from one of the two Series being combined.

         Parameters
         ----------
@@ -3254,9 +3253,6 @@ def combine_first(self, other) -> Series:
         if self.dtype == other.dtype:
             if self.index.equals(other.index):
                 return self.mask(self.isna(), other)
-            elif self._can_hold_na and not isinstance(self.dtype, SparseDtype):
-                this, other = self.align(other, join="outer")
-                return this.mask(this.isna(), other)

         new_index = self.index.union(other.index)

@@ -3271,6 +3267,16 @@ def combine_first(self, other) -> Series:
         if this.dtype.kind == "M" and other.dtype.kind != "M":
             # TODO: try to match resos?
             other = to_datetime(other)
+            warnings.warn(
+                # GH#62931
+                "Silently casting non-datetime 'other' to datetime in "
+                "Series.combine_first is deprecated and will be removed "
+                "in a future version. Explicitly cast before calling "
+                "combine_first instead.",
+                Pandas4Warning,
+                stacklevel=find_stack_level(),
+            )
+
         combined = concat([this, other])  # nobug
         combined = combined.reindex(new_index)
         return combined.__finalize__(self, method="combine_first")
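
The deprecation added above follows a common pattern: keep the implicit cast working for now, emit a warning, and let callers silence it by casting explicitly. The sketch below reproduces that pattern with the standard library only. `SketchDeprecationWarning` and `fill_missing_dates` are hypothetical names standing in for `Pandas4Warning` and `Series.combine_first`, and plain lists stand in for Series.

```python
import warnings
from datetime import datetime


class SketchDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for Pandas4Warning."""


def fill_missing_dates(this, other):
    """Fill None entries of `this` from `other`, warning on implicit casts."""
    if any(isinstance(value, str) for value in other):
        warnings.warn(
            "Silently casting non-datetime 'other' to datetime is deprecated. "
            "Explicitly cast before calling instead.",
            SketchDeprecationWarning,
            stacklevel=2,
        )
        other = [
            datetime.fromisoformat(value) if isinstance(value, str) else value
            for value in other
        ]
    return [a if a is not None else b for a, b in zip(this, other)]


this = [datetime(2010, 1, 1), None]

# Implicit cast: the call still works but emits the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = fill_missing_dates(this, [None, "2011-01-01"])

# Explicit cast by the caller: same result, no warning.
with warnings.catch_warnings(record=True) as caught_explicit:
    warnings.simplefilter("always")
    explicit = fill_missing_dates(this, [None, datetime(2011, 1, 1)])
```

Recording warnings with `warnings.catch_warnings(record=True)` plays roughly the role that `tm.assert_produces_warning` plays in the pandas test suite.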

pandas/tests/frame/methods/test_combine.py

Lines changed: 16 additions & 0 deletions
@@ -45,3 +45,19 @@ def test_combine_generic(self, float_frame):
         )
         tm.assert_frame_equal(chunk, exp)
         tm.assert_frame_equal(chunk2, exp)
+
+    def test_combine_nonunique_columns(self):
+        # GH#51340
+
+        df = pd.DataFrame({"A": range(5), "B": range(5)})
+        df.columns = ["A", "A"]
+
+        other = df.copy()
+        df.iloc[1, :] = None
+
+        def combiner(a, b):
+            return b
+
+        result = df.combine(other, combiner)
+        expected = other.astype("float64")
+        tm.assert_frame_equal(result, expected)

pandas/tests/frame/methods/test_combine_first.py

Lines changed: 12 additions & 0 deletions
@@ -413,6 +413,18 @@ def test_combine_first_preserve_EA_precision(self, wide_val, dtype):
         expected = DataFrame({"A": [wide_val, 5, wide_val]}, dtype=dtype)
         tm.assert_frame_equal(result, expected)

+    def test_combine_first_non_unique_columns(self):
+        # GH#29135
+        df1 = DataFrame([[1, np.nan], [3, 4]], columns=["P", "Q"], index=["A", "B"])
+        df2 = DataFrame(
+            [[5, 6, 7], [8, 9, np.nan]], columns=["P", "Q", "Q"], index=["A", "B"]
+        )
+        result = df1.combine_first(df2)
+        expected = DataFrame(
+            [[1, 6.0, 7.0], [3, 4.0, 4.0]], index=["A", "B"], columns=["P", "Q", "Q"]
+        )
+        tm.assert_frame_equal(result, expected)
+

     @pytest.mark.parametrize(
         "scalar1, scalar2",

pandas/tests/indexes/datetimes/methods/test_asof.py

Lines changed: 16 additions & 0 deletions
@@ -1,6 +1,7 @@
 from datetime import timedelta

 from pandas import (
+    DatetimeIndex,
     Index,
     Timestamp,
     date_range,
@@ -28,3 +29,18 @@ def test_asof(self):

         dt = index[0].to_pydatetime()
         assert isinstance(index.asof(dt), Timestamp)
+
+    def test_asof_datetime_string(self):
+        # GH#50946
+
+        dti = date_range("2021-08-05", "2021-08-10", freq="1D")
+
+        key = "2021-08-09"
+        res = dti.asof(key)
+        exp = dti[4]
+        assert res == exp
+
+        # add a non-midnight time caused a bug
+        dti2 = DatetimeIndex(list(dti) + ["2021-08-11 00:00:01"])
+        res = dti2.asof(key)
+        assert res == exp

pandas/tests/series/methods/test_combine_first.py

Lines changed: 17 additions & 1 deletion
@@ -2,6 +2,8 @@

 import numpy as np

+from pandas.errors import Pandas4Warning
+
 import pandas as pd
 from pandas import (
     Period,
@@ -75,9 +77,14 @@ def test_combine_first_dt64(self, unit):
         xp = to_datetime(Series(["2010", "2011"])).dt.as_unit(unit)
         tm.assert_series_equal(rs, xp)

+    def test_combine_first_dt64_casting_deprecation(self, unit):
+        # GH#62931
         s0 = to_datetime(Series(["2010", np.nan])).dt.as_unit(unit)
         s1 = Series([np.nan, "2011"])
-        rs = s0.combine_first(s1)
+
+        msg = "Silently casting non-datetime 'other' to datetime"
+        with tm.assert_produces_warning(Pandas4Warning, match=msg):
+            rs = s0.combine_first(s1)

         xp = Series([datetime(2010, 1, 1), "2011"], dtype=f"datetime64[{unit}]")

@@ -144,3 +151,12 @@ def test_combine_mixed_timezone(self):
             ),
         )
         tm.assert_series_equal(result, expected)
+
+    def test_combine_first_none_not_nan(self):
+        # GH#58977
+        s1 = Series([None, None, None], index=["a", "b", "c"])
+        s2 = Series([None, None, None], index=["b", "c", "d"])
+
+        result = s1.combine_first(s2)
+        expected = Series([None] * 4, index=["a", "b", "c", "d"])
+        tm.assert_series_equal(result, expected)

pandas/tests/tseries/frequencies/test_inference.py

Lines changed: 14 additions & 0 deletions
@@ -13,6 +13,7 @@
 from pandas._libs.tslibs.offsets import _get_offset
 from pandas._libs.tslibs.period import INVALID_FREQ_ERR_MSG
 from pandas.compat import is_platform_windows
+import pandas.util._test_decorators as td

 from pandas import (
     DatetimeIndex,
@@ -542,3 +543,16 @@ def test_infer_freq_non_nano_tzaware(tz_aware_fixture):

     res = frequencies.infer_freq(dta)
     assert res == "B"
+
+
+@td.skip_if_no("pyarrow")
+def test_infer_freq_pyarrow():
+    # GH#58403
+    data = ["2022-01-01T10:00:00", "2022-01-01T10:00:30", "2022-01-01T10:01:00"]
+    pd_series = Series(data).astype("timestamp[s][pyarrow]")
+    pd_index = Index(data).astype("timestamp[s][pyarrow]")
+
+    assert frequencies.infer_freq(pd_index.values) == "30s"
+    assert frequencies.infer_freq(pd_series.values) == "30s"
+    assert frequencies.infer_freq(pd_index) == "30s"
+    assert frequencies.infer_freq(pd_series) == "30s"
