Commit db804eb

Merge branch 'pandas-dev:main' into remove_doc_from_masked
2 parents 5852593 + 77fdffd commit db804eb

14 files changed, +181 -64 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ If you are simply looking to start working with the pandas codebase, navigate to
 
 You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).
 
-Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!
+Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’... you can do something about it!
 
 Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Slack](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack).
 

doc/source/user_guide/groupby.rst

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:
 
 ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.
 
-The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
+DataFrame groupby always operates along axis 0 (rows). To split by columns, first do
 a transpose:
 
 .. ipython::
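
The reworded sentence in this hunk says that grouping by columns requires a transpose first. A minimal sketch of that pattern follows; the frame, its column names, and the choice to group by the first character of each label are made up for illustration and are not taken from the user guide.

```python
import pandas as pd

# Hypothetical data: two "a" columns and one "b" column.
df = pd.DataFrame({"a1": [1, 2], "a2": [3, 4], "b1": [5, 6]})

# Transpose so the original column labels become the row index,
# group those labels by their first character, then transpose back.
by_prefix = df.T.groupby(lambda label: label[0]).sum().T
```

Here `by_prefix` has one column per prefix, each holding the row-wise sum of the original columns that share it.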

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
@@ -737,6 +737,7 @@ Other Deprecations
737737
- Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
738738
- Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
739739
- Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
740+
- Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
740741
- Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
741742
- Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)
742743

@@ -974,13 +975,15 @@ Datetimelike
974975
- Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
975976
- Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
976977
- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
978+
- Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
977979
- Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
978980
- Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
979981
- Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
980982
- Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
981983
- Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
982984
- Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
983985
- Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
986+
- Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
984987
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
985988
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
986989
- Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -1183,10 +1186,13 @@ Reshaping
11831186
- Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
11841187
- Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
11851188
- Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
1189+
- Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
1190+
- Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
11861191
- Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
11871192
- Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
11881193
- Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
11891194
- Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
1195+
- Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
11901196
- Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
11911197
- Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
11921198
- Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)

pandas/core/frame.py

Lines changed: 20 additions & 22 deletions
@@ -9038,16 +9038,6 @@ def combine(
90389038
0 0 -5.0
90399039
1 0 4.0
90409040
9041-
However, if the same element in both dataframes is None, that None
9042-
is preserved
9043-
9044-
>>> df1 = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
9045-
>>> df2 = pd.DataFrame({"A": [1, 1], "B": [None, 3]})
9046-
>>> df1.combine(df2, take_smaller, fill_value=-5)
9047-
A B
9048-
0 0 -5.0
9049-
1 0 3.0
9050-
90519041
Example that demonstrates the use of `overwrite` and behavior when
90529042
the axis differ between the dataframes.
90539043
@@ -9106,11 +9096,14 @@ def combine(
91069096

91079097
# preserve column order
91089098
new_columns = self.columns.union(other_columns, sort=False)
9099+
this = this.reindex(new_columns, axis=1)
9100+
other = other.reindex(new_columns, axis=1)
9101+
91099102
do_fill = fill_value is not None
91109103
result = {}
9111-
for col in new_columns:
9112-
series = this[col]
9113-
other_series = other[col]
9104+
for i in range(this.shape[1]):
9105+
series = this.iloc[:, i]
9106+
other_series = other.iloc[:, i]
91149107

91159108
this_dtype = series.dtype
91169109
other_dtype = other_series.dtype
@@ -9121,7 +9114,7 @@ def combine(
91219114
# don't overwrite columns unnecessarily
91229115
# DO propagate if this column is not in the intersection
91239116
if not overwrite and other_mask.all():
9124-
result[col] = this[col].copy()
9117+
result[i] = series.copy()
91259118
continue
91269119

91279120
if do_fill:
@@ -9130,7 +9123,7 @@ def combine(
91309123
series[this_mask] = fill_value
91319124
other_series[other_mask] = fill_value
91329125

9133-
if col not in self.columns:
9126+
if new_columns[i] not in self.columns:
91349127
# If self DataFrame does not have col in other DataFrame,
91359128
# try to promote series, which is all NaN, as other_dtype.
91369129
new_dtype = other_dtype
@@ -9155,10 +9148,10 @@ def combine(
91559148
arr, new_dtype
91569149
)
91579150

9158-
result[col] = arr
9151+
result[i] = arr
91599152

9160-
# convert_objects just in case
9161-
frame_result = self._constructor(result, index=new_index, columns=new_columns)
9153+
frame_result = self._constructor(result, index=new_index)
9154+
frame_result.columns = new_columns
91629155
return frame_result.__finalize__(self, method="combine")
91639156

91649157
def combine_first(self, other: DataFrame) -> DataFrame:
@@ -9222,9 +9215,14 @@ def combiner(x: Series, y: Series):
92229215
combined = self.combine(other, combiner, overwrite=False)
92239216

92249217
dtypes = {
9218+
# Check for isinstance(..., (np.dtype, ExtensionDtype))
9219+
# to prevent raising on non-unique columns see GH#29135.
9220+
# Note we will just not-cast in these cases.
92259221
col: find_common_type([self.dtypes[col], other.dtypes[col]])
92269222
for col in self.columns.intersection(other.columns)
9227-
if combined.dtypes[col] != self.dtypes[col]
9223+
if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))
9224+
and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))
9225+
and combined.dtypes[col] != self.dtypes[col]
92289226
}
92299227

92309228
if dtypes:
@@ -9432,7 +9430,7 @@ def groupby(
94329430
index. If a dict or Series is passed, the Series or dict VALUES
94339431
will be used to determine the groups (the Series' values are first
94349432
aligned; see ``.align()`` method). If a list or ndarray of length
9435-
equal to the selected axis is passed (see the `groupby user guide
9433+
equal to the number of rows is passed (see the `groupby user guide
94369434
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
94379435
the values are used as-is to determine the groups. A label or list
94389436
of labels may be passed to group by the columns in ``self``.
@@ -13820,8 +13818,8 @@ def quantile(
1382013818
0.1 1 1
1382113819
0.5 3 100
1382213820
13823-
Specifying `numeric_only=False` will also compute the quantile of
13824-
datetime and timedelta data.
13821+
Specifying `numeric_only=False` will compute the quantiles for all
13822+
columns.
1382513823
1382613824
>>> df = pd.DataFrame(
1382713825
... {

pandas/core/indexes/base.py

Lines changed: 2 additions & 2 deletions
@@ -4168,7 +4168,7 @@ def reindex(
         limit : int, optional
             Maximum number of consecutive labels in ``target`` to match for
             inexact matches.
-        tolerance : int or float, optional
+        tolerance : int, float, or list-like, optional
             Maximum distance between original and new labels for inexact
             matches. The values of the index at the matching locations must
             satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
@@ -5675,7 +5675,7 @@ def asof(self, label):
                 return self._na_value
         else:
             if isinstance(loc, slice):
-                loc = loc.indices(len(self))[-1]
+                return self[loc][-1]
 
             return self[loc]
 
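
The one-line change in `asof` is easier to follow with Python's built-in `slice.indices`, which returns a `(start, stop, step)` tuple, so its last element is the step rather than the last position covered. The sketch below uses a plain list as a stand-in for the index; the labels and the particular slice are hypothetical and are not taken from the bug report.

```python
labels = ["d0", "d1", "d2", "d3", "d4", "d5"]
loc = slice(4, 5)  # hypothetical lookup result covering a single position

start, stop, step = loc.indices(len(labels))

# Removed expression: the trailing [-1] selects the step, not a position.
old_pick = labels[loc.indices(len(labels))[-1]]

# Replacement expression: take the slice, then its last element.
new_pick = labels[loc][-1]
```

With these inputs the removed expression picks the element at position 1, while the replacement picks the last element inside the slice.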

pandas/core/series.py

Lines changed: 12 additions & 6 deletions
@@ -87,7 +87,6 @@
8787
)
8888
from pandas.core.dtypes.dtypes import (
8989
ExtensionDtype,
90-
SparseDtype,
9190
)
9291
from pandas.core.dtypes.generic import (
9392
ABCDataFrame,
@@ -3112,8 +3111,8 @@ def combine(
31123111
31133112
Combine the Series and `other` using `func` to perform elementwise
31143113
selection for combined Series.
3115-
`fill_value` is assumed when value is missing at some index
3116-
from one of the two objects being combined.
3114+
`fill_value` is assumed when value is not present at some index
3115+
from one of the two Series being combined.
31173116
31183117
Parameters
31193118
----------
@@ -3254,9 +3253,6 @@ def combine_first(self, other) -> Series:
32543253
if self.dtype == other.dtype:
32553254
if self.index.equals(other.index):
32563255
return self.mask(self.isna(), other)
3257-
elif self._can_hold_na and not isinstance(self.dtype, SparseDtype):
3258-
this, other = self.align(other, join="outer")
3259-
return this.mask(this.isna(), other)
32603256

32613257
new_index = self.index.union(other.index)
32623258

@@ -3271,6 +3267,16 @@ def combine_first(self, other) -> Series:
32713267
if this.dtype.kind == "M" and other.dtype.kind != "M":
32723268
# TODO: try to match resos?
32733269
other = to_datetime(other)
3270+
warnings.warn(
3271+
# GH#62931
3272+
"Silently casting non-datetime 'other' to datetime in "
3273+
"Series.combine_first is deprecated and will be removed "
3274+
"in a future version. Explicitly cast before calling "
3275+
"combine_first instead.",
3276+
Pandas4Warning,
3277+
stacklevel=find_stack_level(),
3278+
)
3279+
32743280
combined = concat([this, other])
32753281
combined = combined.reindex(new_index)
32763282
return combined.__finalize__(self, method="combine_first")
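
The new warning asks callers to cast `other` explicitly before calling `combine_first`. A minimal sketch of that migration, using made-up data and `pd.to_datetime` as the explicit cast, might look like this:

```python
import pandas as pd

# Hypothetical inputs: a datetime Series with a gap, and string dates.
dates = pd.Series(pd.to_datetime(["2021-01-01", None]))
strings = pd.Series(["2021-02-01", "2021-02-02"])

# Cast explicitly instead of relying on combine_first to do it silently.
filled = dates.combine_first(pd.to_datetime(strings))
```

Because both operands are already datetime-like when `combine_first` runs, the deprecated branch shown in the hunk above is not reached.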

pandas/tests/frame/methods/test_combine.py

Lines changed: 16 additions & 0 deletions
@@ -45,3 +45,19 @@ def test_combine_generic(self, float_frame):
4545
)
4646
tm.assert_frame_equal(chunk, exp)
4747
tm.assert_frame_equal(chunk2, exp)
48+
49+
def test_combine_nonunique_columns(self):
50+
# GH#51340
51+
52+
df = pd.DataFrame({"A": range(5), "B": range(5)})
53+
df.columns = ["A", "A"]
54+
55+
other = df.copy()
56+
df.iloc[1, :] = None
57+
58+
def combiner(a, b):
59+
return b
60+
61+
result = df.combine(other, combiner)
62+
expected = other.astype("float64")
63+
tm.assert_frame_equal(result, expected)
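
This test covers duplicate column labels, which is the case the `combine` change in `pandas/core/frame.py` addresses by iterating over positions instead of labels. The sketch below, on a small made-up frame, illustrates the likely reason label lookup is unsuitable here: with non-unique columns, selecting by label yields a `DataFrame`, while positional selection yields one `Series` per column.

```python
import pandas as pd

# Hypothetical frame with a repeated column label.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "A"])

by_label = df["A"]  # a DataFrame holding both "A" columns
by_position = [df.iloc[:, i] for i in range(df.shape[1])]  # one Series each
```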

pandas/tests/frame/methods/test_combine_first.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -413,6 +413,18 @@ def test_combine_first_preserve_EA_precision(self, wide_val, dtype):
413413
expected = DataFrame({"A": [wide_val, 5, wide_val]}, dtype=dtype)
414414
tm.assert_frame_equal(result, expected)
415415

416+
def test_combine_first_non_unique_columns(self):
417+
# GH#29135
418+
df1 = DataFrame([[1, np.nan], [3, 4]], columns=["P", "Q"], index=["A", "B"])
419+
df2 = DataFrame(
420+
[[5, 6, 7], [8, 9, np.nan]], columns=["P", "Q", "Q"], index=["A", "B"]
421+
)
422+
result = df1.combine_first(df2)
423+
expected = DataFrame(
424+
[[1, 6.0, 7.0], [3, 4.0, 4.0]], index=["A", "B"], columns=["P", "Q", "Q"]
425+
)
426+
tm.assert_frame_equal(result, expected)
427+
416428

417429
@pytest.mark.parametrize(
418430
"scalar1, scalar2",

pandas/tests/frame/methods/test_join.py

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -575,3 +575,27 @@ def test_frame_join_tzaware(self):
575575

576576
tm.assert_index_equal(result.index, expected)
577577
assert result.index.tz.key == "US/Central"
578+
579+
def test_frame_join_categorical_index(self):
580+
# GH 61675
581+
cat_data = pd.Categorical(
582+
[3, 4],
583+
categories=pd.Series([2, 3, 4, 5], dtype="Int64"),
584+
ordered=True,
585+
)
586+
values1 = "a b".split()
587+
values2 = "foo bar".split()
588+
df1 = DataFrame({"hr": cat_data, "values1": values1}).set_index("hr")
589+
df2 = DataFrame({"hr": cat_data, "values2": values2}).set_index("hr")
590+
df1.columns = pd.CategoricalIndex([4], dtype=cat_data.dtype, name="other_hr")
591+
df2.columns = pd.CategoricalIndex([3], dtype=cat_data.dtype, name="other_hr")
592+
593+
df_joined = df1.join(df2)
594+
expected = DataFrame(
595+
{"hr": cat_data, "values1": values1, "values2": values2}
596+
).set_index("hr")
597+
expected.columns = pd.CategoricalIndex(
598+
[4, 3], dtype=cat_data.dtype, name="other_hr"
599+
)
600+
601+
tm.assert_frame_equal(df_joined, expected)

pandas/tests/indexes/datetimes/methods/test_asof.py

Lines changed: 16 additions & 0 deletions
@@ -1,6 +1,7 @@
11
from datetime import timedelta
22

33
from pandas import (
4+
DatetimeIndex,
45
Index,
56
Timestamp,
67
date_range,
@@ -28,3 +29,18 @@ def test_asof(self):
2829

2930
dt = index[0].to_pydatetime()
3031
assert isinstance(index.asof(dt), Timestamp)
32+
33+
def test_asof_datetime_string(self):
34+
# GH#50946
35+
36+
dti = date_range("2021-08-05", "2021-08-10", freq="1D")
37+
38+
key = "2021-08-09"
39+
res = dti.asof(key)
40+
exp = dti[4]
41+
assert res == exp
42+
43+
# add a non-midnight time caused a bug
44+
dti2 = DatetimeIndex(list(dti) + ["2021-08-11 00:00:01"])
45+
res = dti2.asof(key)
46+
assert res == exp
