
Commit 4034f07

Merge branch 'main' into update-docs-data-table-representation
2 parents 3b084a1 + 82fa271 commit 4034f07

22 files changed: +357 additions, -159 deletions


doc/source/whatsnew/v3.0.0.rst

Lines changed: 9 additions & 0 deletions
@@ -737,6 +737,7 @@ Other Deprecations
- Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
- Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
- Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
+ - Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
- Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
- Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)

@@ -974,12 +975,15 @@ Datetimelike
- Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
- Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+ - Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
- Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
- Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
- Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
- Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
- Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
- Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
+ - Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
+ - Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
- Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -998,6 +1002,7 @@ Datetimelike
- Bug in comparison between objects with pyarrow date dtype and ``timestamp[pyarrow]`` or ``np.datetime64`` dtype failing to consider these as non-comparable (:issue:`62157`)
- Bug in constructing arrays with :class:`ArrowDtype` with ``timestamp`` type incorrectly allowing ``Decimal("NaN")`` (:issue:`61773`)
- Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
+ - Bug in retaining frequency in :meth:`value_counts` specifically for :meth:`DatetimeIndex` and :meth:`TimedeltaIndex` (:issue:`33830`)
- Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)

Timedelta
@@ -1049,6 +1054,7 @@ Interval
- Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
- Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
- Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
+ - Construction of :class:`IntervalArray` and :class:`IntervalIndex` from arrays with mismatched signed/unsigned integer dtypes (e.g., ``int64`` and ``uint64``) now raises a :exc:`TypeError` instead of proceeding silently. (:issue:`55715`)

Indexing
^^^^^^^^
@@ -1180,10 +1186,13 @@ Reshaping
- Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
- Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
- Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+ - Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
+ - Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
- Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
- Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
- Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
- Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
+ - Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
- Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
- Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
- Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)

pandas/_libs/tslibs/offsets.pyx

Lines changed: 3 additions & 0 deletions
@@ -692,6 +692,9 @@ cdef class BaseOffset:
            Rolled timestamp if not on offset, otherwise unchanged timestamp.
        """
        dt = Timestamp(dt)
+       if self.normalize and (dt - dt.normalize())._value != 0:
+           # GH#32616
+           dt = dt.normalize()
        if not self.is_on_offset(dt):
            dt = dt - type(self)(1, normalize=self.normalize, **self.kwds)
        return dt
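
A quick usage sketch of the behavior this fix targets (GH#32616); it is not part of the commit, the offset and timestamps are illustrative, and the output assumes a build with this patch:

import pandas as pd

ts = pd.Timestamp("2014-02-01 09:00")
offset = pd.offsets.MonthBegin(normalize=True)

# Without the normalize-first step, the non-midnight time makes
# is_on_offset() return False and rollback subtracts a full extra offset
# (landing on 2014-01-01).  With the fix, the timestamp is only snapped
# to midnight of the same day, which is already on the offset.
print(offset.rollback(ts))  # expected under the patch: 2014-02-01 00:00:00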

pandas/core/algorithms.py

Lines changed: 13 additions & 0 deletions
@@ -868,8 +868,10 @@ def value_counts_internal(
    dropna: bool = True,
) -> Series:
    from pandas import (
+       DatetimeIndex,
        Index,
        Series,
+       TimedeltaIndex,
    )

    index_name = getattr(values, "name", None)
@@ -934,6 +936,17 @@ def value_counts_internal(
        # Starting in 3.0, we no longer perform dtype inference on the
        # Index object we construct here, xref GH#56161
        idx = Index(keys, dtype=keys.dtype, name=index_name)
+
+       if (
+           bins is None
+           and not sort
+           and isinstance(values, (DatetimeIndex, TimedeltaIndex))
+           and idx.equals(values)
+           and values.inferred_freq is not None
+       ):
+           # Preserve freq of original index
+           idx.freq = values.inferred_freq  # type: ignore[attr-defined]
+
        result = Series(counts, index=idx, name=name, copy=False)

        if sort:
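
A usage sketch (not part of the commit, assuming the patched build) of what the new branch enables: value_counts on a regular datetime-like index with sort=False keeps the index frequency.

import pandas as pd

dti = pd.date_range("2016-01-01", periods=5, freq="D")
counts = dti.value_counts(sort=False)

# The result index should now retain the inferred daily frequency (GH#33830)
# instead of coming back as a freq-less DatetimeIndex.
print(counts.index.freq)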

pandas/core/arrays/arrow/array.py

Lines changed: 8 additions & 0 deletions
@@ -829,6 +829,14 @@ def __arrow_array__(self, type=None):
        """Convert myself to a pyarrow ChunkedArray."""
        return self._pa_array

+   def __array_ufunc__(self, ufunc: np.ufunc, method: str, *inputs, **kwargs):
+       # Need to wrap np.array results GH#62800
+       result = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
+       if type(self) is ArrowExtensionArray:
+           # Exclude ArrowStringArray
+           return type(self)._from_sequence(result)
+       return result
+
    def __array__(
        self, dtype: NpDtype | None = None, copy: bool | None = None
    ) -> np.ndarray:
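
A small sketch of what the wrapper changes for ufuncs on Arrow-backed data; it mirrors the new test added in pandas/tests/extension/test_arrow.py further down and assumes the patched build:

import numpy as np
import pandas as pd

ser = pd.Series([0.1, pd.NA], dtype="float64[pyarrow]")
result = np.sin(ser)

# With __array_ufunc__ re-wrapping the result, the Arrow dtype and the
# missing value are preserved instead of falling back to a NumPy float64
# result with NaN (GH#62800).
print(result.dtype)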

pandas/core/arrays/interval.py

Lines changed: 12 additions & 0 deletions
@@ -420,6 +420,18 @@ def _ensure_simple_new_inputs(

        dtype = IntervalDtype(left.dtype, closed=closed)

+       # Check for mismatched signed/unsigned integer dtypes after casting
+       left_dtype = left.dtype
+       right_dtype = right.dtype
+       if (
+           left_dtype.kind in "iu"
+           and right_dtype.kind in "iu"
+           and left_dtype.kind != right_dtype.kind
+       ):
+           raise TypeError(
+               f"Left and right arrays must have matching signedness. "
+               f"Got {left_dtype} and {right_dtype}."
+           )
        return left, right, dtype

    @classmethod
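
A usage sketch (not part of the commit) of the new guard, assuming construction goes through _ensure_simple_new_inputs as in IntervalArray.from_arrays:

import numpy as np
import pandas as pd

left = np.array([0, 1, 2], dtype="int64")
right = np.array([1, 2, 3], dtype="uint64")

# Mismatched signed/unsigned bounds now raise instead of silently building
# an inconsistent IntervalArray (GH#55715).
try:
    pd.arrays.IntervalArray.from_arrays(left, right)
except TypeError as err:
    print(err)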

pandas/core/frame.py

Lines changed: 19 additions & 21 deletions
@@ -9038,16 +9038,6 @@ def combine(
        0  0 -5.0
        1  0  4.0

-       However, if the same element in both dataframes is None, that None
-       is preserved
-
-       >>> df1 = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
-       >>> df2 = pd.DataFrame({"A": [1, 1], "B": [None, 3]})
-       >>> df1.combine(df2, take_smaller, fill_value=-5)
-          A    B
-       0  0 -5.0
-       1  0  3.0
-
        Example that demonstrates the use of `overwrite` and behavior when
        the axis differ between the dataframes.

@@ -9106,11 +9096,14 @@ def combine(

        # preserve column order
        new_columns = self.columns.union(other_columns, sort=False)
+       this = this.reindex(new_columns, axis=1)
+       other = other.reindex(new_columns, axis=1)
+
        do_fill = fill_value is not None
        result = {}
-       for col in new_columns:
-           series = this[col]
-           other_series = other[col]
+       for i in range(this.shape[1]):
+           series = this.iloc[:, i]
+           other_series = other.iloc[:, i]

            this_dtype = series.dtype
            other_dtype = other_series.dtype
@@ -9121,7 +9114,7 @@ def combine(
            # don't overwrite columns unnecessarily
            # DO propagate if this column is not in the intersection
            if not overwrite and other_mask.all():
-               result[col] = this[col].copy()
+               result[i] = series.copy()
                continue

            if do_fill:
@@ -9130,7 +9123,7 @@ def combine(
                series[this_mask] = fill_value
                other_series[other_mask] = fill_value

-           if col not in self.columns:
+           if new_columns[i] not in self.columns:
                # If self DataFrame does not have col in other DataFrame,
                # try to promote series, which is all NaN, as other_dtype.
                new_dtype = other_dtype
@@ -9155,10 +9148,10 @@ def combine(
                    arr, new_dtype
                )

-           result[col] = arr
+           result[i] = arr

-       # convert_objects just in case
-       frame_result = self._constructor(result, index=new_index, columns=new_columns)
+       frame_result = self._constructor(result, index=new_index)
+       frame_result.columns = new_columns
        return frame_result.__finalize__(self, method="combine")

    def combine_first(self, other: DataFrame) -> DataFrame:
@@ -9222,9 +9215,14 @@ def combiner(x: Series, y: Series):
        combined = self.combine(other, combiner, overwrite=False)

        dtypes = {
+           # Check for isinstance(..., (np.dtype, ExtensionDtype))
+           # to prevent raising on non-unique columns see GH#29135.
+           # Note we will just not-cast in these cases.
            col: find_common_type([self.dtypes[col], other.dtypes[col]])
            for col in self.columns.intersection(other.columns)
-           if combined.dtypes[col] != self.dtypes[col]
+           if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))
+           and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))
+           and combined.dtypes[col] != self.dtypes[col]
        }

        if dtypes:
@@ -13820,8 +13818,8 @@ def quantile(
        0.1  1    1
        0.5  3  100

-       Specifying `numeric_only=False` will also compute the quantile of
-       datetime and timedelta data.
+       Specifying `numeric_only=False` will compute the quantiles for all
+       columns.

        >>> df = pd.DataFrame(
        ...     {
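
A usage sketch (not part of the commit) of what the positional iteration in combine enables, assuming the patched build; previously these calls raised with duplicate column labels (GH#29135, GH#51340):

import pandas as pd

df1 = pd.DataFrame([[1, None]], columns=["A", "A"])
df2 = pd.DataFrame([[3, 4]], columns=["A", "A"])

# combine() and combine_first() now walk columns by position, so duplicate
# labels no longer trip the label-based lookup.
print(df1.combine(df2, lambda s1, s2: s1.where(s1.notna(), s2)))
print(df1.combine_first(df2))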

pandas/core/indexes/base.py

Lines changed: 2 additions & 2 deletions
@@ -4168,7 +4168,7 @@ def reindex(
        limit : int, optional
            Maximum number of consecutive labels in ``target`` to match for
            inexact matches.
-       tolerance : int or float, optional
+       tolerance : int, float, or list-like, optional
            Maximum distance between original and new labels for inexact
            matches. The values of the index at the matching locations must
            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
@@ -5675,7 +5675,7 @@ def asof(self, label):
                return self._na_value
        else:
            if isinstance(loc, slice):
-               loc = loc.indices(len(self))[-1]
+               return self[loc][-1]

        return self[loc]

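A usage sketch (not part of the commit) of the asof fix: a partial datetime string resolves to a slice of matching labels, and the patched code returns the last label covered by that slice rather than a position derived from the slice's step (GH#50946). The exact output assumes the patched build:

import pandas as pd

dti = pd.DatetimeIndex(["2023-01-01", "2023-01-15", "2023-01-20", "2023-02-01"])

# "2023-01" matches the first three entries as a slice; the last of them
# should be returned.
print(dti.asof("2023-01"))  # expected: Timestamp('2023-01-20 00:00:00')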

pandas/core/series.py

Lines changed: 12 additions & 6 deletions
@@ -87,7 +87,6 @@
)
from pandas.core.dtypes.dtypes import (
    ExtensionDtype,
-   SparseDtype,
)
from pandas.core.dtypes.generic import (
    ABCDataFrame,
@@ -3112,8 +3111,8 @@ def combine(

        Combine the Series and `other` using `func` to perform elementwise
        selection for combined Series.
-       `fill_value` is assumed when value is missing at some index
-       from one of the two objects being combined.
+       `fill_value` is assumed when value is not present at some index
+       from one of the two Series being combined.

        Parameters
        ----------
@@ -3254,9 +3253,6 @@ def combine_first(self, other) -> Series:
        if self.dtype == other.dtype:
            if self.index.equals(other.index):
                return self.mask(self.isna(), other)
-           elif self._can_hold_na and not isinstance(self.dtype, SparseDtype):
-               this, other = self.align(other, join="outer")
-               return this.mask(this.isna(), other)

        new_index = self.index.union(other.index)

@@ -3271,6 +3267,16 @@ def combine_first(self, other) -> Series:
        if this.dtype.kind == "M" and other.dtype.kind != "M":
            # TODO: try to match resos?
            other = to_datetime(other)
+           warnings.warn(
+               # GH#62931
+               "Silently casting non-datetime 'other' to datetime in "
+               "Series.combine_first is deprecated and will be removed "
+               "in a future version. Explicitly cast before calling "
+               "combine_first instead.",
+               Pandas4Warning,
+               stacklevel=find_stack_level(),
+           )
+
        combined = concat([this, other])
        combined = combined.reindex(new_index)
        return combined.__finalize__(self, method="combine_first")
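
A usage sketch (not part of the commit) of the deprecation path, assuming the patched build: passing a non-datetime 'other' into a datetime64 Series now warns, and casting explicitly beforehand avoids it.

import pandas as pd

ser = pd.Series(pd.to_datetime(["2024-01-01", None]))
other = pd.Series(["2024-06-01", "2024-06-02"])  # object dtype, not datetime

# ser.combine_first(other) still works but emits the new warning (GH#62931);
# the forward-compatible spelling casts explicitly first.
result = ser.combine_first(pd.to_datetime(other))
print(result)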

pandas/tests/base/test_value_counts.py

Lines changed: 79 additions & 0 deletions
@@ -14,6 +14,7 @@
    Series,
    Timedelta,
    TimedeltaIndex,
+   Timestamp,
    array,
)
import pandas._testing as tm
@@ -339,3 +340,81 @@ def test_value_counts_object_inference_deprecated():
    exp = dti.value_counts()
    exp.index = exp.index.astype(object)
    tm.assert_series_equal(res, exp)
+
+
+@pytest.mark.parametrize(
+    ("index", "expected_index"),
+    [
+        [
+            pd.date_range("2016-01-01", periods=5, freq="D"),
+            pd.date_range("2016-01-01", periods=5, freq="D"),
+        ],
+        [
+            pd.timedelta_range(Timedelta(0), periods=5, freq="h"),
+            pd.timedelta_range(Timedelta(0), periods=5, freq="h"),
+        ],
+        [
+            DatetimeIndex(
+                [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1)]
+                + [Timestamp("2016-01-02")]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1, 5)]
+            ),
+            DatetimeIndex(pd.date_range("2016-01-01", periods=5, freq="D")),
+        ],
+        [
+            TimedeltaIndex(
+                [Timedelta(hours=i) for i in range(1)]
+                + [Timedelta(hours=1)]
+                + [Timedelta(hours=i) for i in range(1, 5)],
+            ),
+            TimedeltaIndex(pd.timedelta_range(Timedelta(0), periods=5, freq="h")),
+        ],
+        [
+            DatetimeIndex(
+                [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(2)]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(3, 5)],
+            ),
+            DatetimeIndex(
+                [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(2)]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(3, 5)],
+            ),
+        ],
+        [
+            TimedeltaIndex(
+                [Timedelta(hours=i) for i in range(2)]
+                + [Timedelta(hours=i) for i in range(3, 5)],
+            ),
+            TimedeltaIndex(
+                [Timedelta(hours=i) for i in range(2)]
+                + [Timedelta(hours=i) for i in range(3, 5)],
+            ),
+        ],
+        [
+            DatetimeIndex(
+                [Timestamp("2016-01-01")]
+                + [pd.NaT]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1, 5)],
+            ),
+            DatetimeIndex(
+                [Timestamp("2016-01-01")]
+                + [pd.NaT]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1, 5)],
+            ),
+        ],
+        [
+            TimedeltaIndex(
+                [Timedelta(hours=0)]
+                + [pd.NaT]
+                + [Timedelta(hours=i) for i in range(1, 5)],
+            ),
+            TimedeltaIndex(
+                [Timedelta(hours=0)]
+                + [pd.NaT]
+                + [Timedelta(hours=i) for i in range(1, 5)],
+            ),
+        ],
+    ],
+)
+def test_value_counts_index_datetimelike(index, expected_index):
+    vc = index.value_counts(sort=False, dropna=False)
+    tm.assert_index_equal(vc.index, expected_index)

pandas/tests/extension/test_arrow.py

Lines changed: 10 additions & 0 deletions
@@ -3800,3 +3800,13 @@ def test_cast_pontwise_result_decimal_nan():

    pa_type = result.dtype.pyarrow_dtype
    assert pa.types.is_decimal(pa_type)
+
+
+def test_ufunc_retains_missing():
+    # GH#62800
+    ser = pd.Series([0.1, pd.NA], dtype="float64[pyarrow]")
+
+    result = np.sin(ser)
+
+    expected = pd.Series([np.sin(0.1), pd.NA], dtype="float64[pyarrow]")
+    tm.assert_series_equal(result, expected)
