
Commit 4034f07

Merge branch 'main' into update-docs-data-table-representation
2 parents 3b084a1 + 82fa271 commit 4034f07

22 files changed: +357 additions, -159 deletions


doc/source/whatsnew/v3.0.0.rst

Lines changed: 9 additions & 0 deletions
@@ -737,6 +737,7 @@ Other Deprecations
- Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
- Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
- Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
+ - Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
- Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
- Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)

@@ -974,12 +975,15 @@ Datetimelike
- Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
- Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+ - Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
- Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
- Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
- Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
- Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
- Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
- Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
+ - Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
+ - Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
- Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -998,6 +1002,7 @@ Datetimelike
- Bug in comparison between objects with pyarrow date dtype and ``timestamp[pyarrow]`` or ``np.datetime64`` dtype failing to consider these as non-comparable (:issue:`62157`)
- Bug in constructing arrays with :class:`ArrowDtype` with ``timestamp`` type incorrectly allowing ``Decimal("NaN")`` (:issue:`61773`)
- Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
+ - Bug in retaining frequency in :meth:`value_counts` specifically for :meth:`DatetimeIndex` and :meth:`TimedeltaIndex` (:issue:`33830`)
- Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)

Timedelta
@@ -1049,6 +1054,7 @@ Interval
- Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
- Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
- Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
+ - Construction of :class:`IntervalArray` and :class:`IntervalIndex` from arrays with mismatched signed/unsigned integer dtypes (e.g., ``int64`` and ``uint64``) now raises a :exc:`TypeError` instead of proceeding silently. (:issue:`55715`)

Indexing
^^^^^^^^
@@ -1180,10 +1186,13 @@ Reshaping
- Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
- Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
- Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+ - Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
+ - Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
- Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
- Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
- Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
- Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
+ - Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
- Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
- Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
- Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)

pandas/_libs/tslibs/offsets.pyx

Lines changed: 3 additions & 0 deletions
@@ -692,6 +692,9 @@ cdef class BaseOffset:
            Rolled timestamp if not on offset, otherwise unchanged timestamp.
        """
        dt = Timestamp(dt)
+       if self.normalize and (dt - dt.normalize())._value != 0:
+           # GH#32616
+           dt = dt.normalize()
        if not self.is_on_offset(dt):
            dt = dt - type(self)(1, normalize=self.normalize, **self.kwds)
        return dt
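
A quick usage sketch of the behavior this fix targets (GH#32616); it is not part of the commit, the offset and timestamps are illustrative, and the output assumes a build with this patch:

import pandas as pd

ts = pd.Timestamp("2014-02-01 09:00")
offset = pd.offsets.MonthBegin(normalize=True)

# Without the normalize-first step, the non-midnight time makes
# is_on_offset() return False and rollback subtracts a full extra offset
# (landing on 2014-01-01).  With the fix, the timestamp is only snapped
# to midnight of the same day, which is already on the offset.
print(offset.rollback(ts))  # expected under the patch: 2014-02-01 00:00:00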

pandas/core/algorithms.py

Lines changed: 13 additions & 0 deletions
@@ -868,8 +868,10 @@ def value_counts_internal(
    dropna: bool = True,
) -> Series:
    from pandas import (
+       DatetimeIndex,
        Index,
        Series,
+       TimedeltaIndex,
    )

    index_name = getattr(values, "name", None)
@@ -934,6 +936,17 @@ def value_counts_internal(
        # Starting in 3.0, we no longer perform dtype inference on the
        # Index object we construct here, xref GH#56161
        idx = Index(keys, dtype=keys.dtype, name=index_name)
+
+       if (
+           bins is None
+           and not sort
+           and isinstance(values, (DatetimeIndex, TimedeltaIndex))
+           and idx.equals(values)
+           and values.inferred_freq is not None
+       ):
+           # Preserve freq of original index
+           idx.freq = values.inferred_freq  # type: ignore[attr-defined]
+
        result = Series(counts, index=idx, name=name, copy=False)

        if sort:
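
A usage sketch (not part of the commit, assuming the patched build) of what the new branch enables: value_counts on a regular datetime-like index with sort=False keeps the index frequency.

import pandas as pd

dti = pd.date_range("2016-01-01", periods=5, freq="D")
counts = dti.value_counts(sort=False)

# The result index should now retain the inferred daily frequency (GH#33830)
# instead of coming back as a freq-less DatetimeIndex.
print(counts.index.freq)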

pandas/core/arrays/arrow/array.py

Lines changed: 8 additions & 0 deletions
@@ -829,6 +829,14 @@ def __arrow_array__(self, type=None):
        """Convert myself to a pyarrow ChunkedArray."""
        return self._pa_array

+   def __array_ufunc__(self, ufunc: np.ufunc, method: str, *inputs, **kwargs):
+       # Need to wrap np.array results GH#62800
+       result = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
+       if type(self) is ArrowExtensionArray:
+           # Exclude ArrowStringArray
+           return type(self)._from_sequence(result)
+       return result
+
    def __array__(
        self, dtype: NpDtype | None = None, copy: bool | None = None
    ) -> np.ndarray:
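
A small sketch of what the wrapper changes for ufuncs on Arrow-backed data; it mirrors the new test added in pandas/tests/extension/test_arrow.py further down and assumes the patched build:

import numpy as np
import pandas as pd

ser = pd.Series([0.1, pd.NA], dtype="float64[pyarrow]")
result = np.sin(ser)

# With __array_ufunc__ re-wrapping the result, the Arrow dtype and the
# missing value are preserved instead of falling back to a NumPy float64
# result with NaN (GH#62800).
print(result.dtype)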

pandas/core/arrays/interval.py

Lines changed: 12 additions & 0 deletions
@@ -420,6 +420,18 @@ def _ensure_simple_new_inputs(

        dtype = IntervalDtype(left.dtype, closed=closed)

+       # Check for mismatched signed/unsigned integer dtypes after casting
+       left_dtype = left.dtype
+       right_dtype = right.dtype
+       if (
+           left_dtype.kind in "iu"
+           and right_dtype.kind in "iu"
+           and left_dtype.kind != right_dtype.kind
+       ):
+           raise TypeError(
+               f"Left and right arrays must have matching signedness. "
+               f"Got {left_dtype} and {right_dtype}."
+           )
        return left, right, dtype

    @classmethod
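
A usage sketch (not part of the commit) of the new guard, assuming construction goes through _ensure_simple_new_inputs as in IntervalArray.from_arrays:

import numpy as np
import pandas as pd

left = np.array([0, 1, 2], dtype="int64")
right = np.array([1, 2, 3], dtype="uint64")

# Mismatched signed/unsigned bounds now raise instead of silently building
# an inconsistent IntervalArray (GH#55715).
try:
    pd.arrays.IntervalArray.from_arrays(left, right)
except TypeError as err:
    print(err)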

pandas/core/frame.py

Lines changed: 19 additions & 21 deletions
@@ -9038,16 +9038,6 @@ def combine(
        0  0 -5.0
        1  0  4.0

-       However, if the same element in both dataframes is None, that None
-       is preserved
-
-       >>> df1 = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
-       >>> df2 = pd.DataFrame({"A": [1, 1], "B": [None, 3]})
-       >>> df1.combine(df2, take_smaller, fill_value=-5)
-          A    B
-       0  0 -5.0
-       1  0  3.0
-
        Example that demonstrates the use of `overwrite` and behavior when
        the axis differ between the dataframes.

@@ -9106,11 +9096,14 @@ def combine(

        # preserve column order
        new_columns = self.columns.union(other_columns, sort=False)
+       this = this.reindex(new_columns, axis=1)
+       other = other.reindex(new_columns, axis=1)
+
        do_fill = fill_value is not None
        result = {}
-       for col in new_columns:
-           series = this[col]
-           other_series = other[col]
+       for i in range(this.shape[1]):
+           series = this.iloc[:, i]
+           other_series = other.iloc[:, i]

            this_dtype = series.dtype
            other_dtype = other_series.dtype
@@ -9121,7 +9114,7 @@ def combine(
            # don't overwrite columns unnecessarily
            # DO propagate if this column is not in the intersection
            if not overwrite and other_mask.all():
-               result[col] = this[col].copy()
+               result[i] = series.copy()
                continue

            if do_fill:
@@ -9130,7 +9123,7 @@ def combine(
                series[this_mask] = fill_value
                other_series[other_mask] = fill_value

-           if col not in self.columns:
+           if new_columns[i] not in self.columns:
                # If self DataFrame does not have col in other DataFrame,
                # try to promote series, which is all NaN, as other_dtype.
                new_dtype = other_dtype
@@ -9155,10 +9148,10 @@ def combine(
                    arr, new_dtype
                )

-           result[col] = arr
+           result[i] = arr

-       # convert_objects just in case
-       frame_result = self._constructor(result, index=new_index, columns=new_columns)
+       frame_result = self._constructor(result, index=new_index)
+       frame_result.columns = new_columns
        return frame_result.__finalize__(self, method="combine")

    def combine_first(self, other: DataFrame) -> DataFrame:
@@ -9222,9 +9215,14 @@ def combiner(x: Series, y: Series):
        combined = self.combine(other, combiner, overwrite=False)

        dtypes = {
+           # Check for isinstance(..., (np.dtype, ExtensionDtype))
+           # to prevent raising on non-unique columns see GH#29135.
+           # Note we will just not-cast in these cases.
            col: find_common_type([self.dtypes[col], other.dtypes[col]])
            for col in self.columns.intersection(other.columns)
-           if combined.dtypes[col] != self.dtypes[col]
+           if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))
+           and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))
+           and combined.dtypes[col] != self.dtypes[col]
        }

        if dtypes:
@@ -13820,8 +13818,8 @@ def quantile(
        0.1  1    1
        0.5  3  100

-       Specifying `numeric_only=False` will also compute the quantile of
-       datetime and timedelta data.
+       Specifying `numeric_only=False` will compute the quantiles for all
+       columns.

        >>> df = pd.DataFrame(
        ...     {
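
A usage sketch (not part of the commit) of what the positional iteration in combine enables, assuming the patched build; previously these calls raised with duplicate column labels (GH#29135, GH#51340):

import pandas as pd

df1 = pd.DataFrame([[1, None]], columns=["A", "A"])
df2 = pd.DataFrame([[3, 4]], columns=["A", "A"])

# combine() and combine_first() now walk columns by position, so duplicate
# labels no longer trip the label-based lookup.
print(df1.combine(df2, lambda s1, s2: s1.where(s1.notna(), s2)))
print(df1.combine_first(df2))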

pandas/core/indexes/base.py

Lines changed: 2 additions & 2 deletions
@@ -4168,7 +4168,7 @@ def reindex(
        limit : int, optional
            Maximum number of consecutive labels in ``target`` to match for
            inexact matches.
-       tolerance : int or float, optional
+       tolerance : int, float, or list-like, optional
            Maximum distance between original and new labels for inexact
            matches. The values of the index at the matching locations must
            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
@@ -5675,7 +5675,7 @@ def asof(self, label):
                return self._na_value
        else:
            if isinstance(loc, slice):
-               loc = loc.indices(len(self))[-1]
+               return self[loc][-1]

        return self[loc]

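A usage sketch (not part of the commit) of the asof fix: a partial datetime string resolves to a slice of matching labels, and the patched code returns the last label covered by that slice rather than a position derived from the slice's step (GH#50946). The exact output assumes the patched build:

import pandas as pd

dti = pd.DatetimeIndex(["2023-01-01", "2023-01-15", "2023-01-20", "2023-02-01"])

# "2023-01" matches the first three entries as a slice; the last of them
# should be returned.
print(dti.asof("2023-01"))  # expected: Timestamp('2023-01-20 00:00:00')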

pandas/core/series.py

Lines changed: 12 additions & 6 deletions
@@ -87,7 +87,6 @@
)
from pandas.core.dtypes.dtypes import (
    ExtensionDtype,
-   SparseDtype,
)
from pandas.core.dtypes.generic import (
    ABCDataFrame,
@@ -3112,8 +3111,8 @@ def combine(

        Combine the Series and `other` using `func` to perform elementwise
        selection for combined Series.
-       `fill_value` is assumed when value is missing at some index
-       from one of the two objects being combined.
+       `fill_value` is assumed when value is not present at some index
+       from one of the two Series being combined.

        Parameters
        ----------
@@ -3254,9 +3253,6 @@ def combine_first(self, other) -> Series:
        if self.dtype == other.dtype:
            if self.index.equals(other.index):
                return self.mask(self.isna(), other)
-           elif self._can_hold_na and not isinstance(self.dtype, SparseDtype):
-               this, other = self.align(other, join="outer")
-               return this.mask(this.isna(), other)

        new_index = self.index.union(other.index)

@@ -3271,6 +3267,16 @@ def combine_first(self, other) -> Series:
        if this.dtype.kind == "M" and other.dtype.kind != "M":
            # TODO: try to match resos?
            other = to_datetime(other)
+           warnings.warn(
+               # GH#62931
+               "Silently casting non-datetime 'other' to datetime in "
+               "Series.combine_first is deprecated and will be removed "
+               "in a future version. Explicitly cast before calling "
+               "combine_first instead.",
+               Pandas4Warning,
+               stacklevel=find_stack_level(),
+           )
+
        combined = concat([this, other])
        combined = combined.reindex(new_index)
        return combined.__finalize__(self, method="combine_first")
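
A usage sketch (not part of the commit) of the deprecation path, assuming the patched build: passing a non-datetime 'other' into a datetime64 Series now warns, and casting explicitly beforehand avoids it.

import pandas as pd

ser = pd.Series(pd.to_datetime(["2024-01-01", None]))
other = pd.Series(["2024-06-01", "2024-06-02"])  # object dtype, not datetime

# ser.combine_first(other) still works but emits the new warning (GH#62931);
# the forward-compatible spelling casts explicitly first.
result = ser.combine_first(pd.to_datetime(other))
print(result)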

pandas/tests/base/test_value_counts.py

Lines changed: 79 additions & 0 deletions
@@ -14,6 +14,7 @@
    Series,
    Timedelta,
    TimedeltaIndex,
+   Timestamp,
    array,
)
import pandas._testing as tm
@@ -339,3 +340,81 @@ def test_value_counts_object_inference_deprecated():
    exp = dti.value_counts()
    exp.index = exp.index.astype(object)
    tm.assert_series_equal(res, exp)
+
+
+@pytest.mark.parametrize(
+    ("index", "expected_index"),
+    [
+        [
+            pd.date_range("2016-01-01", periods=5, freq="D"),
+            pd.date_range("2016-01-01", periods=5, freq="D"),
+        ],
+        [
+            pd.timedelta_range(Timedelta(0), periods=5, freq="h"),
+            pd.timedelta_range(Timedelta(0), periods=5, freq="h"),
+        ],
+        [
+            DatetimeIndex(
+                [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1)]
+                + [Timestamp("2016-01-02")]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1, 5)]
+            ),
+            DatetimeIndex(pd.date_range("2016-01-01", periods=5, freq="D")),
+        ],
+        [
+            TimedeltaIndex(
+                [Timedelta(hours=i) for i in range(1)]
+                + [Timedelta(hours=1)]
+                + [Timedelta(hours=i) for i in range(1, 5)],
+            ),
+            TimedeltaIndex(pd.timedelta_range(Timedelta(0), periods=5, freq="h")),
+        ],
+        [
+            DatetimeIndex(
+                [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(2)]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(3, 5)],
+            ),
+            DatetimeIndex(
+                [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(2)]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(3, 5)],
+            ),
+        ],
+        [
+            TimedeltaIndex(
+                [Timedelta(hours=i) for i in range(2)]
+                + [Timedelta(hours=i) for i in range(3, 5)],
+            ),
+            TimedeltaIndex(
+                [Timedelta(hours=i) for i in range(2)]
+                + [Timedelta(hours=i) for i in range(3, 5)],
+            ),
+        ],
+        [
+            DatetimeIndex(
+                [Timestamp("2016-01-01")]
+                + [pd.NaT]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1, 5)],
+            ),
+            DatetimeIndex(
+                [Timestamp("2016-01-01")]
+                + [pd.NaT]
+                + [Timestamp("2016-01-01") + Timedelta(days=i) for i in range(1, 5)],
+            ),
+        ],
+        [
+            TimedeltaIndex(
+                [Timedelta(hours=0)]
+                + [pd.NaT]
+                + [Timedelta(hours=i) for i in range(1, 5)],
+            ),
+            TimedeltaIndex(
+                [Timedelta(hours=0)]
+                + [pd.NaT]
+                + [Timedelta(hours=i) for i in range(1, 5)],
+            ),
+        ],
+    ],
+)
+def test_value_counts_index_datetimelike(index, expected_index):
+    vc = index.value_counts(sort=False, dropna=False)
+    tm.assert_index_equal(vc.index, expected_index)

pandas/tests/extension/test_arrow.py

Lines changed: 10 additions & 0 deletions
@@ -3800,3 +3800,13 @@ def test_cast_pontwise_result_decimal_nan():

    pa_type = result.dtype.pyarrow_dtype
    assert pa.types.is_decimal(pa_type)
+
+
+def test_ufunc_retains_missing():
+    # GH#62800
+    ser = pd.Series([0.1, pd.NA], dtype="float64[pyarrow]")
+
+    result = np.sin(ser)
+
+    expected = pd.Series([np.sin(0.1), pd.NA], dtype="float64[pyarrow]")
+    tm.assert_series_equal(result, expected)
