
Commit db177af

Merge branch 'main' into to_datetime-micros

2 parents: bb434e5 + 94c7e88
42 files changed: +460 additions, -241 deletions

doc/source/reference/aliases.rst

Lines changed: 2 additions & 2 deletions
@@ -65,7 +65,7 @@ Alias Meaning
 :py:type:`NaPosition` Argument type for ``na_position`` in :meth:`sort_index` and :meth:`sort_values`
 :py:type:`NsmallestNlargestKeep` Argument type for ``keep`` in :meth:`nlargest` and :meth:`nsmallest`
 :py:type:`OpenFileErrors` Argument type for ``errors`` in :meth:`to_hdf` and :meth:`to_csv`
-:py:type:`Ordered` Return type for :py:attr:`ordered`` in :class:`CategoricalDtype` and :class:`Categorical`
+:py:type:`Ordered` Return type for :py:attr:`ordered` in :class:`CategoricalDtype` and :class:`Categorical`
 :py:type:`ParquetCompressionOptions` Argument type for ``compression`` in :meth:`DataFrame.to_parquet`
 :py:type:`QuantileInterpolation` Argument type for ``interpolation`` in :meth:`quantile`
 :py:type:`ReadBuffer` Additional argument type corresponding to buffers for various file reading methods
@@ -89,7 +89,7 @@ Alias Meaning
 :py:type:`ToTimestampHow` Argument type for ``how`` in :meth:`to_timestamp` and ``convention`` in :meth:`resample`
 :py:type:`UpdateJoin` Argument type for ``join`` in :meth:`DataFrame.update`
 :py:type:`UsecolsArgType` Argument type for ``usecols`` in :meth:`pandas.read_clipboard`, :meth:`pandas.read_csv` and :meth:`pandas.read_excel`
-:py:type:`WindowingRankType` Argument type for ``method`` in :meth:`rank`` in rolling and expanding window operations
+:py:type:`WindowingRankType` Argument type for ``method`` in :meth:`rank` in rolling and expanding window operations
 :py:type:`WriteBuffer` Additional argument type corresponding to buffers for various file output methods
 :py:type:`WriteExcelBuffer` Additional argument type corresponding to buffers for :meth:`to_excel`
 :py:type:`XMLParsers` Argument type for ``parser`` in :meth:`DataFrame.to_xml` and :meth:`pandas.read_xml`

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 2 deletions
@@ -201,6 +201,7 @@ Other enhancements
 - :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
 - :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
+- :func:`to_numeric` on big integers converts to ``object`` datatype with python integers when not coercing. (:issue:`51295`)
 - :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
 - :meth:`DataFrame.apply` supports using third-party execution engines like the Bodo.ai JIT compiler (:issue:`60668`)
 - :meth:`DataFrame.iloc` and :meth:`Series.iloc` now support boolean masks in ``__getitem__`` for more consistent indexing behavior (:issue:`60994`)
@@ -739,6 +740,7 @@ Other Deprecations
 - Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
+- Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.prior_deprecations:
@@ -939,6 +941,7 @@ Performance improvements
 - Performance improvement in :meth:`RangeIndex.reindex` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57647`, :issue:`57752`)
 - Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`, :issue:`57752`)
 - Performance improvement in :func:`merge` if hash-join can be used (:issue:`57970`)
+- Performance improvement in :func:`merge` when join keys have different dtypes and need to be upcast (:issue:`62902`)
 - Performance improvement in :meth:`CategoricalDtype.update_dtype` when ``dtype`` is a :class:`CategoricalDtype` with non ``None`` categories and ordered (:issue:`59647`)
 - Performance improvement in :meth:`DataFrame.__getitem__` when ``key`` is a :class:`DataFrame` with many columns (:issue:`61010`)
 - Performance improvement in :meth:`DataFrame.astype` when converting to extension floating dtypes, e.g. "Float64" (:issue:`60066`)
@@ -982,6 +985,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
+- Bug in :meth:`DatetimeIndex.where` and :meth:`TimedeltaIndex.where` failing to set ``freq=None`` in some cases (:issue:`24555`)
 - Bug in :meth:`Index.union` with a ``pyarrow`` timestamp dtype incorrectly returning ``object`` dtype (:issue:`58421`)
 - Bug in :meth:`Series.dt.microsecond` producing incorrect results for pyarrow backed :class:`Series`. (:issue:`59154`)
 - Bug in :meth:`Timestamp.normalize` and :meth:`DatetimeArray.normalize` returning incorrect results instead of raising on integer overflow for very small (distant past) values (:issue:`60583`)
@@ -998,7 +1002,6 @@ Datetimelike
 - Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
 - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)

-
 Timedelta
 ^^^^^^^^^
 - Accuracy improvement in :meth:`Timedelta.to_pytimedelta` to round microseconds consistently for large nanosecond based Timedelta (:issue:`57841`)
@@ -1114,7 +1117,7 @@ I/O
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
 - Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
-- Bug in :meth:`read_csv` with ``engine="c"`` reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
+- Bug in :meth:`read_csv` with ``c`` and ``python`` engines reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="c"`` reading large float numbers with preceding integers as strings. Now reads them as floats. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="pyarrow"`` and ``dtype="Int64"`` losing precision (:issue:`56136`)
 - Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
@@ -1176,6 +1179,7 @@ Groupby/resample/rolling

 Reshaping
 ^^^^^^^^^
+- Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
 - Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
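
The :func:`to_numeric` entry above describes the new non-coercing behavior for integers that overflow the 64-bit range. A minimal sketch of the expected behavior (illustrative values, not taken from the PR's tests):

```python
import pandas as pd

big = 2**70  # larger than uint64 can hold

# Previously this raised "Integer out of range."; per the change above it
# should now fall back to object dtype holding Python ints.
result = pd.to_numeric(pd.Series([1, big]))
print(result.dtype)     # expected: object
print(type(result[1]))  # expected: <class 'int'>

# With errors="coerce" the overflowing value is still converted to float.
coerced = pd.to_numeric(pd.Series([1, big]), errors="coerce")
print(coerced.dtype)    # expected: float64
```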

pandas/_libs/arrays.pyx

Lines changed: 4 additions & 0 deletions
@@ -100,6 +100,10 @@ cdef class NDArrayBacked:
             if len(state) == 1 and isinstance(state[0], dict):
                 self.__setstate__(state[0])
                 return
+            elif len(state) == 2:
+                # GH#62820: Handle missing attrs dict during auto-unpickling
+                self.__setstate__((*state, {}))
+                return
             raise NotImplementedError(state)  # pragma: no cover

         data, dtype = state[:2]

pandas/_libs/lib.pyx

Lines changed: 21 additions & 5 deletions
@@ -1386,6 +1386,7 @@ cdef class Seen:
         bint nan_          # seen_np.nan
         bint uint_         # seen_uint (unsigned integer)
         bint sint_         # seen_sint (signed integer)
+        bint overflow_     # seen_overflow
         bint float_        # seen_float
         bint object_       # seen_object
         bint complex_      # seen_complex
@@ -1414,6 +1415,7 @@ cdef class Seen:
         self.nan_ = False
         self.uint_ = False
         self.sint_ = False
+        self.overflow_ = False
         self.float_ = False
         self.object_ = False
         self.complex_ = False
@@ -2379,6 +2381,9 @@ def maybe_convert_numeric(
         ndarray[uint64_t, ndim=1] uints = cnp.PyArray_EMPTY(
             1, values.shape, cnp.NPY_UINT64, 0
         )
+        ndarray[object, ndim=1] pyints = cnp.PyArray_EMPTY(
+            1, values.shape, cnp.NPY_OBJECT, 0
+        )
         ndarray[uint8_t, ndim=1] bools = cnp.PyArray_EMPTY(
             1, values.shape, cnp.NPY_UINT8, 0
         )
@@ -2421,18 +2426,24 @@

             val = int(val)
             seen.saw_int(val)
+            pyints[i] = val

             if val >= 0:
                 if val <= oUINT64_MAX:
                     uints[i] = val
-                else:
+                elif seen.coerce_numeric:
                     seen.float_ = True
+                else:
+                    seen.overflow_ = True

             if oINT64_MIN <= val <= oINT64_MAX:
                 ints[i] = val

             if val < oINT64_MIN or (seen.sint_ and seen.uint_):
-                seen.float_ = True
+                if seen.coerce_numeric:
+                    seen.float_ = True
+                else:
+                    seen.overflow_ = True

         elif util.is_bool_object(val):
             floats[i] = uints[i] = ints[i] = bools[i] = val
@@ -2476,6 +2487,7 @@ def maybe_convert_numeric(

             if maybe_int:
                 as_int = int(val)
+                pyints[i] = as_int

                 if as_int in na_values:
                     mask[i] = 1
@@ -2490,7 +2502,7 @@ def maybe_convert_numeric(
                     if seen.coerce_numeric:
                         seen.float_ = True
                     else:
-                        raise ValueError("Integer out of range.")
+                        seen.overflow_ = True
                 else:
                     if as_int >= 0:
                         uints[i] = as_int
@@ -2529,11 +2541,15 @@ def maybe_convert_numeric(
         return (floats, None)
     elif seen.int_:
         if seen.null_ and convert_to_masked_nullable:
-            if seen.uint_:
+            if seen.overflow_:
+                return (pyints, mask.view(np.bool_))
+            elif seen.uint_:
                 return (uints, mask.view(np.bool_))
             else:
                 return (ints, mask.view(np.bool_))
-        if seen.uint_:
+        if seen.overflow_:
+            return (pyints, None)
+        elif seen.uint_:
             return (uints, None)
         else:
             return (ints, None)
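
The ``maybe_convert_numeric`` changes above collect overflowing integers in a separate object array (``pyints``) and record an ``overflow_`` flag instead of raising or silently switching to float. The related ``read_csv`` whatsnew entry suggests user-visible behavior along these lines (a sketch, assuming the default parsers hit this code path):

```python
import io

import pandas as pd

csv = "a\n1\n99999999999999999999999999\n"  # second value exceeds uint64

df = pd.read_csv(io.StringIO(csv), engine="c")
# Previously the column came back as strings; per the whatsnew entry both the
# c and python engines should now return Python integers instead.
print(df["a"].tolist())  # expected: [1, 99999999999999999999999999]
print(df["a"].dtype)     # expected: object
```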

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 13 additions & 0 deletions
@@ -2026,6 +2026,19 @@ class Timedelta(_Timedelta):
                 "milliseconds, microseconds, nanoseconds]"
             )

+        if (
+            unit is not None
+            and not (is_float_object(value) or is_integer_object(value))
+        ):
+            # GH#53198
+            warnings.warn(
+                "The 'unit' keyword is only used when the Timedelta input is "
+                f"an integer or float, not {type(value).__name__}. "
+                "To specify the storage unit of the output use `td.as_unit(unit)`",
+                UserWarning,
+                stacklevel=find_stack_level(),
+            )
+
         if value is _no_input:
             if not len(kwargs):
                 raise ValueError("cannot construct a Timedelta without a "

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 14 additions & 0 deletions
@@ -67,6 +67,7 @@ from pandas._libs.tslibs.dtypes cimport (
 )
 from pandas._libs.tslibs.util cimport (
     is_array,
+    is_float_object,
     is_integer_object,
 )

@@ -2654,6 +2655,19 @@ class Timestamp(_Timestamp):
         if hasattr(ts_input, "fold"):
             ts_input = ts_input.replace(fold=fold)

+        if (
+            unit is not None
+            and not (is_float_object(ts_input) or is_integer_object(ts_input))
+        ):
+            # GH#53198
+            warnings.warn(
+                "The 'unit' keyword is only used when the Timestamp input is "
+                f"an integer or float, not {type(ts_input).__name__}. "
+                "To specify the storage unit of the output use `ts.as_unit(unit)`",
+                UserWarning,
+                stacklevel=find_stack_level(),
+            )
+
         # GH 30543 if pd.Timestamp already passed, return it
         # check that only ts_input is passed
         # checking verbosely, because cython doesn't optimize
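
Taken together, the ``Timedelta`` and ``Timestamp`` constructor changes above (GH#53198) emit a ``UserWarning`` when ``unit`` is passed alongside a non-numeric input, where it has no effect. A rough sketch of the resulting behavior (illustrative, not taken from the PR's tests):

```python
import datetime as dt
import warnings

import pandas as pd

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # 'unit' only applies to integer/float inputs, so each of these calls
    # should now emit a UserWarning pointing at .as_unit(...) instead.
    pd.Timestamp(dt.datetime(2024, 1, 1), unit="s")
    pd.Timedelta(dt.timedelta(days=1), unit="s")

print(len(caught))  # expected: 2

# The suggested spelling for controlling the storage resolution:
ts = pd.Timestamp(dt.datetime(2024, 1, 1)).as_unit("s")
print(ts.unit)      # expected: s
```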

pandas/core/arrays/datetimelike.py

Lines changed: 9 additions & 3 deletions
@@ -393,7 +393,9 @@ def __getitem__(self, key: PositionalIndexer2D) -> Self | DTScalarOrNaT:
         else:
             # At this point we know the result is an array.
             result = cast(Self, result)
-            result._freq = self._get_getitem_freq(key)
+            # error: Incompatible types in assignment (expression has type
+            # "BaseOffset | None", variable has type "None")
+            result._freq = self._get_getitem_freq(key)  # type: ignore[assignment]
         return result

     def _get_getitem_freq(self, key) -> BaseOffset | None:
@@ -527,6 +529,10 @@ def view(self, dtype: Dtype | None = None) -> ArrayLike:
         # are present in this file.
         return super().view(dtype)

+    def _putmask(self, mask: npt.NDArray[np.bool_], value) -> None:
+        super()._putmask(mask, value)
+        self._freq = None  # GH#24555
+
     # ------------------------------------------------------------------
     # Validation Methods
     # TODO: try to de-duplicate these, ensure identical behavior
@@ -2042,7 +2048,7 @@ def _maybe_pin_freq(self, freq, validate_kwds: dict) -> None:
         if self._freq is None:
             # Set _freq directly to bypass duplicative _validate_frequency
             # check.
-            self._freq = to_offset(self.inferred_freq)
+            self._freq = to_offset(self.inferred_freq)  # type: ignore[assignment]
         elif freq is lib.no_default:
             # user did not specify anything, keep inferred freq if the original
             # data had one, otherwise do nothing
@@ -2442,7 +2448,7 @@ def take(

         if isinstance(maybe_slice, slice):
             freq = self._get_getitem_freq(maybe_slice)
-            result._freq = freq
+            result._freq = freq  # type: ignore[assignment]

         return result
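
The new ``_putmask`` override above clears the cached frequency whenever array elements are overwritten through a mask, which is what fixes the ``DatetimeIndex.where`` / ``TimedeltaIndex.where`` whatsnew entry (GH#24555). A small sketch of the intended user-visible effect (illustrative values):

```python
import pandas as pd

dti = pd.date_range("2024-01-01", periods=4, freq="D")

# Replacing the first two timestamps breaks the regular daily spacing, so the
# result can no longer claim freq="D"; with this fix it should report
# freq=None instead of carrying the stale frequency along.
result = dti.where(dti > dti[1], pd.Timestamp("2023-12-01"))
print(result.freq)  # expected: None
```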

pandas/core/computation/pytables.py

Lines changed: 4 additions & 1 deletion
@@ -17,6 +17,7 @@

 import numpy as np

+from pandas._libs import lib
 from pandas._libs.tslibs import (
     Timedelta,
     Timestamp,
@@ -227,8 +228,10 @@ def stringify(value):
         elif kind in ("timedelta64", "timedelta"):
             if isinstance(conv_val, str):
                 conv_val = Timedelta(conv_val)
-            else:
+            elif lib.is_integer(conv_val) or lib.is_float(conv_val):
                 conv_val = Timedelta(conv_val, unit="s")
+            else:
+                conv_val = Timedelta(conv_val)
             conv_val = conv_val.as_unit("ns")._value
             return TermValue(int(conv_val), conv_val, kind)
         elif meta == "category":

pandas/core/dtypes/concat.py

Lines changed: 4 additions & 0 deletions
@@ -161,6 +161,10 @@ def _get_result_dtype(
                 # coerce to object
                 target_dtype = np.dtype(object)
                 kinds = {"o"}
+    elif "b" in kinds and len(kinds) > 1:
+        # GH#21108, GH#45101
+        target_dtype = np.dtype(object)
+        kinds = {"o"}
     else:
         # error: Argument 1 to "np_find_common_type" has incompatible type
         # "*Set[Union[ExtensionDtype, Any]]"; expected "dtype[Any]"

pandas/core/frame.py

Lines changed: 9 additions & 0 deletions
@@ -11554,6 +11554,15 @@ def _dict_round(df: DataFrame, decimals) -> Iterator[Series]:
         def _series_round(ser: Series, decimals: int) -> Series:
             if is_integer_dtype(ser.dtype) or is_float_dtype(ser.dtype):
                 return ser.round(decimals)
+            elif isinstance(ser._values, (DatetimeArray, TimedeltaArray, PeriodArray)):
+                # GH#57781
+                # TODO: also the ArrowDtype analogues?
+                warnings.warn(
+                    "obj.round has no effect with datetime, timedelta, "
+                    "or period dtypes. Use obj.dt.round(...) instead.",
+                    UserWarning,
+                    stacklevel=find_stack_level(),
+                )
             return ser

         nv.validate_round(args, kwargs)
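
The ``_series_round`` change above still returns non-numeric columns unchanged, but now warns when a datetime, timedelta, or period column passes through ``DataFrame.round`` (GH#57781). A short sketch of the resulting behavior:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1.234, 5.678],
        "ts": pd.to_datetime(["2024-01-01 12:34:56.789", "2024-06-01"]),
    }
)

# The float column is rounded; the datetime column is returned as-is and a
# UserWarning suggests the .dt accessor instead.
rounded = df.round(2)

# Rounding the datetime values themselves is spelled via the accessor:
print(df["ts"].dt.round("s"))
```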

0 commit comments
