Commit c01c80b

Merge branch 'main' into enh-list-arith
2 parents 5371ec9 + 94c7e88 commit c01c80b

35 files changed: +411 -231 lines changed

doc/source/whatsnew/v3.0.0.rst

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -201,6 +201,7 @@ Other enhancements
201201
- :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
202202
- :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
203203
- :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
204+
- :func:`to_numeric` on big integers converts to ``object`` datatype with python integers when not coercing. (:issue:`51295`)
204205
- :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
205206
- :meth:`DataFrame.apply` supports using third-party execution engines like the Bodo.ai JIT compiler (:issue:`60668`)
206207
- :meth:`DataFrame.iloc` and :meth:`Series.iloc` now support boolean masks in ``__getitem__`` for more consistent indexing behavior (:issue:`60994`)
@@ -738,6 +739,7 @@ Other Deprecations
738739
- Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
739740
- Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
740741
- Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
742+
- Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)
741743

742744
.. ---------------------------------------------------------------------------
743745
.. _whatsnew_300.prior_deprecations:
@@ -938,6 +940,7 @@ Performance improvements
938940
- Performance improvement in :meth:`RangeIndex.reindex` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57647`, :issue:`57752`)
939941
- Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`, :issue:`57752`)
940942
- Performance improvement in :func:`merge` if hash-join can be used (:issue:`57970`)
943+
- Performance improvement in :func:`merge` when join keys have different dtypes and need to be upcast (:issue:`62902`)
941944
- Performance improvement in :meth:`CategoricalDtype.update_dtype` when ``dtype`` is a :class:`CategoricalDtype` with non ``None`` categories and ordered (:issue:`59647`)
942945
- Performance improvement in :meth:`DataFrame.__getitem__` when ``key`` is a :class:`DataFrame` with many columns (:issue:`61010`)
943946
- Performance improvement in :meth:`DataFrame.astype` when converting to extension floating dtypes, e.g. "Float64" (:issue:`60066`)
@@ -1113,7 +1116,7 @@ I/O
11131116
- Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
11141117
- Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
11151118
- Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
1116-
- Bug in :meth:`read_csv` with ``engine="c"`` reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
1119+
- Bug in :meth:`read_csv` with ``c`` and ``python`` engines reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
11171120
- Bug in :meth:`read_csv` with ``engine="c"`` reading large float numbers with preceding integers as strings. Now reads them as floats. (:issue:`51295`)
11181121
- Bug in :meth:`read_csv` with ``engine="pyarrow"`` and ``dtype="Int64"`` losing precision (:issue:`56136`)
11191122
- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
@@ -1175,6 +1178,7 @@ Groupby/resample/rolling
11751178

11761179
Reshaping
11771180
^^^^^^^^^
1181+
- Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
11781182
- Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
11791183
- Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
11801184
- Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)

pandas/_libs/arrays.pyx

Lines changed: 4 additions & 0 deletions
@@ -100,6 +100,10 @@ cdef class NDArrayBacked:
100100
if len(state) == 1 and isinstance(state[0], dict):
101101
self.__setstate__(state[0])
102102
return
103+
elif len(state) == 2:
104+
# GH#62820: Handle missing attrs dict during auto-unpickling
105+
self.__setstate__((*state, {}))
106+
return
103107
raise NotImplementedError(state) # pragma: no cover
104108

105109
data, dtype = state[:2]
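
The two-element branch added above can be illustrated outside Cython. The following is a minimal pure-Python sketch, assuming a hypothetical `ArrayBackedSketch` class that stands in for `NDArrayBacked`; the attribute names `data`, `dtype`, and `attrs` are illustrative and are not taken from the pandas source. It only shows the idea of padding a `(data, dtype)` state with an empty attrs dict before re-dispatching.

```python
class ArrayBackedSketch:
    """Hypothetical stand-in for NDArrayBacked (not the pandas class)."""

    def __init__(self):
        self.data = None
        self.dtype = None
        self.attrs = {}

    def __setstate__(self, state):
        if isinstance(state, tuple) and len(state) != 3:
            if len(state) == 1 and isinstance(state[0], dict):
                # Unwrap a 1-tuple holding a dict and retry.
                self.__setstate__(state[0])
                return
            elif len(state) == 2:
                # Mirrors the GH#62820 branch: the attrs dict is missing,
                # so append an empty one and retry with a 3-tuple.
                self.__setstate__((*state, {}))
                return
            raise NotImplementedError(state)

        if isinstance(state, dict):
            # Assumption for this sketch: dict states map attribute names
            # directly to values.
            self.__dict__.update(state)
            return

        data, dtype, attrs = state
        self.data = data
        self.dtype = dtype
        self.attrs = dict(attrs)
```

With this sketch, `ArrayBackedSketch().__setstate__(([1, 2], "int64"))` ends up with an empty `attrs` dict instead of raising `NotImplementedError`.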

pandas/_libs/lib.pyx

Lines changed: 21 additions & 5 deletions
@@ -1386,6 +1386,7 @@ cdef class Seen:
13861386
bint nan_ # seen_np.nan
13871387
bint uint_ # seen_uint (unsigned integer)
13881388
bint sint_ # seen_sint (signed integer)
1389+
bint overflow_ # seen_overflow
13891390
bint float_ # seen_float
13901391
bint object_ # seen_object
13911392
bint complex_ # seen_complex
@@ -1414,6 +1415,7 @@ cdef class Seen:
14141415
self.nan_ = False
14151416
self.uint_ = False
14161417
self.sint_ = False
1418+
self.overflow_ = False
14171419
self.float_ = False
14181420
self.object_ = False
14191421
self.complex_ = False
@@ -2379,6 +2381,9 @@ def maybe_convert_numeric(
23792381
ndarray[uint64_t, ndim=1] uints = cnp.PyArray_EMPTY(
23802382
1, values.shape, cnp.NPY_UINT64, 0
23812383
)
2384+
ndarray[object, ndim=1] pyints = cnp.PyArray_EMPTY(
2385+
1, values.shape, cnp.NPY_OBJECT, 0
2386+
)
23822387
ndarray[uint8_t, ndim=1] bools = cnp.PyArray_EMPTY(
23832388
1, values.shape, cnp.NPY_UINT8, 0
23842389
)
@@ -2421,18 +2426,24 @@ def maybe_convert_numeric(
24212426

24222427
val = int(val)
24232428
seen.saw_int(val)
2429+
pyints[i] = val
24242430

24252431
if val >= 0:
24262432
if val <= oUINT64_MAX:
24272433
uints[i] = val
2428-
else:
2434+
elif seen.coerce_numeric:
24292435
seen.float_ = True
2436+
else:
2437+
seen.overflow_ = True
24302438

24312439
if oINT64_MIN <= val <= oINT64_MAX:
24322440
ints[i] = val
24332441

24342442
if val < oINT64_MIN or (seen.sint_ and seen.uint_):
2435-
seen.float_ = True
2443+
if seen.coerce_numeric:
2444+
seen.float_ = True
2445+
else:
2446+
seen.overflow_ = True
24362447

24372448
elif util.is_bool_object(val):
24382449
floats[i] = uints[i] = ints[i] = bools[i] = val
@@ -2476,6 +2487,7 @@ def maybe_convert_numeric(
24762487

24772488
if maybe_int:
24782489
as_int = int(val)
2490+
pyints[i] = as_int
24792491

24802492
if as_int in na_values:
24812493
mask[i] = 1
@@ -2490,7 +2502,7 @@ def maybe_convert_numeric(
24902502
if seen.coerce_numeric:
24912503
seen.float_ = True
24922504
else:
2493-
raise ValueError("Integer out of range.")
2505+
seen.overflow_ = True
24942506
else:
24952507
if as_int >= 0:
24962508
uints[i] = as_int
@@ -2529,11 +2541,15 @@ def maybe_convert_numeric(
25292541
return (floats, None)
25302542
elif seen.int_:
25312543
if seen.null_ and convert_to_masked_nullable:
2532-
if seen.uint_:
2544+
if seen.overflow_:
2545+
return (pyints, mask.view(np.bool_))
2546+
elif seen.uint_:
25332547
return (uints, mask.view(np.bool_))
25342548
else:
25352549
return (ints, mask.view(np.bool_))
2536-
if seen.uint_:
2550+
if seen.overflow_:
2551+
return (pyints, None)
2552+
elif seen.uint_:
25372553
return (uints, None)
25382554
else:
25392555
return (ints, None)
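
The overflow handling in the hunks above can be summarized as a decision about which result buffer is returned. The following is a simplified pure-Python sketch of that decision for integer inputs only. The function name `pick_int_buffer` is hypothetical, and the sign tracking is an assumption about what `Seen.saw_int` does, since that method is not part of this diff. Strings, NA handling, and the masked-nullable path are left out.

```python
UINT64_MAX = 2**64 - 1
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def pick_int_buffer(values, coerce_numeric=False):
    """Return the name of the buffer a sketch of the new logic would pick."""
    seen_sint = seen_uint = seen_float = seen_overflow = False

    for val in values:
        # Assumed behaviour of Seen.saw_int: remember whether a negative
        # value or a value that only fits in uint64 has been seen.
        seen_sint = seen_sint or (INT64_MIN <= val < 0)
        seen_uint = seen_uint or (INT64_MAX < val <= UINT64_MAX)

        if val >= 0 and val > UINT64_MAX:
            # Too large for uint64: float when coercing, otherwise overflow.
            if coerce_numeric:
                seen_float = True
            else:
                seen_overflow = True

        if val < INT64_MIN or (seen_sint and seen_uint):
            # Too small for int64, or a signed/unsigned mix that no
            # 64-bit integer dtype can hold.
            if coerce_numeric:
                seen_float = True
            else:
                seen_overflow = True

    if seen_float:
        return "floats"
    if seen_overflow:
        return "pyints"  # object array of Python integers
    if seen_uint:
        return "uints"
    return "ints"
```

In this sketch, `pick_int_buffer([2**64])` selects the object buffer of Python integers, while `pick_int_buffer([2**64], coerce_numeric=True)` falls back to floats, matching the branches shown in the diff.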

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 13 additions & 0 deletions
@@ -2026,6 +2026,19 @@ class Timedelta(_Timedelta):
20262026
"milliseconds, microseconds, nanoseconds]"
20272027
)
20282028

2029+
if (
2030+
unit is not None
2031+
and not (is_float_object(value) or is_integer_object(value))
2032+
):
2033+
# GH#53198
2034+
warnings.warn(
2035+
"The 'unit' keyword is only used when the Timedelta input is "
2036+
f"an integer or float, not {type(value).__name__}. "
2037+
"To specify the storage unit of the output use `td.as_unit(unit)`",
2038+
UserWarning,
2039+
stacklevel=find_stack_level(),
2040+
)
2041+
20292042
if value is _no_input:
20302043
if not len(kwargs):
20312044
raise ValueError("cannot construct a Timedelta without a "

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 14 additions & 0 deletions
@@ -67,6 +67,7 @@ from pandas._libs.tslibs.dtypes cimport (
6767
)
6868
from pandas._libs.tslibs.util cimport (
6969
is_array,
70+
is_float_object,
7071
is_integer_object,
7172
)
7273

@@ -2654,6 +2655,19 @@ class Timestamp(_Timestamp):
26542655
if hasattr(ts_input, "fold"):
26552656
ts_input = ts_input.replace(fold=fold)
26562657
2658+
if (
2659+
unit is not None
2660+
and not (is_float_object(ts_input) or is_integer_object(ts_input))
2661+
):
2662+
# GH#53198
2663+
warnings.warn(
2664+
"The 'unit' keyword is only used when the Timestamp input is "
2665+
f"an integer or float, not {type(ts_input).__name__}. "
2666+
"To specify the storage unit of the output use `ts.as_unit(unit)`",
2667+
UserWarning,
2668+
stacklevel=find_stack_level(),
2669+
)
2670+
26572671
# GH 30543 if pd.Timestamp already passed, return it
26582672
# check that only ts_input is passed
26592673
# checking verbosely, because cython doesn't optimize
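
The `Timedelta` and `Timestamp` hunks add the same guard: warn when `unit` is passed with a non-numeric input. The following is a minimal sketch of that guard in plain Python. The helper `warn_if_unit_ignored` is hypothetical, and the `isinstance` check is only an approximation of `is_integer_object` and `is_float_object`, which also recognize NumPy scalars.

```python
import warnings


def warn_if_unit_ignored(value, unit, kind="Timedelta"):
    """Warn and return True when `unit` would be ignored for `value`."""
    # Approximation of is_integer_object/is_float_object: plain ints and
    # floats only, with bool excluded.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)

    if unit is not None and not is_number:
        warnings.warn(
            f"The 'unit' keyword is only used when the {kind} input is "
            f"an integer or float, not {type(value).__name__}.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Under these assumptions, `warn_if_unit_ignored("1 day", "s")` emits a `UserWarning`, while `warn_if_unit_ignored(5, "s")` and `warn_if_unit_ignored("1 day", None)` stay silent.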

pandas/core/computation/pytables.py

Lines changed: 4 additions & 1 deletion
@@ -17,6 +17,7 @@
1717

1818
import numpy as np
1919

20+
from pandas._libs import lib
2021
from pandas._libs.tslibs import (
2122
Timedelta,
2223
Timestamp,
@@ -227,8 +228,10 @@ def stringify(value):
227228
elif kind in ("timedelta64", "timedelta"):
228229
if isinstance(conv_val, str):
229230
conv_val = Timedelta(conv_val)
230-
else:
231+
elif lib.is_integer(conv_val) or lib.is_float(conv_val):
231232
conv_val = Timedelta(conv_val, unit="s")
233+
else:
234+
conv_val = Timedelta(conv_val)
232235
conv_val = conv_val.as_unit("ns")._value
233236
return TermValue(int(conv_val), conv_val, kind)
234237
elif meta == "category":

pandas/core/dtypes/concat.py

Lines changed: 4 additions & 0 deletions
@@ -161,6 +161,10 @@ def _get_result_dtype(
161161
# coerce to object
162162
target_dtype = np.dtype(object)
163163
kinds = {"o"}
164+
elif "b" in kinds and len(kinds) > 1:
165+
# GH#21108, GH#45101
166+
target_dtype = np.dtype(object)
167+
kinds = {"o"}
164168
else:
165169
# error: Argument 1 to "np_find_common_type" has incompatible type
166170
# "*Set[Union[ExtensionDtype, Any]]"; expected "dtype[Any]"
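
The new branch in `_get_result_dtype` sends any mix of bool with another dtype kind to `object`. The following toy sketch shows that rule on NumPy-style kind codes. The function `common_kind` and its fallback ordering are assumptions made for illustration; the real fallback is NumPy's common-type promotion, which this sketch does not reproduce.

```python
# Toy widening order for the fallback branch: bool < int < float < complex.
_KIND_ORDER = ["b", "i", "f", "c"]


def common_kind(kinds):
    """Return the result kind for a set of dtype kind codes."""
    kinds = set(kinds)
    if "b" in kinds and len(kinds) > 1:
        # Mirrors the GH#21108 / GH#45101 branch: bool mixed with anything
        # else is kept as object instead of being cast to a number.
        return "object"
    # Hypothetical stand-in for the numeric promotion in the else branch.
    return max(kinds, key=_KIND_ORDER.index)
```

With this rule, `common_kind({"b", "i"})` yields `"object"`, so the bools are no longer cast to integers, while `common_kind({"i", "f"})` still widens to `"f"`.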

pandas/core/generic.py

Lines changed: 86 additions & 15 deletions
@@ -8156,7 +8156,6 @@ def asof(self, where, subset=None):
81568156
# ----------------------------------------------------------------------
81578157
# Action Methods
81588158

8159-
@doc(klass=_shared_doc_kwargs["klass"])
81608159
def isna(self) -> Self:
81618160
"""
81628161
Detect missing values.
@@ -8169,15 +8168,18 @@ def isna(self) -> Self:
81698168
81708169
Returns
81718170
-------
8172-
{klass}
8173-
Mask of bool values for each element in {klass} that
8174-
indicates whether an element is an NA value.
8171+
Series/DataFrame
8172+
Mask of bool values for each element in Series/DataFrame
8173+
that indicates whether an element is an NA value.
81758174
81768175
See Also
81778176
--------
8178-
{klass}.isnull : Alias of isna.
8179-
{klass}.notna : Boolean inverse of isna.
8180-
{klass}.dropna : Omit axes labels with missing values.
8177+
Series.isnull : Alias of isna.
8178+
DataFrame.isnull : Alias of isna.
8179+
Series.notna : Boolean inverse of isna.
8180+
DataFrame.notna : Boolean inverse of isna.
8181+
Series.dropna : Omit axes labels with missing values.
8182+
DataFrame.dropna : Omit axes labels with missing values.
81818183
isna : Top-level isna.
81828184
81838185
Examples
@@ -8225,11 +8227,77 @@ def isna(self) -> Self:
82258227
"""
82268228
return isna(self).__finalize__(self, method="isna")
82278229

8228-
@doc(isna, klass=_shared_doc_kwargs["klass"])
82298230
def isnull(self) -> Self:
8231+
"""
8232+
Detect missing values.
8233+
8234+
Return a boolean same-sized object indicating if the values are NA.
8235+
NA values, such as None or :attr:`numpy.NaN`, gets mapped to True
8236+
values.
8237+
Everything else gets mapped to False values. Characters such as empty
8238+
strings ``''`` or :attr:`numpy.inf` are not considered NA values.
8239+
8240+
Returns
8241+
-------
8242+
Series/DataFrame
8243+
Mask of bool values for each element in Series/DataFrame
8244+
that indicates whether an element is an NA value.
8245+
8246+
See Also
8247+
--------
8248+
Series.isna : Alias of isnull.
8249+
DataFrame.isna : Alias of isnull.
8250+
Series.notna : Boolean inverse of isnull.
8251+
DataFrame.notna : Boolean inverse of isnull.
8252+
Series.dropna : Omit axes labels with missing values.
8253+
DataFrame.dropna : Omit axes labels with missing values.
8254+
isna : Top-level isna.
8255+
8256+
Examples
8257+
--------
8258+
Show which entries in a DataFrame are NA.
8259+
8260+
>>> df = pd.DataFrame(
8261+
... dict(
8262+
... age=[5, 6, np.nan],
8263+
... born=[
8264+
... pd.NaT,
8265+
... pd.Timestamp("1939-05-27"),
8266+
... pd.Timestamp("1940-04-25"),
8267+
... ],
8268+
... name=["Alfred", "Batman", ""],
8269+
... toy=[None, "Batmobile", "Joker"],
8270+
... )
8271+
... )
8272+
>>> df
8273+
age born name toy
8274+
0 5.0 NaT Alfred NaN
8275+
1 6.0 1939-05-27 Batman Batmobile
8276+
2 NaN 1940-04-25 Joker
8277+
8278+
>>> df.isna()
8279+
age born name toy
8280+
0 False True False True
8281+
1 False False False False
8282+
2 True False False False
8283+
8284+
Show which entries in a Series are NA.
8285+
8286+
>>> ser = pd.Series([5, 6, np.nan])
8287+
>>> ser
8288+
0 5.0
8289+
1 6.0
8290+
2 NaN
8291+
dtype: float64
8292+
8293+
>>> ser.isna()
8294+
0 False
8295+
1 False
8296+
2 True
8297+
dtype: bool
8298+
"""
82308299
return isna(self).__finalize__(self, method="isnull")
82318300

8232-
@doc(klass=_shared_doc_kwargs["klass"])
82338301
def notna(self) -> Self:
82348302
"""
82358303
Detect existing (non-missing) values.
@@ -8242,15 +8310,18 @@ def notna(self) -> Self:
82428310
82438311
Returns
82448312
-------
8245-
{klass}
8246-
Mask of bool values for each element in {klass} that
8247-
indicates whether an element is not an NA value.
8313+
Series/DataFrame
8314+
Mask of bool values for each element in Series/DataFrame
8315+
that indicates whether an element is not an NA value.
82488316
82498317
See Also
82508318
--------
8251-
{klass}.notnull : Alias of notna.
8252-
{klass}.isna : Boolean inverse of notna.
8253-
{klass}.dropna : Omit axes labels with missing values.
8319+
Series.notnull : Alias of notna.
8320+
DataFrame.notnull : Alias of notna.
8321+
Series.isna : Boolean inverse of notna.
8322+
DataFrame.isna : Boolean inverse of notna.
8323+
Series.dropna : Omit axes labels with missing values.
8324+
DataFrame.dropna : Omit axes labels with missing values.
82548325
notna : Top-level notna.
82558326
82568327
Examples
