Commit 2eb7a52

Merge branch 'main' into api-timedelta-constructor
2 parents c783e57 + 787ad72 commit 2eb7a52

File tree

11 files changed

+103
-41
lines changed

doc/source/whatsnew/v3.0.0.rst

Lines changed: 46 additions & 10 deletions
@@ -552,29 +552,55 @@ small behavior differences as collateral:
 Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent to by default treat ``NaN`` as equivalent
+to :class:`NA` in all cases.
 
-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.
 
 *Old behavior:*
 
 .. code-block:: ipython
 
-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool
 
 *New behavior:*
 
 .. ipython:: python
 
-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()
 
-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.
 
 *Old behavior:*
 
@@ -588,13 +614,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always
 
 .. ipython:: python
 
-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.
 
-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.
 
-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.
 
 The ``__module__`` attribute now points to public modules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1192,6 +1226,7 @@ MultiIndex
 I/O
 ^^^
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``timedelta64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`63239`)
 - Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
   ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
@@ -1249,6 +1284,7 @@ Plotting
 - Bug in :meth:`Series.plot` preventing a line and bar from being aligned on the same plot (:issue:`61161`)
 - Bug in :meth:`Series.plot` preventing a line and scatter plot from being aligned (:issue:`61005`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
+- Bug in plotting with a :class:`TimedeltaIndex` with non-nanosecond resolution displaying incorrect labels (:issue:`63237`)
 
 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
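
The NaN/NA semantics described in the whatsnew entry can be modelled without pandas. The sketch below is a toy model only, not pandas code: `NA`, `coerce` and `is_missing` are hypothetical names, and the `distinguish` argument stands in for the `future.distinguish_nan_and_na` option.

```python
import math

NA = None  # hypothetical stand-in for pd.NA in this toy model


def coerce(values, distinguish=False):
    """Mimic the constructor: NaN becomes NA unless NaN is kept distinct."""
    if distinguish:
        return list(values)
    return [NA if isinstance(v, float) and math.isnan(v) else v for v in values]


def is_missing(value):
    """Only NA counts as missing; a surviving NaN is an ordinary float."""
    return value is NA
```

With the default, `coerce([1.0, float("nan")])` ends with `NA`; with `distinguish=True` the `NaN` survives and `is_missing` reports it as not missing.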

pandas/_config/__init__.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -36,5 +36,5 @@ def using_string_dtype() -> bool:
3636

3737

3838
def is_nan_na() -> bool:
39-
_mode_options = _global_config["mode"]
40-
return _mode_options["nan_is_na"]
39+
_mode_options = _global_config["future"]
40+
return not _mode_options["distinguish_nan_and_na"]
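
The rename also flips the option's polarity: `is_nan_na()` now returns the negation of the stored flag. A minimal standalone sketch of that inversion follows. The `_config` dict and this `option_context` helper are hypothetical stand-ins for illustration, not the pandas config machinery.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the global config; the default mirrors the diff.
_config = {"future": {"distinguish_nan_and_na": False}}


def is_nan_na() -> bool:
    """NaN is treated as NA exactly when the future option is off."""
    return not _config["future"]["distinguish_nan_and_na"]


@contextmanager
def option_context(group: str, name: str, value: bool):
    """Temporarily set an option and restore the previous value on exit."""
    previous = _config[group][name]
    _config[group][name] = value
    try:
        yield
    finally:
        _config[group][name] = previous
```

Callers that previously passed `True` for `mode.nan_is_na` therefore pass `False` for the new option, which is the substitution made throughout the rest of this commit.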

pandas/conftest.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2127,5 +2127,5 @@ def monkeysession():
21272127
@pytest.fixture(params=[True, False])
21282128
def using_nan_is_na(request):
21292129
opt = request.param
2130-
with pd.option_context("mode.nan_is_na", opt):
2130+
with pd.option_context("future.distinguish_nan_and_na", not opt):
21312131
yield opt

pandas/core/computation/pytables.py

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
Any,
1414
ClassVar,
1515
Self,
16+
cast,
1617
)
1718

1819
import numpy as np
@@ -44,7 +45,10 @@
4445
)
4546

4647
if TYPE_CHECKING:
47-
from pandas._typing import npt
48+
from pandas._typing import (
49+
TimeUnit,
50+
npt,
51+
)
4852

4953

5054
class PyTablesScope(_scope.Scope):
@@ -225,15 +229,19 @@ def stringify(value):
225229
if conv_val.tz is not None:
226230
conv_val = conv_val.tz_convert("UTC")
227231
return TermValue(conv_val, conv_val._value, kind)
228-
elif kind in ("timedelta64", "timedelta"):
232+
elif kind.startswith("timedelta"):
233+
unit = "ns"
234+
if "[" in kind:
235+
unit = cast("TimeUnit", kind.split("[")[-1][:-1])
229236
if isinstance(conv_val, str):
230237
conv_val = Timedelta(conv_val)
231238
elif lib.is_integer(conv_val) or lib.is_float(conv_val):
232239
conv_val = Timedelta(conv_val, unit="s")
233240
else:
234241
conv_val = Timedelta(conv_val)
235-
conv_val = conv_val.as_unit("ns")._value
242+
conv_val = conv_val.as_unit(unit)._value
236243
return TermValue(int(conv_val), conv_val, kind)
244+
237245
elif meta == "category":
238246
metadata = extract_array(self.metadata, extract_numpy=True)
239247
result: npt.NDArray[np.intp] | np.intp | int
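
The unit extraction in the hunk above is plain string slicing on the kind. A standalone sketch of the same expression follows; the helper name `unit_from_kind` is hypothetical and does not exist in pandas.

```python
def unit_from_kind(kind: str, default: str = "ns") -> str:
    """Return the resolution inside the brackets of a kind string.

    Kinds without brackets (written before units were stored) fall back
    to nanoseconds, matching the default in the diff.
    """
    if "[" in kind:
        # "timedelta64[ms]" -> "ms]" -> "ms"
        return kind.split("[")[-1][:-1]
    return default
```

For example, `unit_from_kind("timedelta64[ms]")` gives `"ms"`, while the legacy kinds `"timedelta64"` and `"timedelta"` give `"ns"`.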

pandas/core/config_init.py

Lines changed: 12 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -428,15 +428,6 @@ def is_terminal() -> bool:
428428
validator=is_one_of_factory([True, False, "warn"]),
429429
)
430430

431-
cf.register_option(
432-
"nan_is_na",
433-
os.environ.get("PANDAS_NAN_IS_NA", "1") == "1",
434-
"Whether to treat NaN entries as interchangeable with pd.NA in "
435-
"numpy-nullable and pyarrow float dtypes. See discussion in "
436-
"https://github.com/pandas-dev/pandas/issues/32265",
437-
validator=is_one_of_factory([True, False]),
438-
)
439-
440431

441432
# user warnings
442433
chained_assignment = """
@@ -899,6 +890,18 @@ def register_converter_cb(key: str) -> None:
899890
validator=is_one_of_factory([True, False]),
900891
)
901892

893+
cf.register_option(
894+
"distinguish_nan_and_na",
895+
os.environ.get("PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA", "0") == "1",
896+
"Whether to treat NaN entries as distinct from pd.NA in "
897+
"numpy-nullable and pyarrow float dtypes. By default treats both "
898+
"interchangeable as missing values (NaN will be coerced to NA). "
899+
"See discussion in "
900+
"https://github.com/pandas-dev/pandas/issues/32265",
901+
validator=is_one_of_factory([True, False]),
902+
)
903+
904+
902905
# GH#59502
903906
cf.deprecate_option("future.no_silent_downcasting", Pandas4Warning)
904907
cf.deprecate_option(
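
The default for the new option is read from an environment variable by comparing against the literal string `"1"`. A small sketch of that comparison in isolation follows; `env_flag` and its `environ` parameter are hypothetical, added here only so the behaviour can be exercised without touching the real environment.

```python
import os


def env_flag(name: str, default: str = "0", environ=None) -> bool:
    """Return True only when the variable is exactly the string "1"."""
    env = os.environ if environ is None else environ
    return env.get(name, default) == "1"
```

With this comparison, values such as `"true"` or `"yes"` leave the flag off, and an unset variable falls back to the default, which is `"0"` for `PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA` and was `"1"` for the removed `PANDAS_NAN_IS_NA`.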

pandas/io/json/_json.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -997,7 +997,7 @@ def _read_ujson(self) -> DataFrame | Series:
997997
else:
998998
obj = self._get_object_parser(self.data)
999999
if self.dtype_backend is not lib.no_default:
1000-
with option_context("mode.nan_is_na", True):
1000+
with option_context("future.distinguish_nan_and_na", False):
10011001
return obj.convert_dtypes(
10021002
infer_objects=False, dtype_backend=self.dtype_backend
10031003
)
@@ -1075,7 +1075,7 @@ def __next__(self) -> DataFrame | Series:
10751075
raise ex
10761076

10771077
if self.dtype_backend is not lib.no_default:
1078-
with option_context("mode.nan_is_na", True):
1078+
with option_context("future.distinguish_nan_and_na", False):
10791079
return obj.convert_dtypes(
10801080
infer_objects=False, dtype_backend=self.dtype_backend
10811081
)

pandas/io/json/_table_schema.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -386,7 +386,7 @@ def parse_table_schema(json, precise_float: bool) -> DataFrame:
386386
'table="orient" can not yet read ISO-formatted Timedelta data'
387387
)
388388

389-
with option_context("mode.nan_is_na", True):
389+
with option_context("future.distinguish_nan_and_na", False):
390390
df = df.astype(dtypes)
391391

392392
if "primaryKey" in table["schema"]:

pandas/io/pytables.py

Lines changed: 21 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -2702,8 +2702,12 @@ def convert(self, values: np.ndarray, nan_rep, encoding: str, errors: str):
27022702
# recreate with tz if indicated
27032703
converted = _set_tz(converted, tz, dtype)
27042704

2705-
elif dtype == "timedelta64":
2706-
converted = np.asarray(converted, dtype="m8[ns]")
2705+
elif dtype.startswith("timedelta64"):
2706+
if dtype == "timedelta64":
2707+
# from before we started storing timedelta64 unit
2708+
converted = np.asarray(converted, dtype="m8[ns]")
2709+
else:
2710+
converted = np.asarray(converted, dtype=dtype)
27072711
elif dtype == "date":
27082712
try:
27092713
converted = np.asarray(
@@ -3086,8 +3090,13 @@ def read_array(self, key: str, start: int | None = None, stop: int | None = None
30863090
tz = getattr(attrs, "tz", None)
30873091
ret = _set_tz(ret, tz, dtype)
30883092

3089-
elif dtype == "timedelta64":
3090-
ret = np.asarray(ret, dtype="m8[ns]")
3093+
elif dtype and dtype.startswith("timedelta64"):
3094+
if dtype == "timedelta64":
3095+
# This was written back before we started writing
3096+
# timedelta64 units
3097+
ret = np.asarray(ret, dtype="m8[ns]")
3098+
else:
3099+
ret = np.asarray(ret, dtype=dtype)
30913100

30923101
if transposed:
30933102
return ret.T
@@ -3324,7 +3333,7 @@ def write_array(
33243333
node._v_attrs.value_type = f"datetime64[{value.dtype.unit}]"
33253334
elif lib.is_np_dtype(value.dtype, "m"):
33263335
self._handle.create_array(self.group, key, value.view("i8"))
3327-
getattr(self.group, key)._v_attrs.value_type = "timedelta64"
3336+
getattr(self.group, key)._v_attrs.value_type = str(value.dtype)
33283337
elif isinstance(value, BaseStringArray):
33293338
vlarr = self._handle.create_vlarray(self.group, key, _tables().ObjectAtom())
33303339
vlarr.append(value.to_numpy())
@@ -5175,8 +5184,12 @@ def _unconvert_index(data, kind: str, encoding: str, errors: str) -> np.ndarray
51755184
index = DatetimeIndex(data)
51765185
else:
51775186
index = DatetimeIndex(data.view(kind))
5178-
elif kind == "timedelta64":
5179-
index = TimedeltaIndex(data)
5187+
elif kind.startswith("timedelta64"):
5188+
if kind == "timedelta64":
5189+
# created before we stored resolution information
5190+
index = TimedeltaIndex(data)
5191+
else:
5192+
index = TimedeltaIndex(data.view(kind))
51805193
elif kind == "date":
51815194
try:
51825195
index = np.asarray([date.fromordinal(v) for v in data], dtype=object)
@@ -5413,7 +5426,7 @@ def _dtype_to_kind(dtype_str: str) -> str:
54135426
elif dtype_str.startswith("datetime64"):
54145427
kind = dtype_str
54155428
elif dtype_str.startswith("timedelta"):
5416-
kind = "timedelta64"
5429+
kind = dtype_str
54175430
elif dtype_str.startswith("bool"):
54185431
kind = "bool"
54195432
elif dtype_str.startswith("category"):
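
The three read paths in this file share one backward-compatibility rule: a bare `"timedelta64"` marker comes from files written before the unit was stored and is read as nanoseconds, while a unit-qualified marker is used as the dtype directly. A sketch of that rule as a pure function follows; `stored_kind_to_dtype` is a hypothetical name and is not part of the pandas code.

```python
def stored_kind_to_dtype(kind: str) -> str:
    """Map a stored timedelta kind string to the dtype string used on read."""
    if not kind.startswith("timedelta64"):
        raise ValueError(f"not a timedelta kind: {kind!r}")
    if kind == "timedelta64":
        # Legacy files carry no unit; they were always written as nanoseconds.
        return "m8[ns]"
    # Newer files store the full dtype string, e.g. "timedelta64[s]".
    return kind
```

Keeping the legacy branch explicit is what lets older files stay readable while newer files round-trip their resolution.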

pandas/plotting/_matplotlib/converter.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1107,7 +1107,7 @@ def __init__(self, unit: TimeUnit = "ns"):
11071107
axis: Axis
11081108

11091109
@staticmethod
1110-
def format_timedelta_ticks(x, pos, n_decimals: int, exp: int) -> str:
1110+
def format_timedelta_ticks(x, pos, n_decimals: int, exp: int = 9) -> str:
11111111
"""
11121112
Convert seconds to 'D days HH:MM:SS.F'
11131113
"""
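
To see why the exponent matters for non-nanosecond data, here is a simplified sketch of a `'D days HH:MM:SS.F'` formatter. It is written under stated assumptions, not copied from pandas: `exp` is taken to be the power of ten relating integer tick values to seconds (9 for nanoseconds, 3 for milliseconds), the `pos` argument is dropped, and inputs are assumed non-negative. The name `format_ticks` is hypothetical.

```python
def format_ticks(x: int, n_decimals: int, exp: int = 9) -> str:
    """Format an integer tick count as 'D days HH:MM:SS.F'."""
    seconds, frac = divmod(int(x), 10**exp)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    text = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    if n_decimals > 0:
        # Keep the leading n_decimals digits of the fractional second.
        decimals = frac * 10**n_decimals // 10**exp
        text += f".{decimals:0{n_decimals}d}"
    if days != 0:
        text = f"{days:d} days {text}"
    return text
```

In this sketch, ninety seconds expressed in milliseconds (`90_000`) formats as `00:01:30` with `exp=3` but as `00:00:00` under the nanosecond default, which is the kind of mislabelling the whatsnew entry for :issue:`63237` describes.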

pandas/tests/io/pytables/test_append.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -848,7 +848,7 @@ def test_append_raise(tmp_path, using_infer_string):
848848
store.append("df", df)
849849

850850

851-
def test_append_with_timedelta(tmp_path):
851+
def test_append_with_timedelta(tmp_path, unit):
852852
# GH 3577
853853
# append timedelta
854854

@@ -860,6 +860,7 @@ def test_append_with_timedelta(tmp_path):
860860
}
861861
)
862862
df["C"] = df["A"] - df["B"]
863+
df["C"] = df["C"].astype(f"m8[{unit}]")
863864
df.loc[3:5, "C"] = np.nan
864865

865866
path = tmp_path / "test_append_with_timedelta.h5"
