Commit 1d92ebc

Merge branch 'main' into doc
2 parents 7eff767 + ded274a commit 1d92ebc

15 files changed: +147 -50 lines changed

doc/source/whatsnew/v3.0.0.rst

Lines changed: 46 additions & 10 deletions
@@ -552,29 +552,55 @@ small behavior differences as collateral:
 Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent to by default treat ``NaN`` as equivalent
+to :class:`NA` in all cases.

-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.

 *Old behavior:*

 .. code-block:: ipython

-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool

 *New behavior:*

 .. ipython:: python

-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()

-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.

 *Old behavior:*

@@ -588,13 +614,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always

 .. ipython:: python

-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.

-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.

-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.

 The ``__module__`` attribute now points to public modules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1192,6 +1226,7 @@ MultiIndex
 I/O
 ^^^
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``timedelta64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`63239`)
 - Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
   ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
@@ -1249,6 +1284,7 @@ Plotting
 - Bug in :meth:`Series.plot` preventing a line and bar from being aligned on the same plot (:issue:`61161`)
 - Bug in :meth:`Series.plot` preventing a line and scatter plot from being aligned (:issue:`61005`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
+- Bug in plotting with a :class:`TimedeltaIndex` with non-nanosecond resolution displaying incorrect labels (:issue:`63237`)

 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
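
The whatsnew entry recommends passing ``na_value=np.nan`` to :meth:`Series.to_numpy` to keep a float NumPy dtype when :class:`NA` entries are present. A minimal sketch of that call follows. The explicit ``dtype="float64"`` argument is an addition made here so that the result does not depend on the default conversion of a particular pandas version; it is not part of the whatsnew text.

```python
import numpy as np
import pandas as pd

# A nullable float Series with one missing entry.
ser = pd.Series([1.0, None], dtype="Float64")

# Ask for a float64 ndarray and fill missing entries with NaN.
# dtype="float64" is an assumption added for this sketch (see lead-in).
arr = ser.to_numpy(dtype="float64", na_value=np.nan)
```

The resulting array has a float dtype, and the missing entry is represented as ``NaN`` rather than as an ``object``-dtype :class:`NA`.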

pandas/_config/__init__.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -36,5 +36,5 @@ def using_string_dtype() -> bool:
3636

3737

3838
def is_nan_na() -> bool:
39-
_mode_options = _global_config["mode"]
40-
return _mode_options["nan_is_na"]
39+
_mode_options = _global_config["future"]
40+
return not _mode_options["distinguish_nan_and_na"]
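
The rename inverts the option's meaning: the removed ``mode.nan_is_na`` (default ``True``) becomes ``future.distinguish_nan_and_na`` (default ``False``), and ``is_nan_na()`` now returns the negation. The sketch below mirrors that logic with a plain dictionary standing in for pandas' ``_global_config``. The dictionary and the helper ``old_to_new`` are illustrative assumptions, not pandas code.

```python
# Stand-in for pandas' internal config store (hypothetical, for illustration).
_global_config = {"future": {"distinguish_nan_and_na": False}}


def is_nan_na() -> bool:
    # Same shape as the patched helper: NaN is treated as NA unless
    # the future option asks to distinguish the two.
    return not _global_config["future"]["distinguish_nan_and_na"]


def old_to_new(nan_is_na: bool) -> bool:
    # Hypothetical helper: translate a value of the removed
    # "mode.nan_is_na" option into a value for the new option.
    return not nan_is_na


default_result = is_nan_na()

# Equivalent of the old ``mode.nan_is_na = False`` setting.
_global_config["future"]["distinguish_nan_and_na"] = old_to_new(False)
opted_in_result = is_nan_na()
```

The same negation appears at the call sites in this commit, where ``option_context("mode.nan_is_na", True)`` is replaced by ``option_context("future.distinguish_nan_and_na", False)``.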

pandas/conftest.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2127,5 +2127,5 @@ def monkeysession():
21272127
@pytest.fixture(params=[True, False])
21282128
def using_nan_is_na(request):
21292129
opt = request.param
2130-
with pd.option_context("mode.nan_is_na", opt):
2130+
with pd.option_context("future.distinguish_nan_and_na", not opt):
21312131
yield opt

pandas/core/computation/pytables.py

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
Any,
1414
ClassVar,
1515
Self,
16+
cast,
1617
)
1718

1819
import numpy as np
@@ -44,7 +45,10 @@
4445
)
4546

4647
if TYPE_CHECKING:
47-
from pandas._typing import npt
48+
from pandas._typing import (
49+
TimeUnit,
50+
npt,
51+
)
4852

4953

5054
class PyTablesScope(_scope.Scope):
@@ -225,15 +229,19 @@ def stringify(value):
225229
if conv_val.tz is not None:
226230
conv_val = conv_val.tz_convert("UTC")
227231
return TermValue(conv_val, conv_val._value, kind)
228-
elif kind in ("timedelta64", "timedelta"):
232+
elif kind.startswith("timedelta"):
233+
unit = "ns"
234+
if "[" in kind:
235+
unit = cast("TimeUnit", kind.split("[")[-1][:-1])
229236
if isinstance(conv_val, str):
230237
conv_val = Timedelta(conv_val)
231238
elif lib.is_integer(conv_val) or lib.is_float(conv_val):
232239
conv_val = Timedelta(conv_val, unit="s")
233240
else:
234241
conv_val = Timedelta(conv_val)
235-
conv_val = conv_val.as_unit("ns")._value
242+
conv_val = conv_val.as_unit(unit)._value
236243
return TermValue(int(conv_val), conv_val, kind)
244+
237245
elif meta == "category":
238246
metadata = extract_array(self.metadata, extract_numpy=True)
239247
result: npt.NDArray[np.intp] | np.intp | int
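
The new ``timedelta`` branch reads the resolution from the ``kind`` string and falls back to ``"ns"`` when no bracketed unit is present. The string handling can be checked in isolation with the sketch below. The function name ``unit_from_kind`` is hypothetical; only the expression ``kind.split("[")[-1][:-1]`` comes from the patch.

```python
def unit_from_kind(kind: str) -> str:
    """Return the resolution encoded in a kind string, defaulting to "ns"."""
    unit = "ns"
    if "[" in kind:
        # "timedelta64[ms]" -> "ms]" after the split, then drop the final "]".
        unit = kind.split("[")[-1][:-1]
    return unit


units = [unit_from_kind(k) for k in ("timedelta64", "timedelta", "timedelta64[ms]")]
```

Note that the slice assumes the kind string ends with ``]`` whenever it contains ``[``; a malformed kind would lose its last character rather than raise.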

pandas/core/config_init.py

Lines changed: 12 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -428,15 +428,6 @@ def is_terminal() -> bool:
428428
validator=is_one_of_factory([True, False, "warn"]),
429429
)
430430

431-
cf.register_option(
432-
"nan_is_na",
433-
os.environ.get("PANDAS_NAN_IS_NA", "1") == "1",
434-
"Whether to treat NaN entries as interchangeable with pd.NA in "
435-
"numpy-nullable and pyarrow float dtypes. See discussion in "
436-
"https://github.com/pandas-dev/pandas/issues/32265",
437-
validator=is_one_of_factory([True, False]),
438-
)
439-
440431

441432
# user warnings
442433
chained_assignment = """
@@ -899,6 +890,18 @@ def register_converter_cb(key: str) -> None:
899890
validator=is_one_of_factory([True, False]),
900891
)
901892

893+
cf.register_option(
894+
"distinguish_nan_and_na",
895+
os.environ.get("PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA", "0") == "1",
896+
"Whether to treat NaN entries as distinct from pd.NA in "
897+
"numpy-nullable and pyarrow float dtypes. By default treats both "
898+
"interchangeable as missing values (NaN will be coerced to NA). "
899+
"See discussion in "
900+
"https://github.com/pandas-dev/pandas/issues/32265",
901+
validator=is_one_of_factory([True, False]),
902+
)
903+
904+
902905
# GH#59502
903906
cf.deprecate_option("future.no_silent_downcasting", Pandas4Warning)
904907
cf.deprecate_option(
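
The default of the new option is derived from the ``PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA`` environment variable by comparing its value with the string ``"1"``. The sketch below isolates that expression to show which values enable the option. The function ``default_distinguish`` and the use of a plain mapping in place of ``os.environ`` are assumptions made for illustration.

```python
from collections.abc import Mapping


def default_distinguish(environ: Mapping[str, str]) -> bool:
    # Same expression as in the registered option: only the exact
    # string "1" enables the behaviour; an unset variable means "0".
    return environ.get("PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA", "0") == "1"


unset = default_distinguish({})
enabled = default_distinguish({"PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA": "1"})
other = default_distinguish({"PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA": "true"})
```

Under this expression, values such as ``"true"`` or ``"yes"`` leave the option disabled, so the variable has to be set to ``1`` exactly.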

pandas/core/indexes/base.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5023,8 +5023,9 @@ def array(self) -> ExtensionArray:
50235023
from pandas.core.arrays.numpy_ import NumpyExtensionArray
50245024

50255025
array = NumpyExtensionArray(array)
5026-
array = array.view()
5027-
array._readonly = True
5026+
# TODO decide on read-only https://github.com/pandas-dev/pandas/issues/63099
5027+
# array = array.view()
5028+
# array._readonly = True
50285029
return array
50295030

50305031
@property

pandas/core/internals/blocks.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2387,7 +2387,9 @@ def external_values(values: ArrayLike) -> ArrayLike:
23872387
values.flags.writeable = False
23882388
else:
23892389
# ExtensionArrays
2390-
values = values.view()
2391-
values._readonly = True
2390+
# TODO decide on read-only https://github.com/pandas-dev/pandas/issues/63099
2391+
# values = values.view()
2392+
# values._readonly = True
2393+
pass
23922394

23932395
return values

pandas/core/series.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -821,8 +821,9 @@ def _references(self) -> BlockValuesRefs:
821821
@property
822822
def array(self) -> ExtensionArray:
823823
arr = self._mgr.array_values()
824-
arr = arr.view()
825-
arr._readonly = True
824+
# TODO decide on read-only https://github.com/pandas-dev/pandas/issues/63099
825+
# arr = arr.view()
826+
# arr._readonly = True
826827
return arr
827828

828829
def __len__(self) -> int:

pandas/io/json/_json.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -997,7 +997,7 @@ def _read_ujson(self) -> DataFrame | Series:
997997
else:
998998
obj = self._get_object_parser(self.data)
999999
if self.dtype_backend is not lib.no_default:
1000-
with option_context("mode.nan_is_na", True):
1000+
with option_context("future.distinguish_nan_and_na", False):
10011001
return obj.convert_dtypes(
10021002
infer_objects=False, dtype_backend=self.dtype_backend
10031003
)
@@ -1075,7 +1075,7 @@ def __next__(self) -> DataFrame | Series:
10751075
raise ex
10761076

10771077
if self.dtype_backend is not lib.no_default:
1078-
with option_context("mode.nan_is_na", True):
1078+
with option_context("future.distinguish_nan_and_na", False):
10791079
return obj.convert_dtypes(
10801080
infer_objects=False, dtype_backend=self.dtype_backend
10811081
)

pandas/io/json/_table_schema.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -386,7 +386,7 @@ def parse_table_schema(json, precise_float: bool) -> DataFrame:
386386
'table="orient" can not yet read ISO-formatted Timedelta data'
387387
)
388388

389-
with option_context("mode.nan_is_na", True):
389+
with option_context("future.distinguish_nan_and_na", False):
390390
df = df.astype(dtypes)
391391

392392
if "primaryKey" in table["schema"]:
