Commit afca02d

Merge branch 'main' of https://github.com/pandas-dev/pandas into to_datetime-micros
2 parents 77801e0 + a26efbc commit afca02d

37 files changed, +2605 -962 lines changed

.github/workflows/docbuild-and-upload.yml

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ jobs:
         run: mv doc/build/html web/build/docs

     - name: Save website as an artifact
-      uses: actions/upload-artifact@v4
+      uses: actions/upload-artifact@v5
       with:
         name: website
         path: web/build

.github/workflows/wheels.yml

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ jobs:
6464
python -m pip install build
6565
python -m build --sdist
6666
67-
- uses: actions/upload-artifact@v4
67+
- uses: actions/upload-artifact@v5
6868
with:
6969
name: sdist
7070
path: ./dist/*
@@ -138,7 +138,7 @@ jobs:
138138
# removes unnecessary files from the release
139139
- name: Download sdist (not macOS)
140140
#if: ${{ matrix.buildplat[1] != 'macosx_*' }}
141-
uses: actions/download-artifact@v5
141+
uses: actions/download-artifact@v6
142142
with:
143143
name: sdist
144144
path: ./dist
@@ -196,7 +196,7 @@ jobs:
196196
shell: bash -el {0}
197197
run: for whl in $(ls wheelhouse); do wheel unpack wheelhouse/$whl -d /tmp; done
198198

199-
- uses: actions/upload-artifact@v4
199+
- uses: actions/upload-artifact@v5
200200
with:
201201
name: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
202202
path: ./wheelhouse/*.whl
@@ -238,11 +238,11 @@ jobs:
238238

239239
steps:
240240
- name: Download all artefacts
241-
uses: actions/download-artifact@v5
241+
uses: actions/download-artifact@v6
242242
with:
243243
path: dist # everything lands in ./dist/**
244244

245-
# TODO: This step can be probably be achieved by actions/download-artifact@v5
245+
# TODO: This step can be probably be achieved by actions/download-artifact@v6
246246
# by specifying merge-multiple: true, and a glob pattern
247247
- name: Collect files
248248
run: |

doc/source/user_guide/io.rst

Lines changed: 1 addition & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -2366,52 +2366,7 @@ Read a URL with no options:
23662366

23672367
The data from the above URL changes every Monday so the resulting data above may be slightly different.
23682368

2369-
Read a URL while passing headers alongside the HTTP request:
2370-
2371-
.. code-block:: ipython
2372-
2373-
In [322]: url = 'https://www.sump.org/notes/request/' # HTTP request reflector
2374-
2375-
In [323]: pd.read_html(url)
2376-
Out[323]:
2377-
[ 0 1
2378-
0 Remote Socket: 51.15.105.256:51760
2379-
1 Protocol Version: HTTP/1.1
2380-
2 Request Method: GET
2381-
3 Request URI: /notes/request/
2382-
4 Request Query: NaN,
2383-
0 Accept-Encoding: identity
2384-
1 Host: www.sump.org
2385-
2 User-Agent: Python-urllib/3.8
2386-
3 Connection: close]
2387-
2388-
In [324]: headers = {
2389-
.....: 'User-Agent':'Mozilla Firefox v14.0',
2390-
.....: 'Accept':'application/json',
2391-
.....: 'Connection':'keep-alive',
2392-
.....: 'Auth':'Bearer 2*/f3+fe68df*4'
2393-
.....: }
2394-
2395-
In [325]: pd.read_html(url, storage_options=headers)
2396-
Out[325]:
2397-
[ 0 1
2398-
0 Remote Socket: 51.15.105.256:51760
2399-
1 Protocol Version: HTTP/1.1
2400-
2 Request Method: GET
2401-
3 Request URI: /notes/request/
2402-
4 Request Query: NaN,
2403-
0 User-Agent: Mozilla Firefox v14.0
2404-
1 AcceptEncoding: gzip, deflate, br
2405-
2 Accept: application/json
2406-
3 Connection: keep-alive
2407-
4 Auth: Bearer 2*/f3+fe68df*4]
2408-
2409-
.. note::
2410-
2411-
We see above that the headers we passed are reflected in the HTTP request.
2412-
2413-
Read in the content of the file from the above URL and pass it to ``read_html``
2414-
as a string:
2369+
Read in HTML content from a file using ``read_html``:
24152370

24162371
.. ipython:: python
24172372

doc/source/whatsnew/v3.0.0.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -219,6 +219,7 @@ Other enhancements
219219
- Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`)
220220
- Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`)
221221
- Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`)
222+
- Improve error reporting through outputting the first few duplicates when :func:`merge` validation fails (:issue:`62742`)
222223
- Improve the resulting dtypes in :meth:`DataFrame.where` and :meth:`DataFrame.mask` with :class:`ExtensionDtype` ``other`` (:issue:`62038`)
223224
- Improved deprecation message for offset aliases (:issue:`60820`)
224225
- Many type aliases are now exposed in the new submodule :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
@@ -956,6 +957,7 @@ Bug fixes
956957

957958
Categorical
958959
^^^^^^^^^^^
960+
- Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
959961
- Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
960962
- Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
961963
- Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
@@ -1015,6 +1017,7 @@ Numeric
10151017
^^^^^^^
10161018
- Bug in :func:`api.types.infer_dtype` returning "mixed" for complex and ``pd.NA`` mix (:issue:`61976`)
10171019
- Bug in :func:`api.types.infer_dtype` returning "mixed-integer-float" for float and ``pd.NA`` mix (:issue:`61621`)
1020+
- Bug in :meth:`DataFrame.combine_first` where Int64 and UInt64 integers with absolute value greater than ``2**53`` would lose precision after the operation. (:issue:`60128`)
10181021
- Bug in :meth:`DataFrame.corr` where numerical precision errors resulted in correlations above ``1.0`` (:issue:`61120`)
10191022
- Bug in :meth:`DataFrame.cov` raises a ``TypeError`` instead of returning potentially incorrect results or other errors (:issue:`53115`)
10201023
- Bug in :meth:`DataFrame.quantile` where the column type was not preserved when ``numeric_only=True`` with a list-like ``q`` produced an empty result (:issue:`59035`)
@@ -1034,6 +1037,7 @@ Conversion
10341037

10351038
Strings
10361039
^^^^^^^
1040+
- Bug in :meth:`Series.str.replace` raising an error on valid group references (``\1``, ``\2``, etc.) on series converted to PyArrow backend dtype (:issue:`62653`)
10371041
- Bug in :meth:`Series.str.zfill` raising ``AttributeError`` for :class:`ArrowDtype` (:issue:`61485`)
10381042
- Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)
10391043
- Bug in multiplication with a :class:`StringDtype` incorrectly allowing multiplying by bools; explicitly cast to integers instead (:issue:`62595`)
@@ -1043,6 +1047,7 @@ Interval
10431047
- :meth:`Index.is_monotonic_decreasing`, :meth:`Index.is_monotonic_increasing`, and :meth:`Index.is_unique` could incorrectly be ``False`` for an ``Index`` created from a slice of another ``Index``. (:issue:`57911`)
10441048
- Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
10451049
- Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
1050+
- Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
10461051

10471052
Indexing
10481053
^^^^^^^^
@@ -1149,6 +1154,7 @@ Groupby/resample/rolling
11491154
- Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupby.groups` that would not respect groupby argument ``dropna`` (:issue:`55919`)
11501155
- Bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
11511156
- Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
1157+
- Bug in :meth:`.DataFrameGroupBy` reductions where non-Boolean values were allowed for the ``numeric_only`` argument; passing a non-Boolean value will now raise (:issue:`62778`)
11521158
- Bug in :meth:`.Resampler.interpolate` on a :class:`DataFrame` with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (:issue:`21351`)
11531159
- Bug in :meth:`.Series.rolling` when used with a :class:`.BaseIndexer` subclass and computing min/max (:issue:`46726`)
11541160
- Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
@@ -1205,6 +1211,7 @@ ExtensionArray
12051211
- Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
12061212
- Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
12071213
- Bug in various :class:`DataFrame` reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (:issue:`59234`)
1214+
- Fixed flex arithmetic with :class:`ExtensionArray` operands raising when ``fill_value`` was passed. (:issue:`62467`)
12081215

12091216
Styler
12101217
^^^^^^

pandas/_libs/index.pyx

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -321,6 +321,9 @@ cdef class IndexEngine:
321321
if is_strict_monotonic:
322322
self.unique = 1
323323
self.need_unique_check = 0
324+
elif self.monotonic_inc == 1 or self.monotonic_dec == 1:
325+
self.unique = 0
326+
self.need_unique_check = 0
324327

325328
cdef _call_monotonic(self, values):
326329
return algos.is_monotonic(values, timelike=False)
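
The added `elif` branch relies on a simple fact: a sequence that is monotonic but not strictly monotonic must contain at least one pair of equal neighbours, so it cannot be unique. The sketch below illustrates that reasoning in plain Python. `classify_monotonic` is a hypothetical helper written for this note, not part of pandas, and it assumes totally ordered values (no NaN-like entries).

```python
def classify_monotonic(values):
    """Return (monotonic_inc, monotonic_dec, unique).

    `unique` is True/False when monotonicity alone decides it, and None
    when a separate uniqueness check would still be needed.
    Hypothetical helper for illustration; assumes totally ordered values.
    """
    pairs = list(zip(values, values[1:]))
    inc = all(a <= b for a, b in pairs)
    dec = all(a >= b for a, b in pairs)
    strict = all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

    if strict:
        # Strictly monotonic: no two neighbours are equal, so all values differ.
        unique = True
    elif inc or dec:
        # Monotonic but not strict: some neighbours are equal, so duplicates exist.
        unique = False
    else:
        # Not monotonic: ordering says nothing about uniqueness.
        unique = None
    return inc, dec, unique


print(classify_monotonic([1, 2, 2, 3]))
```

The middle branch corresponds to the case the diff now short-circuits: uniqueness is already known to be false, so no further check is required.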

pandas/conftest.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1447,6 +1447,9 @@ def any_string_dtype(request):
14471447
return pd.StringDtype(storage, na_value)
14481448

14491449

1450+
any_string_dtype2 = any_string_dtype
1451+
1452+
14501453
@pytest.fixture(params=tm.DATETIME64_DTYPES)
14511454
def datetime64_dtype(request):
14521455
"""

pandas/core/arrays/_arrow_string_mixins.py

Lines changed: 2 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -173,15 +173,12 @@ def _str_replace(
173173
or callable(repl)
174174
or not case
175175
or flags
176-
or (
177-
isinstance(repl, str)
178-
and (r"\g<" in repl or re.search(r"\\\d", repl) is not None)
179-
)
176+
or (isinstance(repl, str) and r"\g<" in repl)
180177
):
181178
raise NotImplementedError(
182179
"replace is not supported with a re.Pattern, callable repl, "
183180
"case=False, flags!=0, or when the replacement string contains "
184-
"named group references (\\g<...>, \\d+)"
181+
"named group references (\\g<...>)"
185182
)
186183

187184
func = pc.replace_substring_regex if regex else pc.replace_substring
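
The narrowed condition distinguishes numbered group references (`\1`, `\2`), which are no longer rejected, from named ones (`\g<...>`), which still are. The following sketch shows the two kinds of reference using only the standard-library `re` module. `has_named_group_reference` is a hypothetical helper that mirrors the new check; it is not a pandas function, and the example says nothing about how PyArrow handles the replacement internally.

```python
import re


def has_named_group_reference(repl):
    """Mirror of the narrowed check in the diff (hypothetical helper)."""
    return isinstance(repl, str) and r"\g<" in repl


# Numbered references swap the two captured groups.
numbered = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

# Named references do the same through the \g<name> syntax.
named = re.sub(r"(?P<u>\w+)@(?P<h>\w+)", r"\g<h>@\g<u>", "user@host")

print(numbered, named)
print(has_named_group_reference(r"\2@\1"))        # numbered form passes the check
print(has_named_group_reference(r"\g<h>@\g<u>"))  # named form is still flagged
```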

pandas/core/arrays/categorical.py

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -460,6 +460,10 @@ def __init__(
460460
codes = arr.indices.to_numpy()
461461
dtype = CategoricalDtype(categories, values.dtype.pyarrow_dtype.ordered)
462462
else:
463+
preserve_object = False
464+
if isinstance(values, (ABCIndex, ABCSeries)) and values.dtype == object:
465+
# GH#61778
466+
preserve_object = True
463467
if not isinstance(values, ABCIndex):
464468
# in particular RangeIndex xref test_index_equal_range_categories
465469
values = sanitize_array(values, None)
@@ -476,7 +480,14 @@ def __init__(
476480
"by passing in a categories argument."
477481
) from err
478482

479-
# we're inferring from values
483+
if preserve_object:
484+
# GH#61778 wrap categories in an Index to prevent dtype
485+
# inference in the CategoricalDtype constructor
486+
from pandas import Index
487+
488+
categories = Index(categories, dtype=object, copy=False)
489+
490+
# if not preserve_obejct, we're inferring from values
480491
dtype = CategoricalDtype(categories, dtype.ordered)
481492

482493
elif isinstance(values.dtype, CategoricalDtype):

pandas/core/arrays/string_arrow.py

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -425,8 +425,7 @@ def _str_replace(
425425
or flags
426426
or ( # substitution contains a named group pattern
427427
# https://docs.python.org/3/library/re.html
428-
isinstance(repl, str)
429-
and (r"\g<" in repl or re.search(r"\\\d", repl) is not None)
428+
isinstance(repl, str) and r"\g<" in repl
430429
)
431430
):
432431
return super()._str_replace(pat, repl, n, case, flags, regex)

pandas/core/frame.py

Lines changed: 67 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -3293,28 +3293,71 @@ def to_html(
32933293
Examples
32943294
--------
32953295
>>> df = pd.DataFrame(data={"col1": [1, 2], "col2": [4, 3]})
3296-
>>> html_string = '''<table border="1" class="dataframe">
3297-
... <thead>
3298-
... <tr style="text-align: right;">
3299-
... <th></th>
3300-
... <th>col1</th>
3301-
... <th>col2</th>
3302-
... </tr>
3303-
... </thead>
3304-
... <tbody>
3305-
... <tr>
3306-
... <th>0</th>
3307-
... <td>1</td>
3308-
... <td>4</td>
3309-
... </tr>
3310-
... <tr>
3311-
... <th>1</th>
3312-
... <td>2</td>
3313-
... <td>3</td>
3314-
... </tr>
3315-
... </tbody>
3316-
... </table>'''
3317-
>>> assert html_string == df.to_html()
3296+
>>> html_string = df.to_html()
3297+
>>> print(html_string)
3298+
<table border="1" class="dataframe">
3299+
<thead>
3300+
<tr style="text-align: right;">
3301+
<th></th>
3302+
<th>col1</th>
3303+
<th>col2</th>
3304+
</tr>
3305+
</thead>
3306+
<tbody>
3307+
<tr>
3308+
<th>0</th>
3309+
<td>1</td>
3310+
<td>4</td>
3311+
</tr>
3312+
<tr>
3313+
<th>1</th>
3314+
<td>2</td>
3315+
<td>3</td>
3316+
</tr>
3317+
</tbody>
3318+
</table>
3319+
3320+
HTML output
3321+
3322+
+----+-----+-----+
3323+
| |col1 |col2 |
3324+
+====+=====+=====+
3325+
|0 |1 |4 |
3326+
+----+-----+-----+
3327+
|1 |2 |3 |
3328+
+----+-----+-----+
3329+
3330+
>>> df = pd.DataFrame(data={"col1": [1, 2], "col2": [4, 3]})
3331+
>>> html_string = df.to_html(index=False)
3332+
>>> print(html_string)
3333+
<table border="1" class="dataframe">
3334+
<thead>
3335+
<tr style="text-align: right;">
3336+
<th>col1</th>
3337+
<th>col2</th>
3338+
</tr>
3339+
</thead>
3340+
<tbody>
3341+
<tr>
3342+
<td>1</td>
3343+
<td>4</td>
3344+
</tr>
3345+
<tr>
3346+
<td>2</td>
3347+
<td>3</td>
3348+
</tr>
3349+
</tbody>
3350+
</table>
3351+
3352+
HTML output
3353+
3354+
+-----+-----+
3355+
|col1 |col2 |
3356+
+=====+=====+
3357+
|1 |4 |
3358+
+-----+-----+
3359+
|2 |3 |
3360+
+-----+-----+
33183361
"""
33193362
if justify is not None and justify not in fmt.VALID_JUSTIFY_PARAMETERS:
33203363
raise ValueError("Invalid value for justify parameter")
@@ -9165,20 +9208,10 @@ def combine_first(self, other: DataFrame) -> DataFrame:
91659208
1 0.0 3.0 1.0
91669209
2 NaN 3.0 1.0
91679210
"""
9168-
from pandas.core.computation import expressions
91699211

91709212
def combiner(x: Series, y: Series):
9171-
mask = x.isna()._values
9172-
9173-
x_values = x._values
9174-
y_values = y._values
9175-
9176-
# If the column y in other DataFrame is not in first DataFrame,
9177-
# just return y_values.
9178-
if y.name not in self.columns:
9179-
return y_values
9180-
9181-
return expressions.where(mask, y_values, x_values)
9213+
# GH#60128 The combiner is supposed to preserve EA Dtypes.
9214+
return y if y.name not in self.columns else y.where(x.isna(), x)
91829215

91839216
if len(other) == 0:
91849217
combined = self.reindex(
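
The whatsnew entry for GH#60128 names `2**53` as the threshold above which precision was lost. That number is the limit of exactly representable consecutive integers in a 64-bit float. The sketch below shows the effect with plain Python; that the old combiner lost precision by passing values through `float64` is an assumption made here for illustration, since the diff does not state the mechanism.

```python
# 2**53 is the largest integer n such that every integer in [0, n]
# is exactly representable as an IEEE 754 double (Python's float).
LIMIT = 2**53

exact = LIMIT + 1                   # fine as a Python int or a 64-bit integer
round_tripped = int(float(exact))   # assumed float64 detour (illustration only)

print(exact)
print(round_tripped)
print(exact == round_tripped)       # the odd value above the limit does not survive

# Values at or below the limit survive the same round trip unchanged.
print(int(float(LIMIT)) == LIMIT)
```

Keeping the data in its integer extension dtype, as the new one-line combiner aims to do, avoids any such detour.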
