Commit 40bbdd3

Merge branch 'pandas-dev:main' into sort-api-ref-in-alpha-order-2
2 parents f04c0af + 3940df8 commit 40bbdd3

19 files changed, +155 -28 lines changed

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -189,7 +189,7 @@ jobs:
 # installing wheel here because micromamba step was skipped
 if: matrix.buildplat[1] == 'win_arm64'
 shell: bash -el {0}
-run: python -m pip install wheel
+run: python -m pip install wheel anaconda-client
 
 - name: Validate wheel RECORD
 shell: bash -el {0}

README.md

Lines changed: 6 additions & 6 deletions
@@ -19,9 +19,9 @@
 **pandas** is a Python package that provides fast, flexible, and expressive data
 structures designed to make working with "relational" or "labeled" data both
 easy and intuitive. It aims to be the fundamental high-level building block for
-doing practical, **real world** data analysis in Python. Additionally, it has
-the broader goal of becoming **the most powerful and flexible open source data
-analysis / manipulation tool available in any language**. It is already well on
+doing practical, **real-world** data analysis in Python. Additionally, it has
+the broader goal of becoming **the most powerful and flexible open-source data
+analysis/manipulation tool available in any language**. It is already well on
 its way towards this goal.
 
 ## Table of Contents
@@ -64,7 +64,7 @@ Here are just a few of the things that pandas does well:
 data sets
 - [**Hierarchical**][mi] labeling of axes (possible to have multiple
 labels per tick)
-- Robust IO tools for loading data from [**flat files**][flat-files]
+- Robust I/O tools for loading data from [**flat files**][flat-files]
 (CSV and delimited), [**Excel files**][excel], [**databases**][db],
 and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
 - [**Time series**][timeseries]-specific functionality: date range
@@ -138,7 +138,7 @@ or for installing in [development mode](https://pip.pypa.io/en/latest/cli/pip_in
 
 
 ```sh
-python -m pip install -ve . --no-build-isolation -Ceditable-verbose=true
+python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true
 ```
 
 See the full instructions for [installing from source](https://pandas.pydata.org/docs/dev/development/contributing_environment.html).
@@ -155,7 +155,7 @@ has been under active development since then.
 
 ## Getting Help
 
-For usage questions, the best place to go to is [StackOverflow](https://stackoverflow.com/questions/tagged/pandas).
+For usage questions, the best place to go to is [Stack Overflow](https://stackoverflow.com/questions/tagged/pandas).
 Further, general questions and discussions can also take place on the [pydata mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata).
 
 ## Discussion and Development

doc/source/whatsnew/v3.0.0.rst

Lines changed: 101 additions & 3 deletions
@@ -14,10 +14,108 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~
 
-.. _whatsnew_300.enhancements.enhancement1:
+.. _whatsnew_300.enhancements.string_dtype:
 
-Enhancement1
-^^^^^^^^^^^^
+Dedicated string data type by default
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Historically, pandas represented string columns with NumPy ``object`` data type.
+This representation has numerous problems: it is not specific to strings (any
+Python object can be stored in an ``object``-dtype array, not just strings) and
+it is often not very efficient (both performance wise and for memory usage).
+
+Starting with pandas 3.0, a dedicated string data type is enabled by default
+(backed by PyArrow under the hood, if installed, otherwise falling back to being
+backed by NumPy ``object``-dtype). This means that pandas will start inferring
+columns containing string data as the new ``str`` data type when creating pandas
+objects, such as in constructors or IO functions.
+
+Old behavior:
+
+.. code-block:: python
+
+>>> ser = pd.Series(["a", "b"])
+0 a
+1 b
+dtype: object
+
+New behavior:
+
+.. code-block:: python
+
+>>> ser = pd.Series(["a", "b"])
+0 a
+1 b
+dtype: str
+
+The string data type that is used in these scenarios will mostly behave as NumPy
+object would, including missing value semantics and general operations on these
+columns.
+
+The main characteristic of the new string data type:
+
+- Inferred by default for string data (instead of object dtype)
+- The ``str`` dtype can only hold strings (or missing values), in contrast to
+``object`` dtype. (setitem with non string fails)
+- The missing value sentinel is always ``NaN`` (``np.nan``) and follows the same
+missing value semantics as the other default dtypes.
+
+Those intentional changes can have breaking consequences, for example when checking
+for the ``.dtype`` being object dtype or checking the exact missing value sentinel.
+See the :ref:`string_migration_guide` for more details on the behaviour changes
+and how to adapt your code to the new default.
+
+.. seealso::
+
+`PDEP-14: Dedicated string data type for pandas 3.0 <https://pandas.pydata.org/pdeps/0014-string-dtype.html>`__
+
+
+.. _whatsnew_300.enhancements.copy_on_write:
+
+Copy-on-Write
+^^^^^^^^^^^^^
+
+The new "copy-on-write" behaviour in pandas 3.0 brings changes in behavior in
+how pandas operates with respect to copies and views. A summary of the changes:
+
+1. The result of *any* indexing operation (subsetting a DataFrame or Series in any way,
+i.e. including accessing a DataFrame column as a Series) or any method returning a
+new DataFrame or Series, always *behaves as if* it were a copy in terms of user
+API.
+2. As a consequence, if you want to modify an object (DataFrame or Series), the only way
+to do this is to directly modify that object itself.
+
+The main goal of this change is to make the user API more consistent and
+predictable. There is now a clear rule: *any* subset or returned
+series/dataframe **always** behaves as a copy of the original, and thus never
+modifies the original (before pandas 3.0, whether a derived object would be a
+copy or a view depended on the exact operation performed, which was often
+confusing).
+
+Because every single indexing step now behaves as a copy, this also means that
+"chained assignment" (updating a DataFrame with multiple setitem steps) will
+stop working. Because this now consistently never works, the
+``SettingWithCopyWarning`` is removed.
+
+The new behavioral semantics are explained in more detail in the
+:ref:`user guide about Copy-on-Write <copy_on_write>`.
+
+A secondary goal is to improve performance by avoiding unnecessary copies. As
+mentioned above, every new DataFrame or Series returned from an indexing
+operation or method *behaves* as a copy, but under the hood pandas will use
+views as much as possible, and only copy when needed to guarantee the "behaves
+as a copy" behaviour (this is the actual "copy-on-write" mechanism used as an
+implementation detail).
+
+Some of the behaviour changes described above are breaking changes in pandas
+3.0. When upgrading to pandas 3.0, it is recommended to first upgrade to pandas
+2.3 to get deprecation warnings for a subset of those changes. The
+:ref:`migration guide <copy_on_write.migration_guide>` explains the upgrade
+process in more detail.
+
+.. seealso::
+
+`PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write <https://pandas.pydata.org/pdeps/0007-copy-on-write.html>`__
 
 .. _whatsnew_300.enhancements.enhancement2:
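The string dtype entry warns that code comparing ``.dtype`` against ``object``, or checking the exact missing-value sentinel, can break under the new default. A minimal migration sketch, assuming pandas is installed (a recent 2.x release or 3.0): it relies on the public `pandas.api.types.is_string_dtype` and `Series.isna` instead of hard-coding the dtype or the sentinel. Variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

ser = pd.Series(["a", "b"])

# Fragile under pandas 3.0: the default string dtype is no longer NumPy object,
# so this comparison is expected to flip from True to False after upgrading.
print(ser.dtype, ser.dtype == object)

# Version-tolerant: true for object-dtype strings and for the new ``str`` dtype.
is_text = is_string_dtype(ser.dtype)

# Test for missing values with isna() rather than comparing against None or
# np.nan, since the stored sentinel differs between the old and new defaults.
with_missing = pd.Series(["a", None])
missing_mask = with_missing.isna().tolist()

print(is_text, missing_mask)
```

With the pandas 3.0 default the first line should report a `str` dtype, and with older defaults `object`; the `is_text` and `missing_mask` checks are meant to give the same answer in both cases.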
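The Copy-on-Write entry describes the mechanism as an implementation detail: derived objects share data with their parent, and a copy is made only when a write would otherwise become visible through another object. A toy model of that idea in plain Python; `CowArray`, `_Buffer`, and `derive` are hypothetical names, and pandas' real reference tracking is more involved (for example, this sketch never decrements its counter when an object is garbage-collected, so it can copy more often than strictly needed).

```python
class _Buffer:
    """Underlying storage plus a count of CowArray objects that point at it."""

    def __init__(self, values):
        self.values = list(values)  # list() always makes a fresh copy
        self.n_users = 0


class CowArray:
    """Toy container: derived objects share storage until one of them is written to."""

    def __init__(self, values=(), _buffer=None):
        self._buf = _buffer if _buffer is not None else _Buffer(values)
        self._buf.n_users += 1

    def derive(self):
        # Stand-in for an indexing operation or method returning a new object:
        # the result shares storage with `self`, so nothing is copied yet.
        return CowArray(_buffer=self._buf)

    def __getitem__(self, i):
        return self._buf.values[i]

    def __setitem__(self, i, value):
        if self._buf.n_users > 1:
            # Storage is shared, so detach before writing; the other object(s)
            # keep the old buffer and never see this modification.
            self._buf.n_users -= 1
            self._buf = _Buffer(self._buf.values)
            self._buf.n_users = 1
        self._buf.values[i] = value

    def tolist(self):
        return list(self._buf.values)

    def shares_memory_with(self, other):
        return self._buf is other._buf


original = CowArray([1, 2, 3])
subset = original.derive()
print(subset.shares_memory_with(original))  # True: no copy made yet

subset[0] = 100  # first write to shared storage triggers the copy
print(original.tolist(), subset.tolist())   # [1, 2, 3] [100, 2, 3]

original[1] = 20  # original is now the only user of its buffer: written in place
print(original.tolist())                    # [1, 20, 3]
```

Modifying `subset` never changes `original`, which is the "behaves as a copy" rule, while the copy itself is deferred until the first write.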

pandas/_config/config.py

Lines changed: 2 additions & 2 deletions
@@ -693,8 +693,8 @@ def _get_registered_option(key: str):
 
 def _translate_key(key: str) -> str:
 """
-if key id deprecated and a replacement key defined, will return the
-replacement key, otherwise returns `key` as - is
+if `key` is deprecated and a replacement key defined, will return the
+replacement key, otherwise returns `key` as-is
 """
 d = _get_deprecated_option(key)
 if d:
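The corrected docstring describes a simple lookup: map a deprecated option key to its replacement when one is registered, otherwise pass the key through unchanged. A minimal sketch of that logic; the registry, the `DeprecatedKey` record, `translate_key`, and the option names are all hypothetical and do not reproduce pandas' internal structures.

```python
from typing import NamedTuple, Optional


class DeprecatedKey(NamedTuple):
    """Hypothetical record describing one deprecated option key."""

    key: str
    replacement: Optional[str]  # None when no replacement key is defined


# Hypothetical registry; the option names are made up for illustration.
_DEPRECATED_KEYS = {
    "display.old_width": DeprecatedKey("display.old_width", "display.width"),
    "mode.retired_flag": DeprecatedKey("mode.retired_flag", None),
}


def translate_key(key: str) -> str:
    """Return the replacement key if `key` is deprecated and has one, else `key` as-is."""
    record = _DEPRECATED_KEYS.get(key)
    if record is not None and record.replacement is not None:
        return record.replacement
    return key


print(translate_key("display.old_width"))  # display.width
print(translate_key("mode.retired_flag"))  # mode.retired_flag (no replacement)
print(translate_key("display.max_rows"))   # display.max_rows (not deprecated)
```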

pandas/_version.py

Lines changed: 1 addition & 1 deletion
@@ -581,7 +581,7 @@ def render_git_describe(pieces):
 def render_git_describe_long(pieces):
 """TAG-DISTANCE-gHEX[-dirty].
 
-Like 'git describe --tags --dirty --always -long'.
+Like 'git describe --tags --dirty --always --long'.
 The distance/hash is unconditional.
 
 Exceptions:
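The docstring's `TAG-DISTANCE-gHEX[-dirty]` pattern, with the distance and hash always present as with `git describe --long`, can be illustrated with a small formatter. This is a sketch only: `describe_long` and its parameters are hypothetical, the tag and hash values are made up, and the exceptions listed in the original docstring are not modeled.

```python
def describe_long(tag: str, distance: int, short_hash: str, dirty: bool) -> str:
    """Format TAG-DISTANCE-gHEX[-dirty]; distance and hash are always included."""
    rendered = f"{tag}-{distance}-g{short_hash}"
    if dirty:
        # The working tree has uncommitted changes.
        rendered += "-dirty"
    return rendered


print(describe_long("v1.0.0", 0, "abc1234", dirty=False))  # v1.0.0-0-gabc1234
print(describe_long("v1.0.0", 3, "abc1234", dirty=True))   # v1.0.0-3-gabc1234-dirty
```

The first call shows what "unconditional" means here: even on the tagged commit itself (distance 0) the long form keeps the `-0-g<hash>` part, whereas plain `git describe` would print only the tag.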

pandas/core/accessor.py

Lines changed: 2 additions & 2 deletions
@@ -88,7 +88,7 @@ def _add_delegate_accessors(
 cls
 Class to add the methods/properties to.
 delegate
-Class to get methods/properties and doc-strings.
+Class to get methods/properties and docstrings.
 accessors : list of str
 List of accessors to add.
 typ : {'property', 'method'}
@@ -159,7 +159,7 @@ def delegate_names(
 Parameters
 ----------
 delegate : object
-The class to get methods/properties & doc-strings.
+The class to get methods/properties & docstrings.
 accessors : Sequence[str]
 List of accessor to add.
 typ : {'property', 'method'}
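The parameters documented above (`cls`, `delegate`, `accessors`, `typ`) describe a delegation pattern: members named in `accessors` are added to `cls`, taking their docstrings from `delegate`. A minimal sketch of that pattern, assuming each wrapper instance stores the wrapped object in a hypothetical `_target` attribute; `add_delegate_accessors` and `Wrapper` are illustrative names, and pandas' own helper forwards through internal hooks that this sketch does not reproduce.

```python
from typing import Iterable


def add_delegate_accessors(cls, delegate, accessors: Iterable[str], typ: str):
    """Attach forwarding properties or methods named in `accessors` to `cls`.

    Hypothetical sketch: each forwarded member looks itself up on the object
    stored in the instance's `_target` attribute and reuses the docstring
    found on `delegate`.
    """

    def make_property(name):
        def getter(self):
            return getattr(self._target, name)

        return property(getter, doc=getattr(delegate, name).__doc__)

    def make_method(name):
        def method(self, *args, **kwargs):
            return getattr(self._target, name)(*args, **kwargs)

        method.__name__ = name
        method.__doc__ = getattr(delegate, name).__doc__
        return method

    for name in accessors:
        if typ == "property":
            setattr(cls, name, make_property(name))
        elif typ == "method":
            setattr(cls, name, make_method(name))
        else:
            raise ValueError(f"typ must be 'property' or 'method', got {typ!r}")
    return cls


class Wrapper:
    """Hypothetical wrapper that forwards selected members to `_target`."""

    def __init__(self, target):
        self._target = target


add_delegate_accessors(Wrapper, complex, ["real", "imag"], typ="property")
add_delegate_accessors(Wrapper, complex, ["conjugate"], typ="method")

z = Wrapper(3 + 4j)
print(z.real, z.imag)  # 3.0 4.0
print(z.conjugate())   # (3-4j)
```

Copying the delegate's docstrings keeps `help(Wrapper)` informative, which is the purpose of the "methods/properties and docstrings" wording in the diff.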

pandas/core/arrays/boolean.py

Lines changed: 1 addition & 1 deletion
@@ -378,7 +378,7 @@ def _logical_method(self, other, op): # type: ignore[override]
 elif is_list_like(other):
 other = np.asarray(other, dtype="bool")
 if other.ndim > 1:
-raise NotImplementedError("can only perform ops with 1-d structures")
+return NotImplemented
 other, mask = coerce_to_array(other, copy=False)
 elif isinstance(other, np.bool_):
 other = other.item()
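This change swaps `raise NotImplementedError(...)` for `return NotImplemented` when the other operand is multi-dimensional. Returning the `NotImplemented` singleton from a binary operator method tells Python to try the other operand's reflected method (here `__rand__`) and to raise `TypeError` only if both sides decline; raising an exception stops that fallback immediately. A pure-Python sketch of the protocol with hypothetical classes `Mask1D` and `Grid2D`, which stand in for, but are not, pandas' array types.

```python
class Mask1D:
    """Hypothetical 1-D boolean container (a stand-in, not pandas' BooleanArray)."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __and__(self, other):
        if isinstance(other, Mask1D):
            return Mask1D(a and b for a, b in zip(self.values, other.values))
        # Unsupported operand (e.g. 2-D data): decline instead of raising, so
        # Python can still try the other operand's reflected method.
        return NotImplemented


class Grid2D:
    """Hypothetical 2-D container that knows how to combine with a Mask1D."""

    def __init__(self, rows):
        self.rows = [[bool(v) for v in row] for row in rows]

    def __rand__(self, other):
        # Reached only because Mask1D.__and__ returned NotImplemented.
        if isinstance(other, Mask1D):
            return Grid2D(
                [[m and v for m, v in zip(other.values, row)] for row in self.rows]
            )
        return NotImplemented


mask = Mask1D([True, False, True])
grid = Grid2D([[True, True, True], [False, True, True]])
print((mask & grid).rows)  # [[True, False, True], [False, False, True]]

try:
    mask & "text"  # both sides decline, so Python raises TypeError
except TypeError as exc:
    print(type(exc).__name__, exc)
```

Had `Mask1D.__and__` raised `NotImplementedError` instead, `mask & grid` would fail before `Grid2D.__rand__` ever ran.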

pandas/core/base.py

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@
 
 class PandasObject(DirNamesMixin):
 """
-Baseclass for various pandas objects.
+Base class for various pandas objects.
 """
 
 # results from calls to methods decorated with cache_readonly get added to _cache

pandas/core/generic.py

Lines changed: 1 addition & 0 deletions
@@ -10216,6 +10216,7 @@ def shift(
 suffix : str, optional
 If str and periods is an iterable, this is added after the column
 name and before the shift value for each shifted column name.
+For `Series` this parameter is unused and defaults to `None`.
 
 Returns
 -------

pandas/core/indexing.py

Lines changed: 1 addition & 1 deletion
@@ -1926,7 +1926,7 @@ def _setitem_with_indexer(self, indexer, value, name: str = "iloc") -> None:
 labels = index.insert(len(index), key)
 
 # We are expanding the Series/DataFrame values to match
-# the length of thenew index `labels`. GH#40096 ensure
+# the length of the new index `labels`. GH#40096 ensure
 # this is valid even if the index has duplicates.
 taker = np.arange(len(index) + 1, dtype=np.intp)
 taker[-1] = -1
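The `taker` array built in this hunk keeps every existing position and appends `-1` as a placeholder for the newly added label. A sketch of that "-1 means missing" convention using the public `pandas.api.extensions.take` with `allow_fill=True`; the sample values are made up, and the reindexing code that consumes `taker` inside `_setitem_with_indexer` is not shown in this hunk and may take a different internal path.

```python
import numpy as np
from pandas.api.extensions import take

values = np.array([1.5, 2.5, 3.5])

# Same construction as in the diff: keep every existing position, then append -1.
taker = np.arange(len(values) + 1, dtype=np.intp)
taker[-1] = -1

# With allow_fill=True, -1 marks a slot to fill with a missing value (NaN for
# float data by default) instead of wrapping around to the last element.
expanded = take(values, taker, allow_fill=True)

# Plain NumPy indexing treats -1 as "last element", which is not what is wanted.
wrapped = values[taker]

print(expanded)  # [1.5 2.5 3.5 nan]
print(wrapped)   # [1.5 2.5 3.5 3.5]
```

Because `taker` is built purely from positions, it does not care whether the labels in the index repeat, which is consistent with the GH#40096 comment about duplicates.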
