Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.Series.mask.html#pandas.Series.mask
Similar issues exist for DataFrame.mask, Series.where, and DataFrame.where; they appear to use the same docstring with replacements.
Documentation problem
In this passage:
The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used. If the axis of other does not align with axis of cond Series/DataFrame, the misaligned index positions will be filled with True.
The last sentence of this passage is not correct. Here is an example where other is not aligned with cond, because the d label in cond has no match in other. However, cond is still not filled with True at that position.
import pandas as pd
a = pd.Series(['apple', 'banana', 'cherry', 'dango'], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
other = pd.Series(['asparagus', 'broccoli', 'carrot', 'dill'], index=['a', 'b', 'c', 'D'])
cond = b.lt(3)
print("Cond matches other?", cond.index == other.index)
print("Cond matches self?", cond.index == a.index)
a.mask(cond, other)
Output:
Cond matches other? [ True  True  True False]
Cond matches self? [ True  True  True  True]
a    asparagus
b     broccoli
c       cherry
d        dango
In this example, even though the d label in cond has no aligned element in other, mask still does not replace item d, not even with an NA value.
Rather, the alignment is done between self and cond, not between other and cond.
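One limitation of the example above is that cond is False at d, so no replacement would be expected there either way. Below is a minimal sketch of a complementary check, assuming alignment behaves as described in this issue: cond is True at d while other still lacks that label. The names s, cond, and other_misaligned and the numeric values are illustrative, not taken from the pandas docs.

```python
import pandas as pd

# Illustrative data (numeric values keep the result easy to inspect).
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# cond is fully aligned with s and is True at "a" and "d".
cond = pd.Series([True, False, False, True], index=["a", "b", "c", "d"])

# other has "D" instead of "d", so it has no value for label "d".
other_misaligned = pd.Series([-1, -2, -3, -4], index=["a", "b", "c", "D"])

result = s.mask(cond, other_misaligned)
print(result)

# Label "d" is masked because cond is True there, but other has nothing
# to supply for it, so the replacement should come out as NA.
print("d is NA?", pd.isna(result["d"]))
```

If this behaves as expected, a misaligned other only affects the replacement value. It does not change which positions are masked.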
Suggested fix for documentation
Proposed fix:
The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used. If the axis of self does not align with axis of cond Series/DataFrame, the misaligned index positions will be filled with True.
Here is an example which shows that this is correct. In the following code, cond and self are not aligned. The unaligned value in cond is treated as True.
import pandas as pd
a = pd.Series(['apple', 'banana', 'cherry', 'dango'], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'D'])
other = pd.Series(['asparagus', 'broccoli', 'carrot', 'dill'], index=['a', 'b', 'c', 'd'])
cond = b.lt(3)
print("Cond matches other?", cond.index == other.index)
print("Cond matches self?", cond.index == a.index)
a.mask(cond, other)
Output:
Cond matches other? [ True  True  True False]
Cond matches self? [ True  True  True False]
a    asparagus
b     broccoli
c       cherry
d         dill
dtype: object
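Since Series.where appears to share this docstring, the same check can be sketched for it. Assuming where mirrors mask, a label missing from cond would be treated as False there, so that element would be replaced as well. This is only a sketch under that assumption, and the variable names and numeric values are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
other = pd.Series([-1, -2, -3, -4], index=["a", "b", "c", "d"])

# cond has "D" instead of "d", so it says nothing about label "d" of s.
cond = pd.Series([True, True, False, False], index=["a", "b", "c", "D"])

masked = s.mask(cond, other)  # replaces elements where cond is True
kept = s.where(cond, other)   # replaces elements where cond is False

# Show both results side by side, indexed by the labels of s.
print(pd.DataFrame({"mask": masked, "where": kept}))
```

Both methods replace d here, which is consistent with a missing cond label counting as True for mask and as False for where.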