Description
Code Sample, a copy-pastable example
>>> pd.Timestamp('2021-05-20') + pd.tseries.offsets.MonthEnd()
Timestamp('2021-05-31 00:00:00')
MonthEnd returns midnight at the start of the last day of the month, whereas I expected the last instant of that day (Timestamp('2021-05-31 23:59:59.99999999')). This has surprising consequences, as shown below.
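To make the distinction concrete, here is a minimal sketch. It assumes that adding MonthEnd() moves only the date and keeps the time of day, and that Period.end_time gives the last nanosecond of the month. The helper name month_end_instant is hypothetical and not part of pandas.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd


def month_end_instant(ts: pd.Timestamp) -> pd.Timestamp:
    """Hypothetical helper: last representable instant of the month containing ts."""
    # Period.end_time is the final nanosecond of the monthly period.
    return ts.to_period('M').end_time


ts = pd.Timestamp('2021-05-20 15:30:00')

# The offset changes the date to the month end; the time of day is preserved.
rolled = ts + MonthEnd()

# With normalize=True the result is additionally reset to midnight.
midnight = ts + MonthEnd(normalize=True)

# What this report expected instead: the very end of the last day.
last_instant = month_end_instant(ts)

print(rolled, midnight, last_instant, sep='\n')
```

Seen this way, the midnight result in the example above comes from the input timestamp itself being at midnight, not from MonthEnd picking a particular time of day.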
A consequence of the unexpected behavior
When using df.resample('M').ffill() to get the last known value for each month, data from the last 24 hours of the month is ignored.
>>> df = pd.DataFrame({'val': {
...     pd.Timestamp('2021-03-24 12:03:00'): 1.0,
...     pd.Timestamp('2021-03-31 12:00:00'): 1001.0,
...     pd.Timestamp('2021-04-28 12:03:00'): 1.0,
...     pd.Timestamp('2021-04-30 23:59:00'): 1002.0,
...     pd.Timestamp('2021-05-30 00:00:01'): 1.0,
...     pd.Timestamp('2021-05-31 12:00:00'): 1003.0,
... }})
>>> df
                        val
2021-03-24 12:03:00     1.0
2021-03-31 12:00:00  1001.0
2021-04-28 12:03:00     1.0
2021-04-30 23:59:00  1002.0
2021-05-30 00:00:01     1.0
2021-05-31 12:00:00  1003.0
>>> df.resample('M').ffill()
            val
2021-03-31  1.0
2021-04-30  1.0
2021-05-31  1.0
The expected result of the ffill operation is a value above 1000 for every month end.
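For comparison, the values I expected can be obtained by grouping the rows by calendar month and taking the last row of each group. This is only a sketch of a possible workaround, not a statement about how resample is meant to behave; the variable name last_per_month is mine.

```python
import pandas as pd

df = pd.DataFrame({'val': {
    pd.Timestamp('2021-03-24 12:03:00'): 1.0,
    pd.Timestamp('2021-03-31 12:00:00'): 1001.0,
    pd.Timestamp('2021-04-28 12:03:00'): 1.0,
    pd.Timestamp('2021-04-30 23:59:00'): 1002.0,
    pd.Timestamp('2021-05-30 00:00:01'): 1.0,
    pd.Timestamp('2021-05-31 12:00:00'): 1003.0,
}})

# Map every timestamp to its calendar month, then keep the last
# observation of each month, including those on the final day.
last_per_month = df.groupby(df.index.to_period('M'))['val'].last()

print(last_per_month)
```

Here every row is assigned to its month before anything is selected, so observations made during the last day are not dropped. The result is indexed by monthly periods rather than by month-end timestamps.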
On the other hand, a sum operation works as expected:
>>> df.resample('M').sum()
               val
2021-03-31  1002.0
2021-04-30  1003.0
2021-05-31  1004.0
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.2.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.2.4
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: 0.9.0
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : 1.3.23
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : 0.53.1