
Possible bug when parsing from string to datetime when slicing #13929

@tomchor


From this question on SO.

Please consider the following example:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.random((72000,3)), columns=list('uvw'),
           index=pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50ms'))
data.loc['2013-11-08 10:15:00.000':'2013-11-08 10:17:00.000']

This outputs:

                                u         v         w
2013-11-08 10:15:00.000  0.569030  0.850393  0.600106
2013-11-08 10:15:00.050  0.679713  0.933720  0.041018
2013-11-08 10:15:00.100  0.503491  0.142397  0.841705
2013-11-08 10:15:00.150  0.171248  0.545567  0.247094
2013-11-08 10:15:00.200  0.149745  0.149588  0.935516
2013-11-08 10:15:00.250  0.039780  0.097837  0.087254
...                           ...       ...       ...
2013-11-08 10:17:00.700  0.001165  0.020971  0.197322
2013-11-08 10:17:00.750  0.003923  0.722930  0.312988
2013-11-08 10:17:00.800  0.941241  0.600529  0.479640
2013-11-08 10:17:00.850  0.272536  0.738084  0.486551
2013-11-08 10:17:00.900  0.060388  0.606207  0.359640
2013-11-08 10:17:00.950  0.464268  0.965543  0.699740

[2420 rows x 3 columns]

This surprised me: I expected the last row to be 2013-11-08 10:17:00.000, since that is the endpoint I specified. Instead the slice runs through 2013-11-08 10:17:00.950. When I pass the endpoint as datetime(2013,11,8,10,17,0,0), which should be identical, it works as I would expect:

In [13]: data.loc['2013-11-08 10:15:00.000':datetime(2013,11,8,10,17,0,0)]
Out[13]: 
                                u         v         w
2013-11-08 10:15:00.000  0.569030  0.850393  0.600106
2013-11-08 10:15:00.050  0.679713  0.933720  0.041018
2013-11-08 10:15:00.100  0.503491  0.142397  0.841705
2013-11-08 10:15:00.150  0.171248  0.545567  0.247094
2013-11-08 10:15:00.200  0.149745  0.149588  0.935516
2013-11-08 10:15:00.250  0.039780  0.097837  0.087254
...                           ...       ...       ...
2013-11-08 10:16:59.750  0.652168  0.606795  0.901583
2013-11-08 10:16:59.800  0.868184  0.249873  0.517637
2013-11-08 10:16:59.850  0.917543  0.303403  0.980257
2013-11-08 10:16:59.900  0.118191  0.032437  0.580734
2013-11-08 10:16:59.950  0.093644  0.017865  0.080326
2013-11-08 10:17:00.000  0.770234  0.310025  0.065127

[2401 rows x 3 columns]

I'm submitting this at the suggestion of an SO user because it looks like a bug.
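
In case it helps anyone hitting the same thing, here is a minimal sketch of a workaround, assuming explicit pd.Timestamp endpoints behave like the datetime endpoint shown above (exact, inclusive labels). The names start, end, exact and expected_rows are only for illustration. It also checks the row count against what an inclusive slice on a 50 ms grid should contain.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: 72000 rows at 50 ms spacing.
index = pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50ms')
data = pd.DataFrame(np.random.random((72000, 3)), columns=list('uvw'), index=index)

# Explicit Timestamp endpoints (illustrative names) instead of strings,
# so no string parsing is involved in picking the slice bounds.
start = pd.Timestamp('2013-11-08 10:15:00.000')
end = pd.Timestamp('2013-11-08 10:17:00.000')
exact = data.loc[start:end]

# Number of rows an inclusive slice should hold on a 50 ms grid.
expected_rows = (end - start) // pd.Timedelta('50ms') + 1

print(len(exact), expected_rows, exact.index[-1])
```

The string endpoint in the report returns 19 extra rows (2420 instead of 2401), which is exactly the remaining 50 ms steps of the 10:17:00 second.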

For completeness, my installed versions:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
IPython: 2.4.1
sphinx: 1.4.5
patsy: None
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

Labels: Bug, Datetime, Indexing, Needs Tests, good first issue
