DataFrame.update silently does nothing when indices are of differing type #19905

@birdcolour

Code to reproduce

import numpy as np
import pandas as pd

df_int = pd.DataFrame(
    {'col': ['foo', 'bar', np.nan]},
    index=[1,2,3]
)
df_obj = pd.DataFrame(
    {'col': [np.nan, np.nan, 'baz']},
    index=['1', '2', '3']
)

print(df_int)
print(df_obj)

# >>>
#    col
# 1  foo
# 2  bar
# 3  NaN
#    col
# 1  NaN
# 2  NaN
# 3  baz

# Note that the indices appear identical, but are actually different dtypes

df_int.update(df_obj)
print(df_int)

# Intended output
# >>>
#    col
# 1  foo
# 2  bar
# 3  baz

# Actual output
# >>>
#    col
# 1  foo
# 2  bar
# 3  NaN

Problem description

Because update matches rows by index value, two dataframes whose indices have different dtypes can produce no matches at all, even when the user expects them to match, and the user gets no feedback that this has happened. This is particularly surprising when the indices print identically, as highlighted above. A warning should be raised that either:

  • tells the user that the indices are not of the same type, which may produce unintended results, or
  • states that a comparison between types is taking place that will never produce any matches.
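
A check of this kind can be done by hand in user code. The following is a minimal sketch, not pandas behaviour: `warn_on_index_dtype_mismatch` is a hypothetical helper name, the warning text is invented for illustration, and casting the index with `astype` is only one possible workaround, which assumes every label converts cleanly to the target dtype.

```python
import warnings

import numpy as np
import pandas as pd


def warn_on_index_dtype_mismatch(left, right):
    """Hypothetical helper: warn if the two frames' index dtypes differ.

    Returns True when a mismatch was found, False otherwise.
    """
    if left.index.dtype != right.index.dtype:
        warnings.warn(
            f"index dtypes differ ({left.index.dtype} vs {right.index.dtype}); "
            "update() may not match any rows",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


df_int = pd.DataFrame({'col': ['foo', 'bar', np.nan]}, index=[1, 2, 3])
df_obj = pd.DataFrame({'col': [np.nan, np.nan, 'baz']}, index=['1', '2', '3'])

# Emits a UserWarning because the indices hold integers vs. strings.
mismatch = warn_on_index_dtype_mismatch(df_int, df_obj)

# Workaround (an assumption, not a pandas feature): cast a copy of the
# other frame's index to the target index dtype before calling update.
df_fixed = df_obj.copy()
df_fixed.index = df_fixed.index.astype(df_int.index.dtype)
df_int.update(df_fixed)

print(df_int)
```

Casting a copy leaves `df_obj` untouched. If some labels cannot be converted, `astype` raises, which is arguably preferable to an update that silently matches nothing.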
