
bug: memtable(dict[str, pa.Array]) goes through pandas, losing precision #11700

@NickCrews


What happened?

```python
import pyarrow as pa
import ibis

t = ibis.memtable({"f": pa.array([1.0, float("nan"), None])})
t.to_pyarrow()
# pyarrow.Table
# f: double
# ----
# f: [[1,null,null]]
```

This is because in ibis/ibis/expr/api.py, lines 423 to 448 at commit f1c888b,

```python
@lazy_singledispatch
def _memtable(
    data: Any,
    *,
    columns: Iterable[str] | None = None,
    schema: SchemaLike | None = None,
) -> Table:
    import ibis

    if hasattr(data, "__arrow_c_stream__"):
        # Support objects exposing arrow's PyCapsule interface
        import pyarrow as pa

        data = pa.table(
            data,
            schema=ibis.schema(schema).to_pyarrow() if schema is not None else None,
        )
    else:
        import pandas as pd

        data = pd.DataFrame(
            data,
            columns=columns
            or (ibis.schema(schema).names if schema is not None else None),
        )
    return _memtable(data, columns=columns, schema=schema)
```

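we go through pd.DataFrame (a plain dict has no __arrow_c_stream__), which converts the pyarrow arrays to NumPy arrays. A NumPy float64 array has no separate null, so the null becomes NaN and the distinction between NaN and null is lost :(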

My naive solution would be "oh, let's go pyarrow first and switch that pd.DataFrame call to pa.table", but I have a feeling that would be a very breaking change. This probably needs a very test-oriented solution:

  1. make sure we have really good test coverage for all the cases
  2. actually try something and verify we don't break anything else

My next implementation idea is much more targeted to just this case: check isinstance(data, Mapping), then look at each individual value to see whether it is a pyarrow object, and use pa.table in that case.

What version of ibis are you using?

main

What backend(s) are you using, if any?

NA

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees: No one assigned
Labels: bug (Incorrect behavior inside of ibis)
Type: No type
Projects: Status: backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests