Open
Labels
bug: Incorrect behavior inside of ibis
Description
What happened?
import pyarrow as pa
import ibis
t = ibis.memtable({"f": pa.array([1.0, float("nan"), None])})
t.to_pyarrow()
# pyarrow.Table
# f: double
# ----
# f: [[1,null,null]]
The NaN in the second slot comes back as null. This is because in
Lines 423 to 448 in f1c888b
@lazy_singledispatch
def _memtable(
    data: Any,
    *,
    columns: Iterable[str] | None = None,
    schema: SchemaLike | None = None,
) -> Table:
    import ibis

    if hasattr(data, "__arrow_c_stream__"):
        # Support objects exposing arrow's PyCapsule interface
        import pyarrow as pa

        data = pa.table(
            data,
            schema=ibis.schema(schema).to_pyarrow() if schema is not None else None,
        )
    else:
        import pandas as pd

        data = pd.DataFrame(
            data,
            columns=columns
            or (ibis.schema(schema).names if schema is not None else None),
        )
    return _memtable(data, columns=columns, schema=schema)
we go through pd.DataFrame (a dict of pyarrow arrays has no __arrow_c_stream__ attribute, so the else branch is taken), which converts the pyarrow arrays to numpy arrays, losing the distinction between nan and null :(
My naive solution would be "oh, let's go pyarrow first and switch that pd.DataFrame call to pa.table", but I have a feeling that would be a very breaking change. This probably needs a very test-oriented solution:
- make sure we have really good test coverage for all the cases
- actually try something and verify we don't break anything else
My next idea for implementation would be much more targeted to just this case: check for isinstance(data, Mapping), then look at each individual value to see if it is a pyarrow object, and use pa.table in that case.
What version of ibis are you using?
main
What backend(s) are you using, if any?
NA
Status
backlog