ENH: add basic DataFrame.from_arrow class method for importing through Arrow PyCapsule interface #59696
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -205,6 +205,8 @@ | |
| AnyAll, | ||
| AnyArrayLike, | ||
| ArrayLike, | ||
| ArrowArrayExportable, | ||
| ArrowStreamExportable, | ||
| Axes, | ||
| Axis, | ||
| AxisInt, | ||
|
|
@@ -1746,6 +1748,54 @@ def __rmatmul__(self, other) -> DataFrame: | |
| # ---------------------------------------------------------------------- | ||
| # IO methods (to / from other formats) | ||
|
|
||
| @classmethod | ||
| def from_arrow( | ||
| cls, data: ArrowArrayExportable | ArrowStreamExportable | ||
| ) -> DataFrame: | ||
| """ | ||
| Construct a DataFrame from a tabular Arrow object. | ||
|
|
||
| This function accepts any tabular Arrow object implementing | ||
| the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__`` | ||
| or ``__arrow_c_stream__`` method). | ||
|
|
||
| This function currently relies on ``pyarrow`` to convert the tabular | ||
| object in Arrow format to pandas. | ||
|
|
||
| .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html | ||
|
|
||
| .. versionadded:: 3.0 | ||
|
|
||
| Parameters | ||
| ---------- | ||
| data : pyarrow.Table or Arrow-compatible table | ||
| Any tabular object implementing the Arrow PyCapsule Protocol | ||
| (i.e. has an ``__arrow_c_array__`` or ``__arrow_c_stream__`` | ||
| method). | ||
|
|
||
| Returns | ||
| ------- | ||
| DataFrame | ||
|
|
||
| """ | ||
| pa = import_optional_dependency("pyarrow", min_version="14.0.0") | ||
| if not isinstance(data, pa.Table): | ||
| if not ( | ||
| hasattr(data, "__arrow_c_array__") | ||
| or hasattr(data, "__arrow_c_stream__") | ||
| ): | ||
| # explicitly test this, because otherwise we would accept various other | ||
| # input types through the pa.table(..) call | ||
| raise TypeError( | ||
| "Expected an Arrow-compatible tabular object (i.e. having an " | ||
| "'__arrow_c_array__' or '__arrow_c_stream__' method), got " | ||
| f"'{type(data).__name__}' instead." | ||
| ) | ||
| data = pa.table(data) | ||
|
||
|
|
||
| df = data.to_pandas() | ||
| return df | ||
|
|
||
| @classmethod | ||
| def from_dict( | ||
| cls, | ||
|
|
||
Nit: maybe this should link to the stream interface page instead? https://arrow.apache.org/docs/format/CStreamInterface.html
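For readers following the diff, the duck-typing guard in `from_arrow` can be sketched without pyarrow installed. This is only an illustration under stated assumptions: `check_arrow_tabular` and `FakeArrowStream` are hypothetical names that do not appear in the PR, and the fake class does not produce a real Arrow capsule. The point is to show why the explicit `hasattr` check comes before `pa.table(...)`, which on its own would also accept inputs such as a plain dict.

```python
class FakeArrowStream:
    """Hypothetical stand-in for an object exporting an Arrow C stream."""

    def __arrow_c_stream__(self, requested_schema=None):
        # Illustration only: a real implementation returns a PyCapsule.
        raise NotImplementedError("no real capsule is produced in this sketch")


def check_arrow_tabular(data):
    """Mirror the guard from the diff (hypothetical helper, not pandas API)."""
    if not (
        hasattr(data, "__arrow_c_array__")
        or hasattr(data, "__arrow_c_stream__")
    ):
        raise TypeError(
            "Expected an Arrow-compatible tabular object (i.e. having an "
            "'__arrow_c_array__' or '__arrow_c_stream__' method), got "
            f"'{type(data).__name__}' instead."
        )
    return data


# An object exposing one of the protocol methods passes the guard unchanged.
stream = FakeArrowStream()
checked = check_arrow_tabular(stream)

# A plain dict is rejected here, even though pa.table(...) would accept it.
try:
    check_arrow_tabular({"a": [1, 2, 3]})
except TypeError as exc:
    message = str(exc)
else:
    message = None
```

In the PR itself the guard is skipped for `pa.Table` instances, and anything that passes is handed to `pa.table(data)` and then `to_pandas()`; the sketch covers only the type check.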