This repository was archived by the owner on Nov 1, 2024. It is now read-only.

Commit ea10427

Reword README
* Emphasize it's `torch.Tensor`-like. We found this quite important for ML developers to author model preproc. (words copied back from f7f07c9)
* Re-org features to emphasize the two high-level product spec: (1) Strong integration with PyTorch and GPU (2) Strong integration with Arrow and variable-width/nested type support. The implementation detail can change.
1 parent 96ef7be commit ea10427

1 file changed: +7 -11 lines changed

README.md

Lines changed: 7 additions & 11 deletions
@@ -1,18 +1,14 @@
-# TorchArrow
+# TorchArrow: a data processing library for PyTorch
 
-**This library is currently in the Beta stage and does not have a stable release. The API may change based on
-user feedback or performance. We are committed to bring this library to stable release, but future changes may not be
-completely backward compatible. If you have suggestions on the API or use cases you'd like to be covered, please open a
+**This library is currently in the Beta stage and does not have a stable release. The API and implementation may change based on
+user feedback or performance. Future changes may not be backward compatible.
+If you have suggestions on the API or use cases you'd like to be covered, please open a
 GitHub issue. We'd love to hear thoughts and feedback.**
 
-TorchArrow is a machine learning preprocessing library over batch data, providing performant and Pandas-style easy-to-use API for model development. Currently it provides a Python DataFrame that allows extensible UDFs with [Velox](https://github.com/facebookincubator/velox/), with the following features:
+TorchArrow is a [torch](https://github.com/pytorch/pytorch).Tensor-like Python DataFrame library for data preprocessing in PyTorch models, with two high-level features:
 
-* Seamless handoff with [PyTorch](https://github.com/pytorch/pytorch) or other model authoring, such as Tensor collation and easily plugging into PyTorch DataLoader and [DataPipes](https://github.com/pytorch/data#what-are-datapipes)
-* Zero copy for external readers via [Arrow](https://github.com/apache/arrow) in-memory columnar format
-* Multiple execution runtimes support:
-    - High-performance CPU backend via [Velox](https://github.com/facebookincubator/velox/)
-    - (Future Work) GPU backend via [libcudf](https://docs.rapids.ai/api/libcudf/stable/)
-* High-performance C++ UDF support with vectorization
+* DataFrame library (like Pandas) with strong GPU or other hardware acceleration (under development) and PyTorch ecosystem integration.
+* Columnar memory layout based on [Apache Arrow](https://arrow.apache.org/docs/format/Columnar.html#physical-memory-layout) with strong variable-width and nested data support (such as string, list, map) and Arrow ecosystem integration.
 
 ## Installation
 
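The second feature in the reworded README refers to Arrow's columnar layout for variable-width data such as strings. As a rough illustration of that idea only, the sketch below encodes a string column into a validity list, an offsets list with one more entry than there are values, and one contiguous data buffer. The helper names `encode_strings` and `decode_string` are hypothetical and are not part of TorchArrow or any Arrow library; real Arrow packs validity into a bitmap and uses fixed-width integer offsets, which this plain-Python version simplifies away.

```python
from typing import List, Optional, Tuple


def encode_strings(
    values: List[Optional[str]],
) -> Tuple[List[bool], List[int], bytes]:
    """Encode a string column as (validity, offsets, data).

    Hypothetical helper for illustration; not a TorchArrow or Arrow API.
    """
    validity: List[bool] = []   # True where the slot holds a value
    offsets: List[int] = [0]    # n + 1 entries; slot i spans offsets[i]:offsets[i + 1]
    data = bytearray()          # all UTF-8 bytes stored back to back
    for value in values:
        validity.append(value is not None)
        if value is not None:
            data.extend(value.encode("utf-8"))
        # Null slots repeat the previous offset, so they occupy zero bytes.
        offsets.append(len(data))
    return validity, offsets, bytes(data)


def decode_string(
    validity: List[bool], offsets: List[int], data: bytes, i: int
) -> Optional[str]:
    """Read slot i back out of the three buffers."""
    if not validity[i]:
        return None
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


validity, offsets, data = encode_strings(["torch", None, "", "arrow"])
```

Note that a null and an empty string both take zero bytes in the data buffer; only the validity entry tells them apart, which is why the layout keeps validity separate from the offsets.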