
Commit 0871741

Add Kubeflow Trainer to known users (#18935)
## Which issue does this PR close?

No issue.

## Rationale for this change

Hi folks, thanks for driving DataFusion forward!

We've recently released support for a distributed data cache in [Kubeflow Trainer](https://github.com/kubeflow/trainer). It lets users stream massive datasets directly to distributed training nodes and optimize GPU utilization. Docs and public talks are available in this guide: https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/

I've added Kubeflow Trainer to the DataFusion list of known users.

cc @akshaychitneni @bigsur0 @comphead @andygrove

## What changes are included in this PR?

Update the DataFusion docs to include Kubeflow Trainer.

## Are these changes tested?

## Are there any user-facing changes?
1 parent 14f34f6 commit 0871741


1 file changed, +3 -0 lines changed


docs/source/user-guide/introduction.md

Lines changed: 3 additions & 0 deletions
@@ -82,6 +82,7 @@ Here are some example systems built using DataFusion:
 - Streaming data platforms such as [Synnada]
 - Tools for reading / sorting / transcoding Parquet, CSV, AVRO, and JSON files such as [qv]
 - Native Spark runtime replacement such as [Auron]
+- Distributed data cache to boost GPU utilization of AI workloads with [Kubeflow Trainer](https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/)

 By using DataFusion, projects are freed to focus on their specific
 features, and avoid reimplementing general (but still necessary)
@@ -114,6 +115,8 @@ Here are some active projects using DataFusion:
 - [Iceberg-rust](https://github.com/apache/iceberg-rust) Rust implementation of Apache Iceberg
 - [InfluxDB] Time Series Database
 - [Kamu] Planet-scale streaming data pipeline
+- [Kubeflow Trainer](https://github.com/kubeflow/trainer) Kubernetes-native project designed for
+  scalable LLMs fine-tuning and distributed AI model training.
 - [LakeSoul](https://github.com/lakesoul-io/LakeSoul) Open source LakeHouse framework with native IO in Rust.
 - [Lance](https://github.com/lancedb/lance) Modern columnar data format for ML
 - [OpenObserve] Distributed cloud native observability platform
