Add Kubeflow Trainer to known users (#18935)

andreyvelich · web-flow · commit 087174102593 · 2025-11-25T21:32:36.000Z
## Which issue does this PR close?

No issue

## Rationale for this change

Hi Folks, thanks for driving DataFusion forward!

We've recently released support for distributed data cache in [Kubeflow Trainer](https://github.com/kubeflow/trainer). It allows users to stream massive datasets directly to distributed training nodes and optimize GPU utilization.

Docs and public talks are available in this guide: https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/

I've updated the DataFusion known users with that.

cc @akshaychitneni @bigsur0 @comphead @andygrove

## What changes are included in this PR?

Update DataFusion docs to include Kubeflow Trainer.

## Are these changes tested?

## Are there any user-facing changes?

diff --git a/docs/source/user-guide/introduction.md b/docs/source/user-guide/introduction.md
@@ -82,6 +82,7 @@ Here are some example systems built using DataFusion:
 - Streaming data platforms such as [Synnada]
 - Tools for reading / sorting / transcoding Parquet, CSV, AVRO, and JSON files such as [qv]
 - Native Spark runtime replacement such as [Auron]
+- Distributed data cache to boost GPU utilization of AI workloads with [Kubeflow Trainer](https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/)
 
 By using DataFusion, projects are freed to focus on their specific
 features, and avoid reimplementing general (but still necessary)
@@ -114,6 +115,8 @@ Here are some active projects using DataFusion:
 - [Iceberg-rust](https://github.com/apache/iceberg-rust) Rust implementation of Apache Iceberg
 - [InfluxDB] Time Series Database
 - [Kamu] Planet-scale streaming data pipeline
+- [Kubeflow Trainer](https://github.com/kubeflow/trainer) Kubernetes-native project designed for
+  scalable LLMs fine-tuning and distributed AI model training.
 - [LakeSoul](https://github.com/lakesoul-io/LakeSoul) Open source LakeHouse framework with native IO in Rust.
 - [Lance](https://github.com/lancedb/lance) Modern columnar data format for ML
 - [OpenObserve] Distributed cloud native observability platform