xgboost in python and pyspark (using py4j to call jvm-packages)
xgboost4j version: 0.82
TODO: xgboost4j is not the latest version, since 0.90 only supports
python3 and spark 2.4
- download xgboost4j-0.82 jar files from xgboost-jars
- copy to pyspark_xgb/jars
- rename to xgboost4j-0.82.jar and xgboost4j-spark-0.82.jar respectively
- set your SPARK_HOME and JAVA_HOME in pyspark/start.sh
- [opt] change spark-submit parameters if needed
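The copy and rename steps above are easy to get wrong, so a quick check before launching can help. The sketch below is not part of the repo: `missing_jars` is a hypothetical helper that only assumes the directory and the two file names given in the steps above.

```python
import os

# File names taken from the setup steps above.
EXPECTED_JARS = ("xgboost4j-0.82.jar", "xgboost4j-spark-0.82.jar")


def missing_jars(jar_dir="pyspark_xgb/jars"):
    """Return the expected jar names that are not present in jar_dir."""
    return [
        name for name in EXPECTED_JARS
        if not os.path.isfile(os.path.join(jar_dir, name))
    ]
```

Calling `missing_jars()` from the repo root returns an empty list when both jars are in place, and the names of the missing files otherwise.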
python version 2.7
- binary logistic
python python_xgb/train_binary.py
- multi classification
python python_xgb/train_multi.py
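The contents of the two training scripts are not shown here, so the following is only a sketch of how binary and multi-class training usually differ in the xgboost Python API: mainly in the `objective`, and in `num_class` for the multi-class case. `make_params` and `train` are hypothetical helpers, and the `max_depth` and `eta` values are illustrative placeholders, not the repo's settings.

```python
def make_params(task, num_class=None):
    """Build an xgboost parameter dict for 'binary' or 'multi' training."""
    params = {"max_depth": 3, "eta": 0.1}  # illustrative values only
    if task == "binary":
        params["objective"] = "binary:logistic"
        params["eval_metric"] = "logloss"
    elif task == "multi":
        if not num_class or num_class < 2:
            raise ValueError("multi-class training needs num_class >= 2")
        params["objective"] = "multi:softprob"
        params["num_class"] = num_class
        params["eval_metric"] = "mlogloss"
    else:
        raise ValueError("unknown task: %r" % (task,))
    return params


def train(features, labels, params, num_round=10):
    """Train a booster on array-like features and labels."""
    # Imported lazily so make_params can be used without xgboost installed.
    import xgboost as xgb

    dtrain = xgb.DMatrix(features, label=labels)
    return xgb.train(params, dtrain, num_boost_round=num_round)
```

For example, `train(X, y, make_params("multi", num_class=3))` would train a three-class model, assuming `X` and `y` are numpy arrays with labels in `0..2`.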
spark version 2.3.*
- binary logistic
pyspark_xgb/start.sh train_binary.py
- multi classification
pyspark_xgb/start.sh train_multi.py
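Since xgboost4j 0.82 ships no python wrapper for spark, the pyspark scripts reach the Scala classes through py4j. The sketch below shows roughly what such a call can look like. It is not the repo's code: the helper names are hypothetical, and it assumes that the two jars are on the driver and executor classpath, that `XGBoostClassifier` exposes the setters used here, and that the input DataFrames have a vector column `features` and a numeric column `label`.

```python
def build_jvm_classifier(jvm, num_round=10, max_depth=3):
    """Create and configure the Scala XGBoostClassifier through a py4j JVM view."""
    # jvm is expected to be spark.sparkContext._jvm (assumption: jars are loaded).
    clf = jvm.ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier()
    clf.setObjective("binary:logistic")
    clf.setNumRound(num_round)
    clf.setMaxDepth(max_depth)
    clf.setFeaturesCol("features")
    clf.setLabelCol("label")
    return clf


def fit_and_predict(spark, train_df, test_df):
    """Fit on train_df and return predictions for test_df as a pyspark DataFrame."""
    # Imported lazily so build_jvm_classifier can be read and tested without pyspark.
    from pyspark.sql import DataFrame

    clf = build_jvm_classifier(spark.sparkContext._jvm)
    # _jdf is the underlying Java Dataset of a pyspark DataFrame.
    jmodel = clf.fit(train_df._jdf)
    jpredictions = jmodel.transform(test_df._jdf)
    # Wrap the returned Java Dataset back into a python DataFrame.
    return DataFrame(jpredictions, test_df.sql_ctx)
```

For multi classification the same pattern would apply with a multi-class objective and `setNumClass`, again assuming that setter is available in the version in use.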
run the program within docker
it takes some time to build the images ...
cd docker
docker build -t xgb:latest . --no-cache
docker run -i -t xgb:latest /bin/bash
cd xgboost-python-pyspark