Commit a741a0e

Merge branch 'main' of https://github.com/project-codeflare/codeflare into main
2 parents: 94b2df5 + aa6eb47

3 files changed: +20 −24 lines

docs/source/examples/fit_and_score.md

Lines changed: 1 addition & 3 deletions

@@ -21,8 +21,6 @@ limitations under the License.

We use an sklearn pipeline example, Comparing Nearest Neighbors with and without Neighborhood Components Analysis, to demonstrate how to define, fit, and score multiple classifiers with CodeFlare (CF) Pipelines. The sklearn and CF pipeline notebook is published [here](https://github.com/project-codeflare/codeflare/blob/main/notebooks/plot_nca_classification.ipynb).

This example plots the class decision boundaries given by a Nearest Neighbors classifier when using the Euclidean distance on the original features, versus using the Euclidean distance after the transformation learned by Neighborhood Components Analysis. Its output is pictorially illustrated with colored decision boundaries like the pictures below.

- This example plots the class decision boundaries given by a Nearest Neighbors classifier when using the Euclidean distance on the original features, versus using the Euclidean distance after the transformation learned by Neighborhood Components Analysis. Its output is pictorially illustrated with colored decision boundaries like the pictures below.

![](../images/classification_and_score_1.jpeg)

Classification score and boundaries of KNN with k=1

@@ -97,4 +95,4 @@ Classification score and boundaries of KNN with k=1

Classification score and boundaries of KNN with Neighborhood Component Analysis

The Jupyter notebook of this example is available [here](https://github.com/project-codeflare/codeflare/blob/main/notebooks/plot_nca_classification.ipynb) to demonstrate how one might translate sklearn pipelines to CodeFlare pipelines that take advantage of Ray's distributed processing. Please try it out and let us know what you think.
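The comparison this doc describes can be sketched in plain sklearn. This is a minimal, self-contained version, not the notebook's code: it uses the full iris feature set rather than the two plotted features, and the scaler and split settings are illustrative assumptions.

```python
# Sketch: KNN on raw features vs. KNN on features transformed by
# Neighborhood Components Analysis (NCA). Dataset and split settings
# are assumptions for illustration, not the original notebook's.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.7, random_state=42
)

# Plain KNN with k=1 on (scaled) original features.
knn_only = Pipeline([("scaler", StandardScaler()),
                     ("knn", KNeighborsClassifier(n_neighbors=1))])

# Same classifier, but after the NCA-learned transformation.
nca_knn = Pipeline([("scaler", StandardScaler()),
                    ("nca", NeighborhoodComponentsAnalysis(random_state=42)),
                    ("knn", KNeighborsClassifier(n_neighbors=1))])

for name, model in [("KNN", knn_only), ("NCA + KNN", nca_knn)]:
    model.fit(X_train, y_train)
    print(name, "test score: %.3f" % model.score(X_test, y_test))
```

Each pipeline exposes the same `fit()`/`score()` interface, which is what makes the later translation to CodeFlare pipeline nodes mechanical.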

docs/source/examples/hyperparameter.md

Lines changed: 9 additions & 10 deletions

@@ -18,10 +18,10 @@ limitations under the License.

### Tuning hyper-parameters with CodeFlare Pipelines

`GridSearchCV()` is often used for hyper-parameter tuning of a model constructed via sklearn pipelines. It does an exhaustive search over specified parameter values for a pipeline, and it implements a `fit()` method and a `score()` method. The parameters of the pipeline used to apply these methods are optimized by cross-validated grid search over a parameter grid.

Here we show how to convert an example that uses `GridSearchCV()` to tune the hyper-parameters of an sklearn pipeline into one that uses the CodeFlare (CF) pipelines `grid_search_cv()`. We use [Pipelining: chaining a PCA and a logistic regression](https://scikit-learn.org/stable/auto_examples/compose/plot_digits_pipe.html#sphx-glr-auto-examples-compose-plot-digits-pipe-py) from sklearn as an example.

In this sklearn example, a pipeline chains together a PCA and a LogisticRegression. The `n_components` parameter of the PCA and the `C` parameter of the LogisticRegression are defined in a `param_grid`, with `n_components` in `[5, 15, 30, 45, 64]` and `C` given by `np.logspace(-4, 4, 4)`. A total of 20 combinations of `n_components` and `C` values will be explored by `GridSearchCV()` to find the best one, i.e. the one with the highest `mean_test_score`.
```python
pca = PCA()
```

@@ -40,14 +40,14 @@

```python
print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)
```

After running `GridSearchCV().fit()`, the best parameters of `PCA__n_components` and `LogisticRegression__C`, together with the cross-validated `mean_test_score` values, are printed out as follows. In this example, the best `n_components` chosen for the PCA is 45.

```
Best parameter (CV score=0.920):
{'logistic__C': 0.046415888336127774, 'pca__n_components': 45}
```
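As a quick sanity check on the 20-combination count, sklearn's `ParameterGrid` can enumerate the grid directly (the step names `pca` and `logistic` match the printed keys above):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid = {
    "pca__n_components": [5, 15, 30, 45, 64],
    "logistic__C": np.logspace(-4, 4, 4),
}
# 5 values of n_components x 4 values of C = 20 candidate pipelines
print(len(ParameterGrid(param_grid)))  # 20
```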
The PCA explained variance ratio and the best `n_components` chosen are plotted in the top chart. The classification accuracy and its `std_test_score` are plotted in the bottom chart. The best `n_components` can be obtained from the object returned by `GridSearchCV()` via `best_estimator_.named_steps['pca'].n_components`.

![](../images/pca_1.png)
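That lookup can be sketched end-to-end in plain sklearn. A reduced grid is used here so the search stays fast; the grid values and solver settings are illustrative, not the original example's full 20-combination sweep.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X_digits, y_digits = load_digits(return_X_y=True)

pipe = Pipeline([("pca", PCA()),
                 ("logistic", LogisticRegression(max_iter=2000, tol=0.1))])

# Reduced grid for illustration only.
param_grid = {"pca__n_components": [15, 45],
              "logistic__C": np.logspace(-2, 2, 2)}

search = GridSearchCV(pipe, param_grid, n_jobs=-1)
search.fit(X_digits, y_digits)

# best_estimator_ is the refit best pipeline; named_steps exposes its stages.
best_n = search.best_estimator_.named_steps["pca"].n_components
print("best n_components:", best_n)
```

Note that `best_estimator_` is only populated because `GridSearchCV` refits the best pipeline by default (`refit=True`).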
@@ -58,7 +58,7 @@ We next describe the step-by-step conversion of this example to one that uses Co

#### **Step 1: importing codeflare.pipelines packages and ray**

We first need to import various `codeflare.pipelines` packages, including `Datamodel` and `runtime`, as well as `ray`, and to call `ray.shutdown()` and `ray.init()`. Note that, in order to run this CodeFlare example notebook, you need a running ray instance.

```python
import codeflare.pipelines.Datamodel as dm
```

@@ -73,7 +73,7 @@ ray.init()

#### **Step 2: defining and setting up a codeflare pipeline**

A codeflare pipeline is defined by `EstimatorNode`s and by edges connecting pairs of `EstimatorNode`s. In this case, we define `node_pca` and `node_logistic` and connect these two nodes with `pipeline.add_edge()`. Before we can execute `fit()` on a pipeline, we need to set up the proper input to the pipeline.

```python
pca = PCA()
```

@@ -88,10 +88,9 @@ pipeline_input = dm.PipelineInput()

```python
pipeline_input.add_xy_arg(node_pca, dm.Xy(X_digits, y_digits))
```

#### **Step 3: defining pipeline param grid and executing CodeFlare pipelines `grid_search_cv()`**

The CodeFlare pipelines runtime converts an sklearn `param_grid` into a CodeFlare pipelines param grid. We also specify the default `KFold` parameter for running the cross-validation. Finally, the CodeFlare pipelines runtime executes `grid_search_cv()`.

```python
# param_grid
```

@@ -112,7 +111,7 @@ result = rt.grid_search_cv(kf, pipeline, pipeline_input, pipeline_param)

#### **Step 4: parsing the returned result from `grid_search_cv()`**

As the CodeFlare pipelines project is still under active development, APIs to access some attributes of the pipelines explored by `grid_search_cv()` are not yet available. As a result, slightly more verbose code is needed to get the best pipeline, its associated parameter values, and other statistics from the object returned by `grid_search_cv()`. For example, we need to loop through all 20 explored pipelines to find the best one. And, to get the `n_components` of an explored pipeline, we first call `.get_nodes()` on the returned cross-validated pipeline, then `.get_estimator()`, and finally `.get_params()`.
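Those accessors are CodeFlare-specific, but the selection logic itself reduces to "maximize the mean cross-validation score". A pure-Python sketch of that loop, over hypothetical per-pipeline fold scores (the data below is made up for illustration; in the real notebook it comes from the `grid_search_cv()` result):

```python
import statistics

# Hypothetical cross-validation fold scores per explored pipeline,
# keyed by (n_components, C). Values here are invented for illustration.
cv_scores = {
    (15, 0.05): [0.90, 0.91, 0.89],
    (45, 0.05): [0.92, 0.93, 0.92],
    (64, 0.05): [0.91, 0.90, 0.92],
}

# Pick the parameter combination with the highest mean test score.
best_params = max(cv_scores, key=lambda p: statistics.mean(cv_scores[p]))
best_mean = statistics.mean(cv_scores[best_params])
print(best_params, "%.3f" % best_mean)  # (45, 0.05) 0.923
```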
```python
import statistics
```
docs/source/getting_started/starting.md

Lines changed: 10 additions & 11 deletions

@@ -164,17 +164,16 @@ pip3 install -r requirements.txt

Assuming OpenShift cluster access from the pre-reqs.

a) Create namespace

```shell
$ oc create namespace codeflare
namespace/codeflare created
$
```

b) Bring up the Ray cluster

```shell
$ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
Cluster: default

Checking Kubernetes environment settings
```

@@ -248,8 +247,8 @@ pip3 install -r requirements.txt

```shell
Connect to a terminal on the cluster head:
  ray attach /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml
Get a remote shell to the cluster manually:
  kubectl -n ray exec -it ray-head-ql46b -- bash
```

3. Verify
a) Check for head node

@@ -263,7 +262,7 @@ pip3 install -r requirements.txt

b) Run example test

```shell
- ray submit python/ray/autoscaler/kubernetes/example-full.yaml x.py
+ ray submit ray/python/ray/autoscaler/kubernetes/example-full.yaml x.py
Loaded cached provider configuration
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
2021-02-09 08:50:51,028 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/x.py)'
```

@@ -277,4 +276,4 @@ Jupyter setup demo [Reference repository](https://github.com/erikerlandson/ray-o

### Running examples

Once in a Jupyter environment, refer to [notebooks](../../notebooks) for example pipelines. Documentation for reference use cases can be found in [Examples](https://codeflare.readthedocs.io/en/latest/).
