TSNE benchmark docs (#122)

inteldimitrius · web-flow · commit 50d1c32bac4c · 2022-12-27T19:26:28.000+03:00
* Add CIFAR_10 dataset loading and make it available for benchmarking

* Remove line according to PEP8

* Add docs for TSNE benchmark

* Add epsilon dataset to TSNE benchmark
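
Note: the TSNE parameter tables added to the READMEs below can be read as a small command-line interface. The following is a minimal sketch, not the benchmark's actual code: the helper names `build_tsne_parser` and `tsne_kwargs` are hypothetical, and the defaults are copied from the documented tables. It shows how the dashed parameter names would map onto the keyword arguments of `sklearn.manifold.TSNE`.

```python
import argparse


def build_tsne_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented TSNE parameters."""
    parser = argparse.ArgumentParser(description="TSNE benchmark (sketch)")
    parser.add_argument("--n-components", type=int, default=2,
                        help="Dimension of the embedded space")
    parser.add_argument("--early-exaggeration", type=float, default=12.0,
                        help="Factor applied to attractive forces early on")
    parser.add_argument("--learning-rate", type=float, default=200.0,
                        help="Usually in the range [10.0, 1000.0]")
    parser.add_argument("--angle", type=float, default=0.5,
                        help="Speed/accuracy trade-off")
    parser.add_argument("--min-grad-norm", type=float, default=1e-7,
                        help="Stop when the gradient norm falls below this")
    parser.add_argument("--random-state", type=int, default=1234,
                        help="Seed for the random number generator")
    return parser


def tsne_kwargs(args: argparse.Namespace) -> dict:
    """Collect parsed values under the keyword names TSNE expects.

    argparse stores "--early-exaggeration" as "early_exaggeration",
    which matches the corresponding sklearn.manifold.TSNE argument.
    """
    names = ("n_components", "early_exaggeration", "learning_rate",
             "angle", "min_grad_norm", "random_state")
    return {name: getattr(args, name) for name in names}
```

With scikit-learn installed, the resulting dictionary could be passed on as `TSNE(**tsne_kwargs(args)).fit_transform(X)`, where `X` is a feature matrix such as the one in `data/epsilon_30K_x_train.npy`.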
diff --git a/README.md b/README.md
@@ -113,6 +113,7 @@ The configuration of benchmarks allows you to select the frameworks to run, sele
 |**[PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)**|pca|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:|
 |**[Ridge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)**|ridge|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:|
 |**[SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)**|svm|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:|
+|**[TSNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)**|tsne|:white_check_mark:|:x:|:x:|:white_check_mark:|:x:|
 |**[train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)**|train_test_split|:white_check_mark:|:x:|:x:|:white_check_mark:|:x:|
 |**[GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)**|gbt|:x:|:x:|:x:|:x:|:white_check_mark:|
 |**[GradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)**|gbt|:x:|:x:|:x:|:x:|:white_check_mark:|
diff --git a/configs/sklearn/performance/tsne.json b/configs/sklearn/performance/tsne.json
@@ -24,15 +24,24 @@
                         "y": "data/mnist_y_test.npy"
                     }
                 },
-		{
-		    "source": "npy",
-		    "name": "cifar_10",
-		    "training":
-		    {
-			"x": "data/cifar_10_x_train.npy",
-			"y": "data/cifar_10_y_train.npy"
-		    }
-		}
+                {
+                    "source": "npy",
+                    "name": "cifar_10",
+                    "training":
+                    {
+                        "x": "data/cifar_10_x_train.npy",
+                        "y": "data/cifar_10_y_train.npy"
+                    }
+                },
+                {
+                    "source": "npy",
+                    "name": "epsilon_30K",
+                    "training":
+                    {
+                        "x": "data/epsilon_30K_x_train.npy",
+                        "y": "data/epsilon_30K_y_train.npy"
+                    }
+                }
             ],
             "workload-size": "medium"
         }
diff --git a/cuml_bench/README.md b/cuml_bench/README.md
@@ -18,6 +18,7 @@ You can launch benchmarks for each algorithm separately. The tables below list a
 - [PCA](#pca)
 - [Ridge Regression](#ridge)
 - [SVC](#svc)
+- [TSNE](#tsne)
 - [train_test_split](#train_test_split)
 
 #### General
@@ -141,6 +142,17 @@ You can launch benchmarks for each algorithm separately. The tables below list a
 | tol | float | 1e-16 | Tolerance passed to sklearn.svm.SVC |
 | probability | action | True | Use probability for SVC |
 
+#### TSNE
+
+| parameter Name  | Type | default value | description |
+| ----- | ---- |---- |---- |
+| n-components | int | 2 | Dimension of the embedded space |
+| early-exaggeration | float | 12.0 | Factor that increases the attractive forces between points, <br/>letting them move more freely and find their nearest neighbors more easily |
+| learning-rate | float | 200.0 | The learning rate for t-SNE is usually in the range [10.0, 1000.0] |
+| angle | float | 0.5 | Angular size used by the Barnes-Hut approximation. Trades off speed against accuracy |
+| min-grad-norm | float | 1e-7 | If the gradient norm is below this threshold, the optimization is stopped |
+| random-state | int | 1234 | Seed for the random number generator |
+
 #### train_test_split
 
 | parameter Name  | Type | default value | description |
diff --git a/sklearn_bench/README.md b/sklearn_bench/README.md
@@ -24,6 +24,7 @@ You can launch benchmarks for each algorithm separately. The tables below list a
 - [PCA](#pca)
 - [Ridge Regression](#ridge)
 - [SVC](#svc)
+- [TSNE](#tsne)
 - [train_test_split](#train_test_split)
 
 ### General
@@ -152,6 +153,17 @@ You can launch benchmarks for each algorithm separately. The tables below list a
 | tol | float | 1e-16 | Tolerance passed to sklearn.svm.SVC |
 | probability | action | True | Use probability for SVC |
 
+### TSNE
+
+| parameter Name  | Type | default value | description |
+| ----- | ---- |---- |---- |
+| n-components | int | 2 | Dimension of the embedded space |
+| early-exaggeration | float | 12.0 | Factor that increases the attractive forces between points, <br/>letting them move more freely and find their nearest neighbors more easily |
+| learning-rate | float | 200.0 | The learning rate for t-SNE is usually in the range [10.0, 1000.0] |
+| angle | float | 0.5 | Angular size used by the Barnes-Hut approximation. Trades off speed against accuracy |
+| min-grad-norm | float | 1e-7 | If the gradient norm is below this threshold, the optimization is stopped |
+| random-state | int | 1234 | Seed for the random number generator |
+
 ### train_test_split
 
 | parameter Name  | Type | default value | description |