This repository was archived by the owner on Dec 9, 2022. It is now read-only.

Commit b6aef2a

Update README.md
1 parent fdeedfe commit b6aef2a

1 file changed (+35, -35 lines changed)


README.md

Lines changed: 35 additions & 35 deletions
@@ -13,7 +13,7 @@ When the reader has completed this code pattern, they will understand how to :
1313

1414
# Architecture Diagram
1515

16-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/architecture.png)
16+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/architecture.png)
1717

1818
1. User logs into Watson Studio, creates a project and initiates an instance of Auto AI & Object Storage.
1919
2. User uploads the data file in the CSV format to the object storage.
@@ -69,60 +69,60 @@ Sign up for IBM's [Watson Studio](http://dataplatform.ibm.com/).
6969

7070
Click on New Project and select per below.
7171

72-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/create_prj.png)
72+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/create_prj.png)
7373

7474
Define the project by giving a Name and hit 'Create'.
7575

76-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/def_prj.png)
76+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/def_prj.png)
7777

7878
## 3. Add Data
7979

80-
[Clone this repo](https://github.com/RK-Sharath/predict-fraud-using-auto-ai)
81-
Navigate to [data](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/tree/master/data) and save the file on the disk. Review the data glossary from the data folder for more details. `Note: Citation is needed to use this dataset for any other projects.`
80+
[Clone this repo](https://github.com/IBM/predict-fraud-using-auto-ai)
81+
Navigate to [data](https://github.com/IBM/predict-fraud-using-auto-ai/tree/master/data) and save the file on the disk. Review the data glossary from the data folder for more details. `Note: Citation is needed to use this dataset for any other projects.`
8282

8383
Click on Assets and select Browse and add the csv file from your file system.
8484

85-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/add_asset.png)
85+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/add_asset.png)
8686

8787
## 4. Add Asset as Auto AI
8888

8989
Click on Add to project and select AutoAI experiment.
9090

91-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/sel_auto_ai.png)
91+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/sel_auto_ai.png)
9292

9393
`Note: The Lite account for AutoAI comes with 50 capacity units per month and AutoAI consumes 20 capacity units per hour.`
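As a rough sketch of what the note above implies, the monthly allowance can be converted into hours of AutoAI runtime. The two figures are the ones quoted in the note; the function name is hypothetical, and current plan limits may differ.

```python
# Figures quoted in the note above; actual plan limits may have changed.
MONTHLY_CAPACITY_UNITS = 50
UNITS_PER_HOUR = 20


def autoai_hours_available(monthly_units: float, units_per_hour: float) -> float:
    """Return how many hours of AutoAI runtime the monthly allowance covers."""
    if units_per_hour <= 0:
        raise ValueError("units_per_hour must be positive")
    return monthly_units / units_per_hour


hours = autoai_hours_available(MONTHLY_CAPACITY_UNITS, UNITS_PER_HOUR)
print(f"{hours:.1f} hours of AutoAI per month")
```

Under these figures the Lite allowance covers about two and a half hours of experiments per month, so it is worth keeping runs short.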
9494

9595
## 5. Create and define experiment
9696

9797
Click on New AutoAI experiment and give a name to the experiment.
9898

99-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/def_exp.png)
99+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/def_exp.png)
100100

101101
Click on Associate a Machine Learning service instance to this project and select the Machine Learning service instance and hit reload. If you do not have Machine Learning service instance, then follow the steps to get one.
102102

103103
The Create button at the bottom right gets highlighted, go ahead and hit Create.
104104

105-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/create_exp.png)
105+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/create_exp.png)
106106

107107
## 6. Import the csv file
108108

109109
We need to import the CSV file into the experiment. Note that AutoAI supports only the CSV file format. Click on Browse or Select from project to choose the fraud_dataset.csv file to import.
110110

111-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/import_csv.png)
111+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/import_csv.png)
112112

113113
## 7. Run experiment
114114

115115
We have to select the target variable, which in this case is Fraud_Risk. Notice that Prediction Type and Optimized Metric get highlighted, which tells us that we are working on a binary classification use case and that the evaluation metric is ROC AUC (area under the receiver operating characteristic curve), which is commonly used for classification use cases.
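To make the ROC AUC metric concrete, here is a minimal sketch that computes it from scratch. It uses the rank interpretation of AUC: the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The labels and scores are made-up illustration data, not taken from the fraud dataset, and treating label 1 as the positive class is an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 is the (assumed) positive class, scores are model outputs.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the metric suits binary classification problems such as this one.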
116116

117-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/sel_target.png)
117+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/sel_target.png)
118118

119119
We can click on experiment settings to adjust the holdout sample and training sample under source settings.
120120

121-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/sample_split.png)
121+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/sample_split.png)
122122

123123
We can click on prediction settings to modify the Prediction type, Positive Class & Optimized metric if required. In this case, we will leave them as is and hit save and close.
124124

125-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/prediction_settings.png)
125+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/prediction_settings.png)
126126

127127
Click on `Run experiment`.
128128

@@ -132,61 +132,61 @@ The AutoAI experiment has been completed in 97 seconds to generate four pipeline
132132

133133
Each pipeline is run with different parameters: pipeline 3 is run on a sequence of HPO (hyperparameter optimization) & FE (feature engineering), whereas pipeline 4 includes HPO, FE and a combination of both. All of this is done on the fly. We just have to sit and watch while AutoAI takes care of things for us and generates the machine learning models. Very little intervention is required to get things going, and in no time we have the generated pipelines to choose from.
134134

135-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/generate_pipelines.png)
135+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/generate_pipelines.png)
136136

137137
Click on pipeline 3 (which is ranked 1) to see the evaluation metrics on the left side.
138138

139-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/pipeline_3.png)
139+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/pipeline_3.png)
140140

141141
Click on model evaluation to review the performance of the model on the holdout sample and the cross-validation score. We can observe that our model has done very well by scoring > 95% on recall, average precision & area under the curve. High recall means the model finds most of the fraudulent transactions, and high precision means most of the transactions it flags are actually fraudulent.
142142

143-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/model_eval.png)
143+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/model_eval.png)
144144

145145
Click on feature importance to identify the significant features influencing the outcome. Any variable which starts with Newfeature is a variable generated on the fly by the model as part of feature engineering.
146146

147-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/feature_imp.png)
147+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/feature_imp.png)
148148

149149
Click on feature transforms to understand the transformation of original features to new features. Feature engineering is one of the important factors in the model building process and has a direct impact on the overall accuracy of the model. We can observe that there are 24 features in total, whereas the original dataset had 13 variables, which means 11 new features have been created by AutoAI. This is one of the reasons for the high accuracy of the model.
150150

151-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/feature_transforms.png)
151+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/feature_transforms.png)
152152

153153
After all the analysis of model performance, it's time to select the model for deployment. We will go ahead and select `pipeline 3 which is Rank 1` and hit Save as model. `We can save whichever pipeline scores highest on Accuracy or on any other evaluation metric.`
154154

155-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/select_model.png)
155+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/select_model.png)
156156

157157
## 9. Deploy to Cloud
158158

159159
The saved model can be found under `Models` under the project in Watson Studio. Click on three dots on the right side below Actions and hit `Deploy.`
160160

161-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/deploy_mdl.png)
161+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/deploy_mdl.png)
162162

163163
In the next step, click on Add Deployment on the right side above `Actions.`
164164

165-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/add_dply.png)
165+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/add_dply.png)
166166

167167
Define the deployment by giving a name and hit `Save.` Note that the model will get deployed as a web service with a REST API.
168168

169-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/crt_dply.png)
169+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/crt_dply.png)
170170

171171
The deployment will get initialized and the status will show as `ready` when it is complete.
172172

173-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/dply_ready.png)
173+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/dply_ready.png)
174174

175175
We can click on the deployed model to see three tabs: Overview, Implementation and Test. The Overview tab gives all the details about the deployment, such as name, type and status. The Implementation tab gives the scoring endpoint and code snippets to invoke the model. The Test tab gives options to test the model.
176176

177-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/overview.png)
177+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/overview.png)
178178

179179
## 10. Model testing
180180

181181
Now that we have created and deployed the model as a web service, how do we test it? Click on the `Test` tab, which has two options: form and json. We can use the form to test one record at a time, giving the values for each attribute manually and hitting `Predict` to generate predictions. The output of 0 under `values` indicates that it is a fraudulent transaction. The output can be either 0 or 1 as per the `data glossary` provided in the data folder.
182182

183-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/form.png)
183+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/form.png)
184184

185185
For predicting multiple records, we have to update the values in the json file and use the option to input json data & then hit `Predict` to generate real time predictions.
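For illustration, a multi-record JSON input could be assembled programmatically as in the sketch below. The `fields`/`values` layout and the column names are assumptions made for this example; the authoritative format is the sample file in the data-for-testing folder, so adjust the keys and columns to match it.

```python
import json


def build_scoring_payload(fields, rows):
    """Build a JSON-serializable payload with one list of values per record.

    The {"fields": [...], "values": [[...], ...]} layout is an assumed
    format for illustration; check it against the provided sample file.
    """
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs exactly one value per field")
    return {"fields": list(fields), "values": [list(row) for row in rows]}


# Hypothetical column names and records, not copied from the dataset.
fields = ["Gender", "Married", "ApplicantIncome"]
rows = [
    ["Male", "Yes", 5000],
    ["Female", "No", 3200],
]
payload = build_scoring_payload(fields, rows)
payload_json = json.dumps(payload, indent=2)
print(payload_json)
```

The printed text is what would be pasted into the json input box, one inner list per record to be scored.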
186186

187-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/json.png)
187+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/json.png)
188188

189-
A sample json file has been provided for testing purpose. The format for scoring the model has to be same as given in json file. Navigate to [data-for-testing](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/tree/master/data-for-testing) and save the file on the disk. Copy and paste the values in the test tab as shown above to generate predictions.
189+
A sample json file has been provided for testing purpose. The format for scoring the model has to be same as given in json file. Navigate to [data-for-testing](https://github.com/IBM/predict-fraud-using-auto-ai/tree/master/data-for-testing) and save the file on the disk. Copy and paste the values in the test tab as shown above to generate predictions.
190190

191191
Go ahead and give it a try on different datasets as per your requirement and realize the ease of creating and deploying models quickly using `AutoAI offering by IBM.`
192192

@@ -209,25 +209,25 @@ Create an account with IBM Cloud and then create a project in Watson Studio. Add
209209
* Select the `From URL` tab.
210210
* Enter a name for the notebook.
211211
* Optionally, enter a description for the notebook.
212-
* Enter this Notebook URL: https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/notebook/Fraud_Detect.ipynb
212+
* Enter this Notebook URL: https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/notebook/Fraud_Detect.ipynb
213213
* Select the runtime (8 vCPU and 32GB RAM)
214214
* Click the `Create` button.
215215

216216
After the notebook is imported, click on `Not Trusted` and select the option as Yes to trust the source of the notebook.
217217

218-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/not_trusted.png)
218+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/not_trusted.png)
219219

220220
`This notebook has been created to demonstrate the steps for building the model using Watson Studio platform. For other usecases, the notebook has to be created from scratch.`
221221

222222
## 5. Insert the data as dataframe
223223

224224
Click on 0010 icon at the top right side which will bring up the data assets tab.
225225

226-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/add.png)
226+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/add.png)
227227

228228
Click on Insert to code dropdown and select the option Insert Pandas Dataframe.
229229

230-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/insert_dataframe.png)
230+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/insert_dataframe.png)
231231

232232
## 6. Run the notebook
233233

@@ -257,19 +257,19 @@ After we run the cells in the notebook which includes data ingestion, data analy
257257

258258
Check the model accuracy and confusion matrix to identify precision and recall scores. We can observe that model has > 92% accuracy on test data and the Precision/Recall scores are also high.
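As a small sketch of how accuracy, precision and recall are read off a binary confusion matrix, consider the function below. The four counts are invented for illustration and are not the notebook's actual results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive accuracy, precision and recall from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives for the class treated as positive.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        "accuracy": (tp + tn) / total,
        # Of everything flagged positive, how much really was positive?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything truly positive, how much did the model find?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.92, the precision 0.90 and the recall about 0.94, which shows how the three numbers can differ for the same matrix.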
259259

260-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/cf_matrix.png)
260+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/cf_matrix.png)
261261

262262
Feature importance as per the model is below. The model has highlighted some of the attributes which have a high impact on the outcome. Note that these importances might not compare the features fairly when assessing their impact on the outcome.
263263

264-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/feat_imp.png)
264+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/feat_imp.png)
265265

266266
We have used Shapley values, which are a very effective technique for explaining model predictions. Shapley values calculate the importance of a feature by comparing what a model predicts with and without the feature. However, since the order in which a model sees features can affect its predictions, this is done in every possible order, so that the features are fairly compared.
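The idea of averaging over every possible order can be sketched with an exact, brute-force computation on a toy model. This is not the method used in the notebook: the toy model, the feature names and the choice to represent a missing feature by a baseline value are all assumptions for illustration, and practical libraries approximate this calculation because it grows factorially with the number of features.

```python
from itertools import permutations


def shapley_values(predict, features, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    A feature that has not been added yet takes its baseline value. Each
    feature's contribution is the change in prediction when it is switched
    from baseline to its real value, averaged over all orderings.
    """
    names = list(features)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = features[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    """Hypothetical scoring function with an interaction term; ignores income."""
    return (0.3 * x["married"] + 0.5 * x["credit_history"]
            + 0.2 * x["married"] * x["credit_history"])


features = {"married": 1.0, "credit_history": 1.0, "income": 2.0}
baseline = {"married": 0.0, "credit_history": 0.0, "income": 0.0}
phi = shapley_values(toy_model, features, baseline)
print(phi)
```

The interaction term is split evenly between the two features that share it, the unused feature gets zero, and the values sum to the difference between the prediction and the baseline prediction.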
267267

268-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/shap.png)
268+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/shap.png)
269269

270270
We can observe that, as per the Shapley values, attributes like Married, Applicant Income & Credit history available have a high impact on the outcome, which is to detect fraud.
271271

272-
![](https://github.com/RK-Sharath/predict-fraud-using-auto-ai/blob/master/images/shap_ft_imp.png)
272+
![](https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/images/shap_ft_imp.png)
273273

274274
With this, we have come to the end of this code pattern, where we can compare the ease of using AutoAI to build predictive models with creating a new Jupyter notebook to build and evaluate them. `There's a considerable reduction in the time needed to build and deploy models using AutoAI because it handles missing values, outliers, feature engineering & hyperparameter optimization on the fly and selects the best algorithm as per the dataset.` If you are a developer who wants to build a model quickly and get it ready for production, then AutoAI is for you: it helps in making decisions faster and gives a detailed overview of the data.
275275
