[Clone this repo](https://github.com/IBM/predict-fraud-using-auto-ai)
Navigate to [data](https://github.com/IBM/predict-fraud-using-auto-ai/tree/master/data) and save the file to disk. Review the data glossary in the data folder for more details. `Note: A citation is required to use this dataset in any other project.`
Click on `Assets`, select `Browse`, and add the CSV file from your file system.
Click on `Associate a Machine Learning service instance` to this project, select the Machine Learning service instance, and hit `Reload`. If you do not have a Machine Learning service instance, follow the steps to create one.
The `Create` button at the bottom right gets highlighted; go ahead and hit `Create`.
We need to import the CSV file into the experiment. Note that only the CSV file format is supported in AutoAI. Click on `Browse` or `Select from project` to choose the fraud_dataset.csv file to import.
We have to select the target variable, in this case `Fraud_Risk`. Notice that Prediction Type and Optimized Metric get highlighted, which tells us that we are working on a binary classification use case and that the evaluation metric is ROC AUC (area under the receiver operating characteristic curve), which is commonly used for classification use cases.
We can click on prediction settings to modify the Prediction Type, Positive Class, and Optimized Metric if required. In this case, we will leave them as is and hit `Save and close`.
The AutoAI experiment completed in 97 seconds and generated four pipelines.
Each pipeline is run with different parameters: pipeline 3 is run on a sequence of HPO (hyperparameter optimization) and FE (feature engineering), whereas pipeline 4 includes HPO, FE, and a combination of both. All of this is done on the fly! We just have to sit back while AutoAI takes care of things for us and generates machine learning models; very minimal intervention is required, and in no time we have the generated pipelines to choose from.
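AutoAI's internal pipelines are not exposed as code here, but the idea of chaining feature engineering with hyperparameter optimization can be sketched in scikit-learn. The sketch below is an illustrative stand-in, not AutoAI's actual pipeline; the file name and `Fraud_Risk` target come from this code pattern, while the estimator, transforms, and search space are assumptions.

```python
# Sketch: feature engineering (FE) chained with hyperparameter
# optimization (HPO), in the spirit of AutoAI's pipeline variations.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

df = pd.read_csv("fraud_dataset.csv")  # assumed local copy of the data
X = df.drop(columns=["Fraud_Risk"]).select_dtypes("number")  # numeric features only, for this sketch
y = df["Fraud_Risk"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("fe", PolynomialFeatures(degree=2, interaction_only=True)),  # simple stand-in for FE
    ("clf", RandomForestClassifier(random_state=42)),
])

# HPO over a small, illustrative search space
search = RandomizedSearchCV(
    pipe,
    param_distributions={
        "clf__n_estimators": [100, 200, 400],
        "clf__max_depth": [4, 8, None],
    },
    n_iter=5, scoring="roc_auc", cv=3, random_state=42,
)
search.fit(X_train, y_train)
print("best cross-validated ROC AUC:", search.best_score_)
```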
Click on model evaluation to review the performance of the model on the hold-out sample and the cross-validation score. We can observe that our model has done very well, scoring above 95% on recall, average precision, and area under the curve. These scores mean the model identifies fraudulent transactions with great precision.
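Continuing the sketch above, the same hold-out metrics can be reproduced with scikit-learn (the variable names carry over from the previous block):

```python
from sklearn.metrics import average_precision_score, recall_score, roc_auc_score

# Probability of the positive class on the hold-out sample
y_prob = search.predict_proba(X_test)[:, 1]
y_pred = (y_prob >= 0.5).astype(int)

# Note: recall_score defaults to pos_label=1; per the data glossary,
# 0 marks a fraudulent transaction, so adjust pos_label as needed.
print("recall:           ", recall_score(y_test, y_pred))
print("average precision:", average_precision_score(y_test, y_prob))
print("ROC AUC:          ", roc_auc_score(y_test, y_prob))
```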
Click on feature importance to identify the significant features influencing the outcome. Any variable that starts with `Newfeature` was generated on the fly by the model as part of feature engineering.
Click on feature transforms to understand how the original features were transformed into new ones. Feature engineering is one of the important steps in the model-building process and has a direct impact on the overall accuracy of the model. We can observe that there are 24 features in total, whereas the original dataset had 13 variables, which means AutoAI created 11 new features; this is one of the reasons for the model's high accuracy.
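To make the idea concrete, here is a hand-rolled example of the kind of derived feature AutoAI generates automatically; the column names are hypothetical stand-ins, guarded so the snippet runs even if they are absent:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("fraud_dataset.csv")  # assumed local copy

# Hand-rolled stand-ins for automatic feature transforms;
# the actual transforms AutoAI applied are listed in the UI.
if {"ApplicantIncome", "LoanAmount"}.issubset(df.columns):
    df["Newfeature_income_to_loan"] = df["ApplicantIncome"] / (df["LoanAmount"] + 1)
    df["Newfeature_log_income"] = np.log1p(df["ApplicantIncome"])
```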
After all the analysis of model performance, it's time to select the model for deployment. We will go ahead and select `pipeline 3, which is Rank 1`, and hit `Save as model`. `We can save any of the pipelines, typically the one with the highest accuracy or whichever evaluation metric matters most.`
We can click on the deployed model to see three tabs: Overview, Implementation, and Test. The Overview tab gives all details about the deployment, such as name, type, and status. The Implementation tab gives the scoring endpoint and code snippets to invoke the model. The Test tab gives options to test the model.
Now that we have created and deployed the model as a web service, how do we test it? Click on the `Test` tab, which offers two options: form and JSON. Use the form to test one record at a time, entering the values for each attribute manually, then hit `Predict` to generate predictions. An output of 0 under `values` indicates a fraudulent transaction; the output can be either 0 or 1, as per the `data glossary` provided in the data folder.
For predicting multiple records, update the values in the JSON file, use the option to input JSON data, and then hit `Predict` to generate real-time predictions.
A sample JSON file has been provided for testing purposes. The format for scoring the model has to match the one given in the JSON file. Navigate to [data-for-testing](https://github.com/IBM/predict-fraud-using-auto-ai/tree/master/data-for-testing) and save the file to disk. Copy and paste the values into the Test tab as shown above to generate predictions.
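The deployment can also be scored programmatically against the endpoint shown on the Implementation tab. Below is a minimal sketch assuming a Watson Machine Learning v4-style deployment; the API key, scoring URL, and field names are placeholders to replace with your own (the exact snippet for your deployment is on the Implementation tab):

```python
import requests

API_KEY = "<your IBM Cloud API key>"                            # placeholder
SCORING_URL = "<scoring endpoint from the Implementation tab>"  # placeholder; may include a ?version= query parameter

# Exchange the API key for an IAM access token
token = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={"apikey": API_KEY, "grant_type": "urn:ibm:params:oauth:grant-type:apikey"},
).json()["access_token"]

# The payload mirrors the sample JSON in data-for-testing: a list of
# field names plus one row of values per record to score.
payload = {
    "input_data": [{
        # Field names below are illustrative; use the full list from the sample JSON.
        "fields": ["Gender", "Married", "ApplicantIncome"],
        "values": [["Male", "Yes", 4500]],
    }]
}

response = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json())  # prediction of 0 = fraudulent, 1 = not, per the data glossary
```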
Go ahead and give it a try on different datasets as per your requirements, and experience the ease of creating and deploying models quickly using the `AutoAI offering by IBM.`
Create an account with IBM Cloud and then create a project in Watson Studio. Add a notebook to the project:
* Select the `From URL` tab.
* Enter a name for the notebook.
* Optionally, enter a description for the notebook.
* Enter this Notebook URL: https://github.com/IBM/predict-fraud-using-auto-ai/blob/master/notebook/Fraud_Detect.ipynb
* Select the runtime (8 vCPU and 32 GB RAM).
* Click the `Create` button.
After the notebook is imported, click on `Not Trusted` and select `Yes` to trust the source of the notebook.
`This notebook has been created to demonstrate the steps for building the model using the Watson Studio platform. For other use cases, the notebook has to be created from scratch.`
## 5. Insert the data as dataframe
Click on the `0010` icon at the top right, which will bring up the data assets tab.
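The `insert to code` option in the data assets tab generates project-specific loading code; the exact generated snippet depends on your project's storage, but with a local copy of the asset it boils down to something like this sketch:

```python
import pandas as pd

# Equivalent of the generated "insert to code" snippet, assuming a
# local copy of the uploaded asset rather than project object storage.
df = pd.read_csv("fraud_dataset.csv")
df.head()
```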
After we run the cells in the notebook, which include data ingestion, data analysis, and model building, we can evaluate the results.
Check the model accuracy and the confusion matrix to identify the precision and recall scores. We can observe that the model has >92% accuracy on test data, and the precision/recall scores are also high.
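A minimal sketch of how these numbers are typically computed with scikit-learn, assuming `model`, `X_test`, and `y_test` are defined earlier in the notebook (the names are assumptions, not necessarily the notebook's own):

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))        # rows: true class, columns: predicted class
print(classification_report(y_test, y_pred))   # per-class precision, recall, F1
```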
Feature importance as per the model is shown below. The model has highlighted some of the attributes that have a high impact on the outcome. However, raw importances might not compare features fairly when assessing their impact on the outcome.
We have used Shapley values, a very effective model-evaluation technique. Shapley values calculate the importance of a feature by comparing what a model predicts with and without that feature. However, since the order in which a model sees features can affect its predictions, this comparison is done over every possible ordering, so that the features are compared fairly.
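A minimal sketch of computing Shapley values with the `shap` library, assuming a tree-based `model` and a feature matrix `X_test` as above (both names are assumptions):

```python
import shap

# TreeExplainer computes Shapley values efficiently for tree-based models.
# For binary classifiers, shap_values may be a list with one array per class.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot ranks features by mean |SHAP value|
shap.summary_plot(shap_values, X_test)
```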
We can observe that attributes like Married, Applicant Income, and Credit history available have a high impact on the outcome, which is to detect fraud, as per the Shapley values.
With this, we have come to the end of this code pattern, where we compared the ease of using AutoAI to build predictive models versus creating a new Jupyter notebook to build and evaluate them. `There is a considerable reduction in the time needed to build and deploy models using AutoAI, because it handles missing values, outliers, feature engineering, and hyperparameter optimization on the fly and selects the best algorithm for the dataset.` If you are a developer who wants to build a model quickly and deploy it production-ready, then AutoAI is for you: it helps in taking decisions faster and gives a detailed overview of the data.