diff --git a/jupyter/MachineLearning_and_CPLEX.ipynb b/jupyter/MachineLearning_and_CPLEX.ipynb
index 3057e1c..d4ce8c4 100644
--- a/jupyter/MachineLearning_and_CPLEX.ipynb
+++ b/jupyter/MachineLearning_and_CPLEX.ipynb
@@ -8,23 +8,23 @@
  "\n",
  "In 2016, a retail bank sold several products (mortgage account, savings account, and pension account) to its customers.\n",
  "It kept a record of all historical data, and this data is available for analysis and reuse.\n",
- "Following a merger in 2017, the bank has new customers and wants to start some marketing campaigns. \n",
+ "Following a merger in 2017, the bank has new customers and wants to launch some marketing campaigns. \n",
  "\n",
  "The budget for the campaigns is limited. The bank wants to contact a customer and propose only one product.\n",
  "\n",
  "\n",
  "The marketing department needs to decide:\n",
  " * Who should be contacted?\n",
- " * Which product should be proposed? Proposing too many products is counter productive, so only one product per customer contact.\n",
+ " * Which product should be proposed? (Proposing too many products is counter productive, so only one product will be proposed per customer contact.)\n",
  " * How will a customer be contacted? There are different ways, with different costs and efficiency.\n",
- " * How can they optimally use the limited budget?\n",
+ " * How can they optimally use their limited budget?\n",
  " * Will such campaigns be profitable?\n",
  " \n",
  "#### Predictive and prescriptive workflow\n",
  "\n",
- "From the historical data, we can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
- "* We can apply this predictive model to the new customers data to predict for each new customer what they will buy.\n",
- "* On this new data, we decide which offers are proposed. Which product is offered to which customer through which channel:\n",
+ "From the historical data, you can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
+ "* You can apply this predictive model to the new customer data to predict for each new customer what they will buy.\n",
+ "* With this new data, you decide which offers are proposed. Which product is offered to which customer through which channel is determined:\n",
  " * a. with a greedy algorithm that reproduces what a human being would do\n",
  " * b. using an optimization model wih IBM Decision Optimization.\n",
  "* The solutions can be displayed, compared, and analyzed.\n",
@@ -34,7 +34,7 @@
  "\n",
  "* [Understand the historical data](#Understanding-the-historical-data)\n",
  "* [Predict the 2017 customer behavior](#Predict-the-2017-customer-behavior)\n",
- "* [Get business decisions on the 2017 data](#Get-business-decisions-on-the-2017-data)\n",
+ "* [Get business decisions for the 2017 data](#Get-business-decisions-on-the-2017-data)\n",
  "* [Conclusion on the decision making](#Conclusion)"
  ]
  },
@@ -54,10 +54,9 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "The purpose of this Notebook is not to provide a perfect machine learning model nor a perfect optimization model.\n",
- "The purpose is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
+ "The purpose of this notebook is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
  "\n",
- "This notebook takes some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
+ "This notebook can take some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
  ]
  },
  {
@@ -394,16 +393,16 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "We have 1650 clients who bought several products\n",
- "We have 123 clients who bought all the products\n"
+ "You have 1650 clients who bought several products\n",
+ "You have 123 clients who bought all the products\n"
  ]
  }
  ],
  "source": [
  "abc = known_behaviors[known_behaviors.nb_products > 1]\n",
- "print(\"We have %d clients who bought several products\" %len(abc))\n",
+ "print(\"You have %d clients who bought several products\" %len(abc))\n",
  "abc = known_behaviors[known_behaviors.nb_products == 3]\n",
- "print(\"We have %d clients who bought all the products\" %len(abc))"
+ "print(\"You have %d clients who bought all the products\" %len(abc))"
  ]
  },
  {
@@ -421,14 +420,14 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "##### Do some visual analysis of the historical data"
+ "##### Provide some visual analysis of the historical data"
  ]
  },
  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "It's possible to use pandas plotting capabilities, but it would require a new version of it. This Notebook relies on matplotlib as it is present everywhere."
+ "It's possible to use pandas plotting capabilities, but that would require a new version of it. This notebook relies on matplotlib as it is commonly used."
  ]
  },
  {
@@ -489,9 +488,9 @@
  "metadata": {},
  "source": [
  "### Understanding the 2016 customers\n",
- "We can see that:\n",
- " * The greater a customer's income, the more likely it is s/he will buy a savings account.\n",
- " * The older a customer is, the more likely it is s/he will buy a pension account.\n",
+ "You can see that:\n",
+ " * The greater a customer's income, the more likely it is he or she will buy a savings account.\n",
+ " * The older a customer is, the more likely it is he or she will buy a pension account.\n",
  " * There is a correlation between the number of people in a customer's household, the number of loan accounts held by the customer, and the likelihood a customer buys a mortgage account. To see the correlation, look at the upper right and lower left corners of the mortgage chart."
  ]
  },
@@ -536,7 +535,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Let's use the following columns as machine-learning features:"
+ "Use the following columns as machine-learning features:"
  ]
  },
  {
@@ -658,7 +657,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "We use a standard basic support gradient boosting algorithm to predict whether a customer might by product A, B, or C."
+ "You are using a standard basic support gradient boosting algorithm to predict whether a customer might by product A, B, or C."
  ]
  },
  {
@@ -694,8 +693,8 @@
  "source": [
  "### New customer data and predictions\n",
  "\n",
- "Load new customer data, predict behaviors using trained classifier, and do some visual analysis.\n",
- "We have all the characteristics of the new customers, as for the 2016 clients, but the new customers did not buy any product yet.\n",
+ "Load new customer data, predict behaviors using a trained classifier, and perform some visual analysis.\n",
+ "You have all the characteristics of the new customers, as for the 2016 clients, but the new customers have not yet bought any product.\n",
  "\n",
  "##### Load new customer data"
  ]
@@ -921,7 +920,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "##### Do some visual analysis of the predicted data"
+ "##### Perform some visual analysis of the predicted data"
  ]
  },
  {
@@ -1002,8 +1001,8 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "We predicted that 112 clients would buy more than one product\n",
- "We predicted that 0 clients would buy all three products\n"
+ "It's predicted that 112 clients would buy more than one product\n",
+ "It's predicted that 0 clients would buy all three products\n"
  ]
  }
  ],
@@ -1011,9 +1010,9 @@
  "to_predict[\"nb_products\"] = to_predict.Mortgage + to_predict.Pension + to_predict.Savings\n",
  "\n",
  "abc = to_predict[to_predict.nb_products > 1]\n",
- "print(\"We predicted that %d clients would buy more than one product\" %len(abc))\n",
+ "print(\"It's predicted that %d clients would buy more than one product\" %len(abc))\n",
  "abc = to_predict[to_predict.nb_products == 3]\n",
- "print(\"We predicted that %d clients would buy all three products\" %len(abc))"
+ "print(\"It's predicted that %d clients would buy all three products\" %len(abc))"
  ]
  },
  {
@@ -1021,9 +1020,9 @@
  "metadata": {},
  "source": [
  "## Remarks on the prediction\n",
- "The goal is to contact the customers to sell them only one product, so we cannot select all of them.\n",
- "This increases the complexity of the problem: we need to determine the best contact channel, but also need to select which product will be sold to a given customer. \n",
- "It may be hard to compute this. In order to check, we will use two techniques:\n",
+ "The goal is to contact the customers to sell them only one product, so you cannot select all of them.\n",
+ "This increases the complexity of the problem: you need to determine the best contact channel, but also need to select which product will be sold to a given customer. \n",
+ "It may be hard to compute this. In order to check, you will use two techniques:\n",
  " * a greedy algorithm\n",
  " * CPLEX, the IBM leading optimization solver."
  ]
@@ -1043,12 +1042,12 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "# Get business decisions on the 2017 data\n",
+ "# Get business decisions for the 2017 data\n",
  "## Assign campaigns to customers\n",
  "\n",
- "* We have predicted who will buy what in the list of new customers.\n",
- "* However, we do not have the budget to contact all of them. We have various contact channels with different costs and effectiveness.\n",
- "* Furthermore, if we contact somebody, we don't want to frustrate them by proposing multiple products; we want to propose only one product per customer.\n",
+ "* You have a prediction of who will buy what in the list of new customers.\n",
+ "* However, you do not have the budget to contact all of them. You have various contact channels with different costs and effectiveness.\n",
+ "* Furthermore, if you contact a customer, you want to propose only one product per customer.\n",
  "\n",
  "##### Some input data for optimization\n"
  ]
@@ -1083,7 +1082,7 @@
  "metadata": {},
  "source": [
  "#### Using a greedy algorithm\n",
- "* We create a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
+ "* You are creating a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
  ]
  },
  {
@@ -1251,7 +1250,7 @@
  "source": [
  "#### Using IBM Decision Optimization CPLEX Modeling for Python\n",
  "\n",
- "Let's create the optimization model to select the best ways to contact customers and stay within the limited budget."
+ "Create the optimization model to select the best ways to contact customers and stay within the limited budget."
  ]
  },
  {
@@ -1402,7 +1401,7 @@
  "source": [
  "##### Express the objective\n",
  "\n",
- "We want to maximize expected revenue, so we take into account the predicted behavior of each customer for each product."
+ "You want to maximize expected revenue, so you take into account the predicted behavior of each customer for each product."
  ]
  },
  {
@@ -1580,7 +1579,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "With the mathematical optimization, we made a better selection of customers."
+ "With the mathematical optimization, you made a better selection of customers."
  ]
  },
  {
@@ -1871,7 +1870,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Due to the business constraints, we can address a maximum of 1680 customers with a \\$35615 budget.\n",
+ "Due to the business constraints, you can address a maximum of 1680 customers with a \\$35615 budget.\n",
  "Any funds available above that amount won't be spent.\n",
  "The expected revenue is \\$87.1K."
  ]
@@ -1881,8 +1880,8 @@
  "metadata": {},
  "source": [
  "### Dealing with infeasibility\n",
- "What about a context where we are in tight financial conditions, and our budget is very low?\n",
- "We need to determine the minimum amount of budget needed to adress 1/20 of our customers."
+ "What about the context where you have tight financial conditions, and our budget is very low?\n",
+ "You need to determine the minimum amount of budget needed to address 1/20 of our customers."
  ]
  },
  {
@@ -1933,7 +1932,7 @@
  " #setting all bool vars to 0 is an easy relaxation, so let's refuse it and force to offer something to 1/3 of the clients\n",
  " mdl.add_constraint(totaloffers >= len(offers)//20, ctname=\"high\")\n",
  " \n",
- " # solve has failed, we try relaxation, based on constraint names\n",
+ " # solve has failed, trying relaxation, based on constraint names\n",
  " # constraints are prioritized according to their names\n",
  " # if a name contains \"low\", it has priority LOW\n",
  " # if a ct name contains \"medium\" it has priority MEDIUM\n",
@@ -1987,8 +1986,8 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "We need a minimum of 15950\\$ to be able to start a marketing campaign.\n",
- "With this minimal budget, we will be able to adress 825 possible clients."
+ "You need a minimum of 15950\\$ to be able to start a marketing campaign.\n",
+ "With this minimal budget, you will be able to address 825 possible clients."
  ]
  },
  {
@@ -2005,17 +2004,17 @@
  "| Greedy | 50800 | 1123 | 299 | 111 | 713 | 21700 |\n",
  "| CPLEX | 72600 | 1218 | 381 | 117 | 691 | 25000 |\n",
  "\n",
- "* As you can see, with Decision Optimization, we can safely do this marketing campaign to contact 1218 customers out of the 2756 customers. \n",
- "* This will lead to a \\$91.5K revenue, significantly greater than the \\$49.5K revenue given by a greedy algorithm.\n",
- "* With a greedy algorithm, we will:\n",
+ "* As you can see, with Decision Optimization, you can safely use this marketing campaign to contact 1218 customers out of the 2756 customers. \n",
+ "* This will lead to a \\$72.6K revenue, significantly greater than the \\$50.8K revenue given by a greedy algorithm.\n",
+ "* With a greedy algorithm, you will:\n",
  " * be unable to focus on the correct customers (it will select fewer of them), \n",
  " * spend less of the available budget for a smaller revenue.\n",
  " * focus on selling savings accounts that have the biggest revenue\n",
  "\n",
  "### Marketing campaign analysis\n",
- "* We need a minimum of \\$16K to be able to start a valid campaign and we expect it will generate \\$47.5K.\n",
+ "* You need a minimum of \\$16K to be able to start a valid campaign and you expect it will generate \\$47.5K.\n",
  "\n",
- "* Due to the business constraints, we will be able to address 1680 customers maximum using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
+ "* Due to the business constraints, you will be able to address 1680 customers maximum using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
  ]
  },
  {