diff --git a/jupyter/MachineLearning_and_CPLEX.ipynb b/jupyter/MachineLearning_and_CPLEX.ipynb
index 3057e1c..d4ce8c4 100644
--- a/jupyter/MachineLearning_and_CPLEX.ipynb
+++ b/jupyter/MachineLearning_and_CPLEX.ipynb
@@ -8,23 +8,23 @@
"\n",
"In 2016, a retail bank sold several products (mortgage account, savings account, and pension account) to its customers.\n",
"It kept a record of all historical data, and this data is available for analysis and reuse.\n",
- "Following a merger in 2017, the bank has new customers and wants to start some marketing campaigns. \n",
+ "Following a merger in 2017, the bank has new customers and wants to launch some marketing campaigns. \n",
"\n",
"The budget for the campaigns is limited. The bank wants to contact a customer and propose only one product.\n",
"\n",
"\n",
"The marketing department needs to decide:\n",
" * Who should be contacted?\n",
- " * Which product should be proposed? Proposing too many products is counter productive, so only one product per customer contact.\n",
+ " * Which product should be proposed? (Proposing too many products is counterproductive, so only one product will be proposed per customer contact.)\n",
" * How will a customer be contacted? There are different ways, with different costs and efficiency.\n",
- " * How can they optimally use the limited budget?\n",
+ " * How can they optimally use their limited budget?\n",
" * Will such campaigns be profitable?\n",
" \n",
"#### Predictive and prescriptive workflow\n",
"\n",
- "From the historical data, we can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
- "* We can apply this predictive model to the new customers data to predict for each new customer what they will buy.\n",
- "* On this new data, we decide which offers are proposed. Which product is offered to which customer through which channel:\n",
+ "From the historical data, you can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
+ "* You can apply this predictive model to the new customer data to predict for each new customer what they will buy.\n",
+ "* With this new data, you decide which offers are proposed. The choice of which product is offered to which customer through which channel is made:\n",
" * a. with a greedy algorithm that reproduces what a human being would do\n",
" * b. using an optimization model wih IBM Decision Optimization.\n",
"* The solutions can be displayed, compared, and analyzed.\n",
@@ -34,7 +34,7 @@
"\n",
"* [Understand the historical data](#Understanding-the-historical-data)\n",
"* [Predict the 2017 customer behavior](#Predict-the-2017-customer-behavior)\n",
- "* [Get business decisions on the 2017 data](#Get-business-decisions-on-the-2017-data)\n",
+ "* [Get business decisions for the 2017 data](#Get-business-decisions-for-the-2017-data)\n",
"* [Conclusion on the decision making](#Conclusion)"
]
},
@@ -54,10 +54,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "The purpose of this Notebook is not to provide a perfect machine learning model nor a perfect optimization model.\n",
- "The purpose is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
+ "The purpose of this notebook is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
"\n",
- "This notebook takes some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
+ "This notebook can take some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
]
},
{
@@ -394,16 +393,16 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "We have 1650 clients who bought several products\n",
- "We have 123 clients who bought all the products\n"
+ "You have 1650 clients who bought several products\n",
+ "You have 123 clients who bought all the products\n"
]
}
],
"source": [
"abc = known_behaviors[known_behaviors.nb_products > 1]\n",
- "print(\"We have %d clients who bought several products\" %len(abc))\n",
+ "print(\"You have %d clients who bought several products\" %len(abc))\n",
"abc = known_behaviors[known_behaviors.nb_products == 3]\n",
- "print(\"We have %d clients who bought all the products\" %len(abc))"
+ "print(\"You have %d clients who bought all the products\" %len(abc))"
]
},
{
@@ -421,14 +420,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "##### Do some visual analysis of the historical data"
+ "##### Provide some visual analysis of the historical data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "It's possible to use pandas plotting capabilities, but it would require a new version of it. This Notebook relies on matplotlib as it is present everywhere."
+ "It's possible to use pandas plotting capabilities, but that would require a newer version of pandas. This notebook relies on matplotlib instead, as it is commonly used."
]
},
{
@@ -489,9 +488,9 @@
"metadata": {},
"source": [
"### Understanding the 2016 customers\n",
- "We can see that:\n",
- " * The greater a customer's income, the more likely it is s/he will buy a savings account.\n",
- " * The older a customer is, the more likely it is s/he will buy a pension account.\n",
+ "You can see that:\n",
+ " * The greater a customer's income, the more likely it is that he or she will buy a savings account.\n",
+ " * The older a customer is, the more likely it is that he or she will buy a pension account.\n",
" * There is a correlation between the number of people in a customer's household, the number of loan accounts held by the customer, and the likelihood a customer buys a mortgage account. To see the correlation, look at the upper right and lower left corners of the mortgage chart."
]
},
@@ -536,7 +535,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Let's use the following columns as machine-learning features:"
+ "Use the following columns as machine-learning features:"
]
},
{
@@ -658,7 +657,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We use a standard basic support gradient boosting algorithm to predict whether a customer might by product A, B, or C."
+ "You are using a standard, basic gradient boosting algorithm to predict whether a customer might buy product A, B, or C."
]
},
{
@@ -694,8 +693,8 @@
"source": [
"### New customer data and predictions\n",
"\n",
- "Load new customer data, predict behaviors using trained classifier, and do some visual analysis.\n",
- "We have all the characteristics of the new customers, as for the 2016 clients, but the new customers did not buy any product yet.\n",
+ "Load new customer data, predict behaviors using a trained classifier, and perform some visual analysis.\n",
+ "You have all the characteristics of the new customers, as for the 2016 clients, but the new customers have not yet bought any product.\n",
"\n",
"##### Load new customer data"
]
@@ -921,7 +920,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "##### Do some visual analysis of the predicted data"
+ "##### Perform some visual analysis of the predicted data"
]
},
{
@@ -1002,8 +1001,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "We predicted that 112 clients would buy more than one product\n",
- "We predicted that 0 clients would buy all three products\n"
+ "It's predicted that 112 clients would buy more than one product\n",
+ "It's predicted that 0 clients would buy all three products\n"
]
}
],
@@ -1011,9 +1010,9 @@
"to_predict[\"nb_products\"] = to_predict.Mortgage + to_predict.Pension + to_predict.Savings\n",
"\n",
"abc = to_predict[to_predict.nb_products > 1]\n",
- "print(\"We predicted that %d clients would buy more than one product\" %len(abc))\n",
+ "print(\"It's predicted that %d clients would buy more than one product\" %len(abc))\n",
"abc = to_predict[to_predict.nb_products == 3]\n",
- "print(\"We predicted that %d clients would buy all three products\" %len(abc))"
+ "print(\"It's predicted that %d clients would buy all three products\" %len(abc))"
]
},
{
@@ -1021,9 +1020,9 @@
"metadata": {},
"source": [
"## Remarks on the prediction\n",
- "The goal is to contact the customers to sell them only one product, so we cannot select all of them.\n",
- "This increases the complexity of the problem: we need to determine the best contact channel, but also need to select which product will be sold to a given customer. \n",
- "It may be hard to compute this. In order to check, we will use two techniques:\n",
+ "The goal is to contact the customers to sell them only one product, so you cannot select all of them.\n",
+ "This increases the complexity of the problem: you need to determine the best contact channel, but also need to select which product will be sold to a given customer. \n",
+ "It may be hard to compute this. In order to check, you will use two techniques:\n",
" * a greedy algorithm\n",
" * CPLEX, the IBM leading optimization solver."
]
@@ -1043,12 +1042,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Get business decisions on the 2017 data\n",
+ "# Get business decisions for the 2017 data\n",
"## Assign campaigns to customers\n",
"\n",
- "* We have predicted who will buy what in the list of new customers.\n",
- "* However, we do not have the budget to contact all of them. We have various contact channels with different costs and effectiveness.\n",
- "* Furthermore, if we contact somebody, we don't want to frustrate them by proposing multiple products; we want to propose only one product per customer.\n",
+ "* You have a prediction of who will buy what in the list of new customers.\n",
+ "* However, you do not have the budget to contact all of them. You have various contact channels with different costs and effectiveness.\n",
+ "* Furthermore, when you contact a customer, you want to propose only one product.\n",
"\n",
"##### Some input data for optimization\n"
]
@@ -1083,7 +1082,7 @@
"metadata": {},
"source": [
"#### Using a greedy algorithm\n",
- "* We create a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
+ "* You are creating a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
]
},
{
@@ -1251,7 +1250,7 @@
"source": [
"#### Using IBM Decision Optimization CPLEX Modeling for Python\n",
"\n",
- "Let's create the optimization model to select the best ways to contact customers and stay within the limited budget."
+ "Create the optimization model to select the best ways to contact customers and stay within the limited budget."
]
},
{
@@ -1402,7 +1401,7 @@
"source": [
"##### Express the objective\n",
"\n",
- "We want to maximize expected revenue, so we take into account the predicted behavior of each customer for each product."
+ "You want to maximize expected revenue, so you take into account the predicted behavior of each customer for each product."
]
},
{
@@ -1580,7 +1579,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "With the mathematical optimization, we made a better selection of customers."
+ "With the mathematical optimization, you made a better selection of customers."
]
},
{
@@ -1871,7 +1870,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Due to the business constraints, we can address a maximum of 1680 customers with a \\$35615 budget.\n",
+ "Due to the business constraints, you can address a maximum of 1680 customers with a \\$35615 budget.\n",
"Any funds available above that amount won't be spent.\n",
"The expected revenue is \\$87.1K."
]
@@ -1881,8 +1880,8 @@
"metadata": {},
"source": [
"### Dealing with infeasibility\n",
- "What about a context where we are in tight financial conditions, and our budget is very low?\n",
- "We need to determine the minimum amount of budget needed to adress 1/20 of our customers."
+ "What about a context where you face tight financial conditions, and your budget is very low?\n",
+ "You need to determine the minimum budget needed to address 1/20 of your customers."
]
},
{
@@ -1933,7 +1932,7 @@
" #setting all bool vars to 0 is an easy relaxation, so let's refuse it and force to offer something to 1/3 of the clients\n",
" mdl.add_constraint(totaloffers >= len(offers)//20, ctname=\"high\")\n",
" \n",
- " # solve has failed, we try relaxation, based on constraint names\n",
+ " # solve has failed, trying relaxation, based on constraint names\n",
" # constraints are prioritized according to their names\n",
" # if a name contains \"low\", it has priority LOW\n",
" # if a ct name contains \"medium\" it has priority MEDIUM\n",
@@ -1987,8 +1986,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We need a minimum of 15950\\$ to be able to start a marketing campaign.\n",
- "With this minimal budget, we will be able to adress 825 possible clients."
+ "You need a minimum of \\$15950 to be able to start a marketing campaign.\n",
+ "With this minimal budget, you will be able to address 825 possible clients."
]
},
{
@@ -2005,17 +2004,17 @@
"| Greedy | 50800 | 1123 | 299 | 111 | 713 | 21700 |\n",
"| CPLEX | 72600 | 1218 | 381 | 117 | 691 | 25000 |\n",
"\n",
- "* As you can see, with Decision Optimization, we can safely do this marketing campaign to contact 1218 customers out of the 2756 customers. \n",
- "* This will lead to a \\$91.5K revenue, significantly greater than the \\$49.5K revenue given by a greedy algorithm.\n",
- "* With a greedy algorithm, we will:\n",
+ "* As you can see, with Decision Optimization, you can safely use this marketing campaign to contact 1218 customers out of the 2756 customers. \n",
+ "* This will lead to a \\$72.6K revenue, significantly greater than the \\$50.8K revenue given by a greedy algorithm.\n",
+ "* With a greedy algorithm, you will:\n",
" * be unable to focus on the correct customers (it will select fewer of them), \n",
" * spend less of the available budget for a smaller revenue.\n",
" * focus on selling savings accounts that have the biggest revenue\n",
"\n",
"### Marketing campaign analysis\n",
- "* We need a minimum of \\$16K to be able to start a valid campaign and we expect it will generate \\$47.5K.\n",
+ "* You need a minimum of \\$16K to be able to start a valid campaign and you expect it will generate \\$47.5K.\n",
"\n",
- "* Due to the business constraints, we will be able to address 1680 customers maximum using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
+ "* Due to the business constraints, you will be able to address 1680 customers maximum using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
]
},
{