When we develop statistical models for classification tasks (e.g. using machine learning), we usually need a way to compare the resulting models and decide which one is best. Typical tools for this task are Gains, ROC and Lift charts. All of them are popular, so a natural question arises: what is the difference between them? In addition, we observe a general lack of deep knowledge of these important and widely used tools.
This article sets out to describe these charts in detail by walking through their use in an illustrative scenario. If you want to learn more about these charts and how to use them, this article is aimed at you.
Typically called a Cumulative Gains chart, it is most simply explained with the following example:
For simplicity, let's assume we have 1000 customers. If we run an advertising campaign to all of them, we might find that 30% (300 out of 1000) respond and buy our new product.
Marketing to all our customers is one possible campaign strategy, but it is not the optimal use of our marketing dollars, especially for large customer bases. We would therefore like a better way of running the campaign: instead of targeting our whole customer base, we target only those customers with a high probability of responding positively. This will, firstly, lower the cost of the campaign and, secondly (and perhaps more importantly), avoid disturbing customers who have no interest in our new product with advertising.
This is where predictive classification models come in. There are many different models, but no matter which one we use, we can evaluate its results using Cumulative Gains charts. If we have historical data with customers' reactions to past campaigns, we can use it to build a model that predicts whether a particular customer will respond by buying the product. For each customer, such a model typically outputs the probability of a positive and of a negative reaction. We can then sort customers by the probability of a positive reaction and run the campaign only for the percentage of customers with the highest probabilities.
The Gains chart is the visualization of that principle. On the X axis we have the percentage of the customer base we target with the campaign. The Y axis tells us what percentage of all positive-response customers is found in the targeted sample. In the picture below you can see an example of a Gains chart (the gains curve of the model is the red curve):
What can we read from the graph? What happens if we target only 10% of our customer base? According to the results of our model, if we take the 10% of customers with the highest probability of a positive response, we will capture 28% of all possible positive responses. This means we will find 84 positively responding customers among the 100 customers reached by the campaign (84 is 28% of the 300 positive-response customers in our customer base).
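The principle of reading a point off the Gains chart can be sketched in plain code. This is a minimal illustration with made-up toy numbers, not tied to any particular library; we only assume a list of (predicted probability, actual response) pairs:

```python
def gains_at(scored, fraction):
    """Fraction of all positive responses captured when targeting the
    top `fraction` of customers, ranked by predicted probability."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_target = int(len(ranked) * fraction)
    total_positives = sum(actual for _, actual in ranked)
    captured = sum(actual for _, actual in ranked[:n_target])
    return captured / total_positives

# Toy example: 10 customers, 4 of them responders (actual == 1).
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 1), (0.3, 0), (0.2, 0), (0.1, 0), (0.05, 0)]

print(gains_at(scored, 0.5))  # top 5 customers capture 3 of 4 positives -> 0.75
```

Evaluating this function for every fraction between 0 and 1 traces out the whole red curve.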
If we increase the targeted share to 50%, we already capture more than 80% of those who would, in a real campaign, respond positively. If this is our chosen strategy (reaching 50% of our customers selected by the model), we will have obtained 80% of all positive responses while saving 50% of the cost of running the campaign (we do not want to run the campaign for customers who are unlikely to respond positively).
The choice of the percentage to target depends on the concrete costs of the campaign and the profit expected from positive responses. The Gains chart displays the expected results based on the chosen percentage. Our final strategy therefore consists of the model plus the targeted percentage (instead of a percentage we can equivalently define a cut-off value for the probabilities: if a customer's probability is above this threshold, we include them in the campaign).
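The cost/profit trade-off can be made concrete with a small sketch. The cost and profit figures below are hypothetical, and the gains values are read off the red curve of the article's example model:

```python
cost_per_contact = 2.0       # assumed cost of contacting one customer
profit_per_response = 10.0   # assumed profit from one positive response
n_customers = 1000
n_positives = 300

# (targeted fraction, fraction of positives captured) read from the red curve
gains_curve = [(0.1, 0.28), (0.3, 0.60), (0.5, 0.80), (1.0, 1.00)]

def expected_profit(fraction, captured):
    contacted = n_customers * fraction
    responses = n_positives * captured
    return responses * profit_per_response - contacted * cost_per_contact

# Pick the targeted fraction with the highest expected profit.
best = max(gains_curve, key=lambda point: expected_profit(*point))
print(best)  # with these assumed numbers, targeting 50% wins
```

With a different cost/profit ratio the optimum shifts, which is exactly why the targeted percentage is a business decision informed by the Gains chart.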
It was already said that the red curve represents the proposed model. The blue curve represents the gains chart of a random model: picking customers randomly, without any selection criteria, so that any sample contains the same proportion of positive responses as the whole customer base. In other words, if we target 10% of all customers, our sample will contain 10% of all the positive responses. The curves meet at (0, 0) and (100, 100); the second point means we run the campaign for all customers, so the output (everyone who responds positively) is the same regardless of the model. Using a predictive model, i.e. picking customers according to sorted probabilities, brings no benefit once we include all customers.
The green curve is the optimal model, the best possible order for picking customers: we first target all customers with a positive response and only then those with a negative response. The slope of the first part of the green curve is 100 / (percentage of all positive responses).
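The two reference curves have simple closed forms. The sketch below assumes, as in the example, that 30% of the customer base responds positively; x is the targeted fraction on a 0-1 scale:

```python
positive_rate = 0.30  # share of positive responders in the customer base

def random_gain(x):
    # Random selection captures positives in proportion to the sample size:
    # the blue diagonal from (0, 0) to (100, 100).
    return x

def optimal_gain(x):
    # The optimal model picks all positives first, so the curve climbs with
    # slope 1 / positive_rate until every positive is captured, then stays at 1.
    return min(x / positive_rate, 1.0)

print(random_gain(0.1), optimal_gain(0.1))  # 0.1 vs. one third of all positives
```

Any real model's gains curve lies between these two.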
To test our strategy (defined by the model and the targeted percentage, or equivalently the cut-off value), we need to compare the output of the model to the actual results in the real world. This is done by comparing the results and creating a contingency table of misclassification errors (terminology as used in hypothesis testing: TP means true positive, FN false negative, FP false positive and TN true negative):
                     Actual positive                      Actual negative
Predicted positive   Count TP (right decision)            Count FP (error of the first kind)
Predicted negative   Count FN (error of the second kind)  Count TN (right decision)
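Counting the four cells is straightforward. A minimal sketch, assuming actual and predicted responses are encoded as 1 (positive) and 0 (negative), with made-up toy data:

```python
def confusion_matrix(actual, predicted):
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # right decision
        elif a == 1 and p == 0:
            counts["FN"] += 1   # error of the second kind
        elif a == 0 and p == 1:
            counts["FP"] += 1   # error of the first kind
        else:
            counts["TN"] += 1   # right decision
    return counts

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(actual, predicted))
```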
Ideally we want the right decisions to occur with high frequency. Such a table (usually called a confusion matrix) is a very important decision-making tool when we evaluate the quality of a model.
For better orientation, it is common practice to display the confusion matrix in the form of the following graph. From this graph we clearly see how many times the model predicts correctly (true negatives and true positives) and how many times the prediction is incorrect (false positives and false negatives). The better the model, the larger the TP and TN bars in comparison to FN and FP.
A point on the gains chart is equivalent to the pair (TP / (TP + FN), (TP + FP) / (TP + FN + FP + TN)). The first term is on the Y axis: the fraction of all positive responses that were captured. The second term is on the X axis: the fraction of targeted customers.
The curves discussed here (ROC, Gains and Lift) are computed from the information in confusion matrices. It is important to realise that each curve is created from a whole series of such confusion matrices, one for each targeted percentage/cut-off value.
Other terms connected with a confusion matrix are Sensitivity and Specificity. They are computed in the following way: Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP).
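In code, these two definitions are one-liners. The counts below come from the 10%-targeting example earlier in the article (84 TP and 16 FP among the 100 targeted customers, 216 FN and 684 TN among the rest):

```python
def sensitivity(tp, fn):
    # Fraction of actual positives that the model predicts as positive.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Fraction of actual negatives that the model predicts as negative.
    return tn / (tn + fp)

print(sensitivity(tp=84, fn=216))   # 0.28, matching the gains value at 10%
print(specificity(tn=684, fp=16))
```

Note that sensitivity at a given cut-off equals the Y value of the Gains chart at the corresponding targeted percentage.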
The ROC curve (Receiver Operating Characteristic curve) displays sensitivity and specificity for different probability cut-off values (if the probability of a positive response is above the cut-off, we predict a positive outcome; if not, we predict a negative one). Each cut-off value defines one point on the ROC curve; ranging the cut-off from 0 to 1 draws the whole ROC curve. The red curve in the ROC diagram below is the same model as in the Gains chart example:
The Y axis measures the rate (as a percentage) of correctly predicted customers with a positive response, i.e. sensitivity. The X axis measures the rate of customers with an actual negative response who are incorrectly predicted as positive, i.e. 1 - specificity.
The optimal model would look as follows: sensitivity rises straight to its maximum while specificity stays at 1 the whole time (the optimal model is drawn in green). The task is to get the ROC curve of the developed model as close as possible to the optimal one.
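Tracing the ROC curve point by point can be sketched as follows; each cut-off yields one (1 - specificity, sensitivity) pair. The scores and labels are made up for illustration:

```python
def roc_points(scored, cutoffs):
    """One ROC point per cut-off, from (score, actual label) pairs."""
    points = []
    for c in cutoffs:
        tp = sum(1 for s, a in scored if s >= c and a == 1)
        fn = sum(1 for s, a in scored if s < c and a == 1)
        fp = sum(1 for s, a in scored if s >= c and a == 0)
        tn = sum(1 for s, a in scored if s < c and a == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        points.append((1 - spec, sens))
    return points

scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.4, 0), (0.2, 0)]
print(roc_points(scored, cutoffs=[0.0, 0.5, 1.0]))
```

Cut-off 0 predicts everyone positive, giving the point (1, 1); cut-off 1 predicts everyone negative, giving (0, 0); intermediate cut-offs fill in the curve between them.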
The Gains and the ROC curve are visualizations showing the overall performance of the models. The shape of the curves tells us a lot about the behavior of a model. It clearly shows how much better our model is than a model assigning categories randomly, and how far we are from the optimal model, which is unachievable in practice. These curves can help in setting the final cut-off point that decides which probabilities lead to a positive and which to a negative response prediction. The model together with the cut-off point defines our strategy of who should be targeted by the campaign and who should not (the commonly chosen default value of 0.5 might not meet the requirements of the use case, nor is it necessarily the best cut-off). During the building of the predictive model we may have many interim models, candidates for the final best model. Displaying the Gains (or ROC) curves of several models in one graph makes it possible to compare them.
It is very important to mention that a ROC, Gains or Lift chart is connected to only one predicted category! In our example we were interested in finding customers with positive responses, because that was the main task of our use case. There are analogous Gains and ROC charts for the negative customer response as well. If the main goal of the prediction were finding customers with a negative response, the criterion for the quality of the model would instead be the Gains or ROC curve for the negative response category.
So, what is the difference?
Both curves display how the correctly predicted share of the category in question (positive response in our example) depends on the changing cut-off for assignment to that category. The difference is the scale of the X axis; the Y axis is the same for the Gains and the ROC chart. If you love formulas, have a look at the following table:
The graphical representation of the results as a confusion matrix is below; the colors on the graph have the same meaning as the color markings in the table above:
The whole principle connecting Gains and ROC charts with confusion matrices (tables of good and bad classifications) is shown below. The main goal of the graphs below is to highlight the fact that a single confusion matrix (as well as derived measures like the misclassification rate) corresponds to only one point on a Gains, ROC or Lift chart!
We have mentioned the Lift chart a number of times without explaining it. A Lift chart comes directly from a Gains chart: the X axis is the same, but the Y axis is the ratio of the Gains value of the model to the Gains value of a model choosing customers randomly (the red and blue curves in the Gains chart above). In other words, it shows how many times better the model is than a random choice of cases. Note that the value of the lift chart at X = 100 is 1, because if we choose all customers there is no lift: both models pick the same customers.
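Since the random model captures positives in proportion to the targeted fraction, lift reduces to a single division. A minimal sketch, using the gains value from the article's 10% example:

```python
def lift(targeted_fraction, gains_value):
    # How many times better the model is than random selection,
    # which would capture `targeted_fraction` of the positives.
    return gains_value / targeted_fraction

print(lift(0.10, 0.28))  # ~2.8: the model is almost 3x better at 10% targeted
print(lift(1.00, 1.00))  # 1.0: no lift when everyone is targeted
```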
We hope you enjoyed this article and we wish you a lot of good predictive models.