Previously we learned how to collect data and visualise it in fun ways to encourage crowd-sourced collection of data, and also how to assemble data from different sources to get a unique view of business performance.
Great! Let’s assume that our business now understands the value of collecting data from different sources and has the data visualised effectively. What is the next step?
What can we do to accelerate the transition to a data-driven business culture?
One of the most impactful things we can do is to start using machine learning models to make decisions for us, based on the data.
The beauty of machine learning models is that their accuracy (how often they make the right prediction) tends to increase as more data is collected.
And when a machine learning model gets predictions wrong, those mistakes can be fed back in as new training data, so that the model updates its predictions for the future.
Since ANCHOVY cannot use sensitive client data, we will use a public dataset of a telecommunications company to predict which customers are likely to voluntarily churn, which means that they stop using our service and most likely switch to a competitor. By successfully predicting customers at risk of churning, we can then come up with ways to solve the problem and keep the customers for longer and foster a loyal relationship, which otherwise would not have happened.
In the following hypothetical example, we will only explore whether a new customer is likely to churn or not. However, it’s also possible to predict how many months it will take for a new customer to churn.
The Hypothetical Situation
You are the CEO of a leading telecom company in Malta, and recently, your company has been losing a lot of customers, who are switching to the other major telecommunication provider.
We have the following data about 7043 customers:
- Services that each customer has signed up for
- Customer account information
- Demographic info about customers
We will use this data to find out the patterns of customers who we know have churned in the last month.
EXPLORATORY DATA ANALYSIS
Part 1 : Demographic Analysis
The first step in identifying customer behaviour patterns is understanding who the customers are in the first place.
The gender of customers is split very close to 50-50, regardless of whether they have churned or not. Therefore, gender does not help us at all in predicting whether a customer will churn.
There are a lot more young customers using our services than senior customers. Only 20% of our customers are considered senior citizens.
About 42% of the senior customers ended up churning in the last month, while only 25% of younger customers churned.
Therefore, seniority is a potential factor that can make a difference in our churn prediction.
Customers who didn’t have a partner were more likely to churn.
Part 2 : Service Information
Customers who churned were disproportionately likely to have a fibre-optic internet service. This could be hinting at internet service issues being a reason for churning for these customers.
It would also be good to know the geolocation data, to check whether particular neighbourhoods are being affected by bad internet service, and see if those neighbourhoods are the ones responsible for churn. Unfortunately, geolocation data is not available in this dataset.
These are the churn and non-churn percentages for the rest of the services.
Part 3 : Contract & Payment
87% of the customers who churned were on month-to-month plans. This makes sense since it’s easier for these types of customers to churn, as they are not bound by long-term contracts.
57% of the customers who churned paid by electronic cheque.
Customers who churned tended to be customers who are paying 70-100 dollars per month.
Customers with low tenure (who haven’t been customers for long) were disproportionately likely to churn. Unsurprisingly, customers who have been with us for a long time were unlikely to have churned in the last month.
KEY INSIGHTS FROM ANALYSIS:
- Gender does not affect whether a customer will churn or not
- 42% of senior customers churned in the last month
- 64% of customers who churned did not have a partner
- 69% of customers who churned were on the fibre-optic internet package
- 57% of customers who churned paid via electronic cheque
- 88% of customers who churned were on a month-to-month contract
- 42% of customers with fibre-optic internet churned in the last month
- Churned data points are concentrated in high monthly charges (70-100 dollars/month) and low tenure.
- Likewise, non-churned data points are concentrated in low monthly charges & low tenure and high monthly charges & high tenure.
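The percentages in this list come in two flavours that are easy to mix up: the share of churned customers who belong to a segment, and the churn rate within a segment. Here is a minimal sketch of the difference, using made-up records and hypothetical field names (nothing below is taken from the real dataset):

```python
# Toy records; the field names and values are illustrative, not from the dataset.
customers = [
    {"contract": "month-to-month", "churned": True},
    {"contract": "month-to-month", "churned": True},
    {"contract": "month-to-month", "churned": False},
    {"contract": "month-to-month", "churned": False},
    {"contract": "two-year", "churned": True},
    {"contract": "two-year", "churned": False},
    {"contract": "two-year", "churned": False},
    {"contract": "two-year", "churned": False},
]


def share_of_churners(records, field, value):
    """Of the customers who churned, what fraction belongs to this segment?"""
    churners = [r for r in records if r["churned"]]
    in_segment = [r for r in churners if r[field] == value]
    return len(in_segment) / len(churners)


def churn_rate(records, field, value):
    """Of the customers in this segment, what fraction churned?"""
    segment = [r for r in records if r[field] == value]
    churned = [r for r in segment if r["churned"]]
    return len(churned) / len(segment)


share = share_of_churners(customers, "contract", "month-to-month")
rate = churn_rate(customers, "contract", "month-to-month")
print(f"Share of churners on month-to-month: {share:.0%}")
print(f"Churn rate among month-to-month: {rate:.0%}")
```

In this toy data, two of the three churners are on month-to-month contracts, but only half of the month-to-month customers churned. Both numbers are useful, as long as we are clear about which one we are quoting.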
Turning at-risk customers into loyal customers
Now what do we do to prevent the customers from actually churning?
We now know that a new customer who signs up for a 70-100 dollar per month plan under a month-to-month contract is already at high risk of churning (regardless of the other factors).
Clearly, we need to re-think our strategy and value proposition to this segment of customers, since we know that what we are doing now is not working.
We could try different things to try and retain customers at risk:
- Referral system: We might at least get at-risk customers to refer new customers before churning.
- Cross-promotion trial: If they are not subscribed to all the services, we can give them limited-time access to a service they are currently not paying for.
- Progression system: We can keep customers engaged by letting them unlock special offers and bonuses (for example, customers might get an achievement for watching 60 minutes of the television service).
Making Churn Predictions
Now that we have a better idea of the situation, and what we might want to do about it, it’s time to put it into production. We want to deploy an algorithm that can make high-accuracy predictions in real time, so that we can identify whether each new customer is at risk of churning, and then take the precautionary steps chosen.
In this example, we will briefly try out 5 different machine learning algorithms, with simple explanations so as not to overload with information. We will crown a winning algorithm, based on which algorithm gets the highest accuracy (this is not always the right choice, but we want to keep things simple).
TESTING OF AI ALGORITHMS (TECHNICAL)
Nearest Neighbour Algorithm:
A nearest neighbour algorithm predicts an outcome for a new data point by looking at the known data points with the most similar characteristics. Similarity is measured as the distance between points on a graph.
So in the example above, if we had to make predictions for new customers (represented as green dots), we would base each prediction on the surrounding dots. If we base our prediction on the 3 nearest dots (3 nearest neighbours), we might get a different result than if we based it on the 5 nearest dots (5 nearest neighbours). This situation is illustrated by the middle green dot, which would be classified differently depending on how many nearest neighbours we use to form our prediction.
(With two possible outcomes, the number of nearest neighbours should be odd so that the vote can never end in a tie.)
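To make the idea concrete, here is a minimal sketch of a nearest neighbour vote written from scratch. The training points, the two features (monthly charge and tenure) and the new customer are all made up for illustration, and a real model would normally rescale the features first so that neither one dominates the distance.

```python
import math
from collections import Counter

# Toy training points: (monthly_charge, tenure_months, label). Values are made up.
training = [
    (80, 2, "churn"),
    (85, 3, "churn"),
    (90, 30, "stay"),
    (75, 28, "stay"),
    (95, 26, "stay"),
]


def predict(point, data, k):
    """Classify `point` by majority vote among its k nearest training points."""
    by_distance = sorted(data, key=lambda row: math.dist(point, row[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


new_customer = (84, 12)
print(predict(new_customer, training, k=3))
print(predict(new_customer, training, k=5))
```

With these toy values, the same new customer is classified as "churn" when we look at 3 neighbours and as "stay" when we look at 5, which is the same effect as the middle green dot described above.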
Performance Evaluation (Nearest Neighbours)
Our Nearest Neighbour algorithm got 865 cases right when it predicted that customers were not going to churn, and they actually didn’t churn (top left quadrant).
It got 186 cases right when it predicted that customers would churn, and they did (bottom right quadrant).
The other (white) quadrants show the number of cases the algorithm got wrong.
We can calculate the accuracy of the algorithm by dividing the number of cases it got right (865 + 186 = 1051) by the total number of cases.
This gives an accuracy of 74.7%.
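As a rough sketch, the accuracy calculation looks like this in code. The two correct counts (865 and 186) come from the text above, but the two error counts are not given there, so the values used below are placeholders chosen only so that the total is consistent with the reported 74.7%.

```python
def accuracy(true_neg, false_pos, false_neg, true_pos):
    """Fraction of all cases that were predicted correctly."""
    correct = true_neg + true_pos
    total = true_neg + false_pos + false_neg + true_pos
    return correct / total


# 865 and 186 are the correct counts quoted in the text.
# The two error counts are NOT given in the text; 168 and 188 are placeholders
# chosen only so that the total is consistent with the reported 74.7%.
score = accuracy(true_neg=865, false_pos=168, false_neg=188, true_pos=186)
print(f"{score:.1%}")
```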
We can also check whether a different number of neighbours works better. As can be seen in the graph below, 11 nearest neighbours (11NN) offers the best trade-off between accuracy (score) and time, since using more nearest neighbours requires more computation.
Now let’s quickly test out the 4 other algorithms and see if they can help us better predict whether a customer will churn or not.
Logistic Regression algorithm
A logistic regression is a classification algorithm that assigns a probability between 0 and 100% to an outcome happening.
Using Logistic Regression, the accuracy of our customer churn prediction was: 79.9%
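The sketch below shows, under simplifying assumptions, how a logistic regression turns customer characteristics into a probability. The two features and their weights are hypothetical and have not been fitted to any data; in practice the algorithm learns the weights from the training set.

```python
import math


def churn_probability(features, weights, bias):
    """Squash a weighted sum of features into a probability between 0 and 1."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-score))


# Hypothetical weights for (is_month_to_month, tenure_in_years); not fitted to any data.
weights = [1.5, -0.8]
bias = -0.5

new_customer = churn_probability([1, 0.5], weights, bias)
loyal_customer = churn_probability([0, 5], weights, bias)

for name, p in [("new", new_customer), ("loyal", loyal_customer)]:
    label = "churn" if p >= 0.5 else "stay"
    print(f"{name}: {p:.0%} -> {label}")
```

The 50% cut-off used to turn the probability into a churn or stay label is itself a choice, and it can be lowered if we would rather flag too many at-risk customers than too few.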
Naive Bayes algorithm
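A Naive Bayes algorithm uses Bayes' theorem to calculate conditional probabilities, in this case the probability that a customer will churn given their characteristics. It is called "naive" because it assumes that the characteristics are independent of each other.
Naive Bayes accuracy was: 72.1%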
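Here is a minimal sketch of the idea on a handful of made-up rows with two hypothetical characteristics (contract type and internet type). It counts how often each value appears among churners and non-churners, then combines those counts into a churn probability. Add-one smoothing is an extra assumption of this sketch, used so that an unseen combination does not produce a probability of exactly zero.

```python
from collections import Counter, defaultdict

# Toy rows: (contract, internet, churned). Values are made up for illustration.
rows = [
    ("monthly", "fibre", True),
    ("monthly", "fibre", True),
    ("monthly", "dsl", True),
    ("monthly", "dsl", False),
    ("yearly", "fibre", False),
    ("yearly", "dsl", False),
    ("yearly", "dsl", False),
    ("yearly", "fibre", False),
]


def train(data):
    """Count each class, and each (feature position, value) pair per class."""
    class_counts = Counter(label for *_, label in data)
    value_counts = defaultdict(Counter)
    for *features, label in data:
        for position, value in enumerate(features):
            value_counts[label][(position, value)] += 1
    return class_counts, value_counts


def churn_probability(features, class_counts, value_counts):
    """P(churn | features), treating features as independent given the class."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for position, value in enumerate(features):
            # Add-one smoothing; each toy feature has 2 possible values.
            score *= (value_counts[label][(position, value)] + 1) / (count + 2)
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])


class_counts, value_counts = train(rows)
risky = churn_probability(("monthly", "fibre"), class_counts, value_counts)
safe = churn_probability(("yearly", "dsl"), class_counts, value_counts)
print(f"monthly + fibre: {risky:.0%}")
print(f"yearly + dsl: {safe:.0%}")
```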
Decision Trees (Random Forests)
A decision tree is an algorithm that creates a series of if-this-then-that rules based on the data. It makes a lot of Yes/No splits in order to come up with a final output of whether the customer will churn or not.
A random forest is simply an algorithm that builds multiple decision trees, each on a random sample of the data, and combines their results. This usually gives higher accuracy but takes more time to compute.
Random Forest accuracy with 5 Decision Trees was 77%
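The following sketch illustrates only the voting step of a random forest. The three "trees" are hand-written rules with hypothetical thresholds and field names; in a real random forest the rules are learned automatically from random samples of the data.

```python
from collections import Counter


# Three hand-written "trees". In a real random forest the rules are learned
# from random samples of the data; these rules and thresholds are hypothetical.
def tree_contract(customer):
    return "churn" if customer["contract"] == "month-to-month" else "stay"


def tree_charges_and_tenure(customer):
    if customer["monthly_charges"] >= 70:
        return "churn" if customer["tenure_months"] < 12 else "stay"
    return "stay"


def tree_internet(customer):
    if customer["internet"] == "fibre":
        return "churn" if customer["senior"] else "stay"
    return "stay"


FOREST = [tree_contract, tree_charges_and_tenure, tree_internet]


def forest_predict(customer):
    """Each tree votes; the majority wins."""
    votes = Counter(tree(customer) for tree in FOREST)
    return votes.most_common(1)[0][0]


customer = {
    "contract": "month-to-month",
    "monthly_charges": 85,
    "tenure_months": 3,
    "internet": "fibre",
    "senior": False,
}
print(forest_predict(customer))
```

Two of the three trees vote "churn" for this customer, so the forest predicts "churn" even though one tree disagrees.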
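Support Vector Machine (SVM)
A Support Vector Machine is often treated as a black-box algorithm. We know what data we are feeding the algorithm (input), and we know what the SVM outputs. Internally, it looks for the boundary that separates churners from non-churners with the widest possible margin, but it is hard to read off from that boundary why a particular customer was classified the way they were.
SVM accuracy was: 79.5%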
AI Solution Evaluation
Logistic Regression was the best performing algorithm in terms of accuracy. With this algorithm, we can correctly predict which customers will churn and which won’t in 80% of the cases.
However, is this really good enough?
In most businesses and industries, it is more expensive to gain new customers than to retain current customers.
Thus, the most important goal of the algorithm should be to minimise the errors where the algorithm predicts that a customer won’t churn but, in fact, they do (so-called false negatives).
This is because the business needs to do everything possible, within economic feasibility, to retain the customers. If the chosen algorithm doesn’t do a good job at correctly predicting which customers will churn, then it’s useless. Even if it does a good job at predicting which customers WON’T churn, that is relatively useless information from a business perspective.
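A common way to measure this is recall: out of all the customers who really churned, what fraction did the algorithm flag? The sketch below compares two hypothetical models on invented counts (none of these numbers come from the algorithms tested above) to show that the model with the higher accuracy is not necessarily the one that catches more churners.

```python
def accuracy(tn, fp, fn, tp):
    """Fraction of all cases that were predicted correctly."""
    return (tn + tp) / (tn + fp + fn + tp)


def recall(fn, tp):
    """Of the customers who really churned, what fraction did the model catch?"""
    return tp / (tp + fn)


# Two hypothetical models scored on the same 1,000 customers, 250 of whom churned.
# All counts below are invented for illustration.
model_a = {"tn": 720, "fp": 30, "fn": 150, "tp": 100}
model_b = {"tn": 600, "fp": 150, "fn": 50, "tp": 200}

for name, m in [("A", model_a), ("B", model_b)]:
    acc = accuracy(**m)
    rec = recall(m["fn"], m["tp"])
    print(f"Model {name}: accuracy {acc:.0%}, recall {rec:.0%}")
```

Model A is slightly more accurate overall, but it misses 150 of the 250 churners. Model B raises more false alarms, yet it catches 200 of them, which is what matters if the goal is to retain customers.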