Imagine running a company that sells its product to 50,000 grocery stores and 50,000 gas stations. How would you allocate your marketing budget? The truth is, each of these customers may exhibit unique characteristics. Treating all gas stations equally, for instance, would most likely lead to overspending on some and underspending on others.
Customer classes such as gas station, market or deli are not enough to capture essential customer patterns. So how can we extract behavioral information from customer data to strengthen marketing campaigns?
RFM, which stands for ‘Recency’, ‘Frequency’ and ‘Monetary Value’, covers three significant dimensions that capture customer characteristics:
Recency: How recent was the customer’s latest transaction?
Frequency: How many times did the customer purchase a product/service throughout the year?
Monetary Value: How much money did the customer spend during the year?
The dataset I will work on is an artificial, full-year customer invoice dataset of a CPG (consumer packaged goods) company.
Without further ado, let’s roll up our sleeves and dive into the Python code to extract our dimensions one by one!
I kick off by computing the frequency feature along with its distribution plot:
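A minimal sketch of this step, assuming a hypothetical invoice table with the columns `customer_id`, `invoice_id`, `invoice_date` and `amount` (the real dataset's schema may differ). Frequency is taken here as the number of distinct invoices per customer over the year.

```python
import pandas as pd

# Hypothetical invoice data: the column names and values are assumptions
# made for illustration, not the article's actual dataset.
invoices = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C"],
    "invoice_id": [1, 2, 3, 4, 5, 6],
    "invoice_date": pd.to_datetime([
        "2021-01-05", "2021-03-10", "2021-11-20",
        "2021-02-14", "2021-06-30", "2021-12-01",
    ]),
    "amount": [120.0, 80.0, 200.0, 50.0, 75.0, 300.0],
})

# Frequency: number of distinct invoices per customer during the year
frequency = (
    invoices.groupby("customer_id")["invoice_id"]
    .nunique()
    .rename("frequency")
)
print(frequency)

# A distribution plot could then be drawn (requires matplotlib), e.g.:
# frequency.plot.hist(bins=20)
```

Counting distinct invoice IDs rather than rows avoids double counting when one invoice spans several line items.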
Based on the distribution plot, most customers' transaction counts cluster around 10. In other words, a typical customer makes about ten transactions over the entire year.
Next, we compute our second dimension, monetary value:
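A sketch of the monetary dimension, again on a hypothetical invoice table (the `customer_id` and `amount` column names are assumptions): monetary value is simply each customer's total spend over the year.

```python
import pandas as pd

# Hypothetical invoice lines; column names are illustrative assumptions
invoices = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C"],
    "amount": [120.0, 80.0, 200.0, 50.0, 75.0, 300.0],
})

# Monetary value: total amount spent per customer during the year
monetary = invoices.groupby("customer_id")["amount"].sum().rename("monetary")
print(monetary)
```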
On average, customers spend approximately 2,000 a year. A few outliers with extreme spending amounts are easy to spot.
And finally, our third dimension, recency:
What would you say about customers with recency values over 100 or even 200 days? Have we already lost them? Well, it depends on the industry. In the CPG industry, customers who have been inactive for more than 100 days are generally considered ‘churned’, in other words, ‘lost’.
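A sketch of the recency computation with the 100-day churn threshold applied as a flag. The invoice table is hypothetical, and the reference ("snapshot") date, one day after the latest invoice in the data, is an assumed convention; the last day of the fiscal year would work just as well.

```python
import pandas as pd

# Hypothetical invoice dates per customer (illustrative assumption)
invoices = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C"],
    "invoice_date": pd.to_datetime([
        "2021-01-05", "2021-03-10", "2021-11-20",
        "2021-02-14", "2021-06-30", "2021-12-01",
    ]),
})

# Assumed reference date: the day after the most recent invoice
snapshot = invoices["invoice_date"].max() + pd.Timedelta(days=1)

# Recency: days between each customer's latest purchase and the snapshot
last_purchase = invoices.groupby("customer_id")["invoice_date"].max()
recency = (snapshot - last_purchase).dt.days.rename("recency")

# Flag customers inactive for more than 100 days as churned
churned = recency > 100
print(pd.DataFrame({"recency": recency, "churned": churned}))
```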
Okay, now that we have combined these features into one dataframe, let’s visualize it!
This is nothing but a huge, shapeless pile of customers, which raises the question:
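One way to build the combined dataframe in a single pass is pandas named aggregation. This is a sketch on a hypothetical invoice table (column names are assumptions), using the day after the latest invoice as the assumed reference date for recency.

```python
import pandas as pd

# Hypothetical invoice data; schema and values are illustrative assumptions
invoices = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C"],
    "invoice_id": [1, 2, 3, 4, 5, 6],
    "invoice_date": pd.to_datetime([
        "2021-01-05", "2021-03-10", "2021-11-20",
        "2021-02-14", "2021-06-30", "2021-12-01",
    ]),
    "amount": [120.0, 80.0, 200.0, 50.0, 75.0, 300.0],
})

# Assumed reference date for recency
snapshot = invoices["invoice_date"].max() + pd.Timedelta(days=1)

# Named aggregation computes the raw features per customer in one groupby
rfm = invoices.groupby("customer_id").agg(
    last_purchase=("invoice_date", "max"),
    frequency=("invoice_id", "nunique"),
    monetary=("amount", "sum"),
)
rfm["recency"] = (snapshot - rfm["last_purchase"]).dt.days
rfm = rfm[["recency", "frequency", "monetary"]]
print(rfm)

# A quick 3D view (requires matplotlib) could look like:
# ax = plt.figure().add_subplot(projection="3d")
# ax.scatter(rfm["recency"], rfm["frequency"], rfm["monetary"])
```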
How are we going to segment them?
Many data scientists try to answer this question with K-Means clustering, a machine learning algorithm that separates data into clusters by assigning each data point to its nearest cluster center (centroid): (https://stanford.edu/~cpiech/cs221/handouts/kmeans.html)
However, I advocate a manual approach, since ML algorithms do not take industry-specific dynamics into account. Accordingly, I will follow two essential steps to ace this challenge:
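For comparison, a minimal sketch of the K-Means route with scikit-learn on a small, made-up RFM table. Because K-Means is distance-based, the features are standardized first; `n_clusters=3` is an arbitrary choice for this toy data, not a recommendation.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical RFM table; values are made up for illustration
rfm = pd.DataFrame(
    {
        "recency": [5, 10, 200, 220, 40, 35, 180, 15],
        "frequency": [30, 25, 2, 1, 10, 12, 3, 28],
        "monetary": [5000, 4200, 150, 90, 1500, 1700, 300, 4800],
    },
    index=[f"C{i}" for i in range(8)],
)

# Put recency (days), frequency (count) and monetary (currency) on one scale
scaled = StandardScaler().fit_transform(rfm)

# Fit K-Means and store each customer's cluster label
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
rfm["cluster"] = kmeans.fit_predict(scaled)
print(rfm.sort_values("cluster"))
```

The cluster labels are arbitrary integers, so someone still has to inspect each cluster and decide what it means for the business, which is where the manual approach comes in.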
1- Quantile Transformation and Ranking
In simple terms, quantiles are the cut points that divide a sample into equal-sized subgroups. Quartiles are quantiles that divide the distribution into four equal parts. In this case, I split each of the R, F and M values into four quartiles and labeled them accordingly: very bad, bad, good, very good. This is what the dataframe transformed into:
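A sketch of the quartile labeling with `pd.qcut` on a made-up RFM table. Two details matter: for recency, a lower value is better, so its label order is reversed; and ranking with `method="first"` before binning breaks ties, so `qcut` always finds four distinct bin edges even when many customers share the same frequency.

```python
import pandas as pd

# Hypothetical RFM table; values are made up for illustration
rfm = pd.DataFrame(
    {
        "recency": [5, 10, 200, 220, 40, 35, 180, 15],
        "frequency": [30, 25, 2, 1, 10, 12, 3, 28],
        "monetary": [5000, 4200, 150, 90, 1500, 1700, 300, 4800],
    },
    index=[f"C{i}" for i in range(8)],
)

labels = ["very bad", "bad", "good", "very good"]

# Frequency and monetary: higher is better, so the top quartile is "very good"
for col in ["frequency", "monetary"]:
    rfm[f"{col}_rank"] = pd.qcut(rfm[col].rank(method="first"), q=4, labels=labels)

# Recency: fewer days since the last purchase is better, so reverse the labels
rfm["recency_rank"] = pd.qcut(
    rfm["recency"].rank(method="first"), q=4, labels=labels[::-1]
)
print(rfm)
```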
2- Segmentation by RFM rankings
This is the part where business understanding comes into play. I wrote a Python function with a bunch of if-else statements that defines segments based on their RFM rankings. For instance, I extracted a customer segment with ‘very good’ F and M levels, yet a ‘very bad’ R level. These customers were once very active and valuable, but it has been a long time since their latest transaction. Hence, the company needs to win this group back urgently! The function below reflects my personal approach, and one can fine-tune these classes to fit different market dynamics.
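To illustrate the shape of such a rule-based function, here is a hypothetical sketch. Only the "win back urgently" rule mirrors the example in the text; the other segment names and conditions are illustrative assumptions, not the article's actual classes.

```python
import pandas as pd

# Hypothetical segmentation rules, checked in order; the first match wins
def assign_segment(row: pd.Series) -> str:
    r, f, m = row["recency_rank"], row["frequency_rank"], row["monetary_rank"]
    # Once valuable and active, but silent for a long time
    if r == "very bad" and f == "very good" and m == "very good":
        return "win back urgently"
    # Illustrative: best on every dimension
    if r == "very good" and f == "very good" and m == "very good":
        return "champions"
    # Illustrative: worst on every dimension
    if r == "very bad" and f == "very bad" and m == "very bad":
        return "lost"
    # Illustrative: bought recently but rarely
    if r in ("good", "very good") and f in ("bad", "very bad"):
        return "new or occasional"
    return "needs attention"

# Hypothetical ranked customers
rfm = pd.DataFrame(
    {
        "recency_rank": ["very bad", "very good", "very bad", "good", "bad"],
        "frequency_rank": ["very good", "very good", "very bad", "bad", "good"],
        "monetary_rank": ["very good", "very good", "very bad", "good", "bad"],
    },
    index=["C1", "C2", "C3", "C4", "C5"],
)

rfm["segment"] = rfm.apply(assign_segment, axis=1)
print(rfm["segment"])
```

Because the rules are checked in order, more specific segments should come before broader catch-all conditions.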
And here we go! We have successfully created our customer segments to design upcoming marketing strategies efficiently! An elegant treemap comes in handy for showing the big picture:
Below is an analysis of the newly created customer segments across R, F and M. Notice how each group has its own behavioral pattern and differs meaningfully from the others.
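A treemap sizes each rectangle by a segment's customer count, so the data behind it is just a frequency table. A sketch on hypothetical segment labels:

```python
import pandas as pd

# Hypothetical segment labels, e.g. the output of a segmentation function
segments = pd.Series(
    ["champions", "lost", "lost", "win back urgently",
     "champions", "lost", "needs attention", "lost"],
    name="segment",
)

# Customer count and percentage share per segment (the treemap's areas)
sizes = segments.value_counts()
shares = (sizes / sizes.sum() * 100).round(1)
summary = pd.DataFrame({"customers": sizes, "share_pct": shares})
print(summary)
```

To draw the treemap itself, one option is the third-party `squarify` package, whose `squarify.plot(sizes=..., label=...)` renders onto a matplotlib axis; plotly's `treemap` is another.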
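Profiling segments usually comes down to averaging R, F and M within each group. A sketch on a hypothetical, already-segmented table (segment names and values are illustrative):

```python
import pandas as pd

# Hypothetical customers with raw RFM values and an assigned segment
rfm = pd.DataFrame({
    "recency": [5, 12, 210, 190, 60, 250],
    "frequency": [30, 26, 25, 22, 8, 1],
    "monetary": [5000, 4400, 4100, 3900, 900, 60],
    "segment": ["champions", "champions", "win back urgently",
                "win back urgently", "needs attention", "lost"],
})

# Mean recency, frequency and monetary value per segment, highest spenders first
profile = (
    rfm.groupby("segment")[["recency", "frequency", "monetary"]]
    .mean()
    .round(1)
    .sort_values("monetary", ascending=False)
)
print(profile)
```

In this toy table, the "win back urgently" group has frequency and spend close to the champions but a far higher recency, which is exactly the pattern that singles it out.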
In a nutshell, RFM is a powerful analytical lens for separating customers by their behavioral characteristics. And needless to say, leveraging analytics for business requires not only solid data science skills but also a holistic understanding of the business.
Thank you very much for reading this article, and I would love to hear your ideas in the comments section below!
You can find the entire code below: