Last week, I asked my email newsletter readers to submit “one data analysis problem you are struggling with”. We called it BYOD – Bring your own data. More than 100 people have emailed various interesting (and often very difficult) problems. This week (between 16th of February to 20th of February), let’s take a look at some of these problems and solve them.
This problem is sent by Robot Mak.
Who is the most consistent of all?
Imagine you are a category manager at a large e-commerce company. Your site offers various products, but you don’t really make these products. You list products made by other vendors on your site. Every day, these vendors would send you invoices for the amount of product they have sold. Here is a snapshot of such invoices.
Looking at this list, you have a few questions.
- Who is the best seller?
- Who is the most active seller?
- Who is the most consistent seller?
- Which seller has fewest invoices?
Let’s go ahead and answer these using Excel. Shall we?
But first, what is consistency?
Consistency is the kind of word that means different things to different people. So when we analyze a set of data to find most consistent item, the first thing we need is a consistent definition of consistency.
Let’s look it up in the dictionary.
con·sis·ten·cy: Reliability or uniformity of successive results or events
The keyword is uniformity. So in our case, we can say a seller is consistent, if they are sending invoices fairly regularly.
We have 11 days of data in the data-set.
We can calculate consistency % for a seller using this:
=number of days on which invoice is sent / 11
So who would be most consistent?
Let’s say 2 sellers have sent invoices on each of those 11 days. Then who is most consistent? Both of them have consistency = 100%.
May be we can calculate weighted consistency %?
Since for this e-commerce business, the most important factor is revenues (well, it should be profits, but we don’t have that data here), we can calculate weighted consistency by adding a fraction that depends on the value of total invoice amount.
Something like this:
=consistency % + (0.000001)*rank of seller based on invoice amount in ascending order
note: if your data has more than 1000 sellers, multiply with a smaller number like 0.00000001
Alternative ways to figure out weights:
Instead of revenue rank, we can use below alternatives too:
- Standard deviation of invoice amounts per day
- Standard deviation of number of invoices per day
- Rank of total invoice count in ascending order
Option 1. Using Pivot Tables
The easiest way to answer all the questions is to use Pivot Tables. Just follow below steps:
- Insert a pivot table from the invoice data.
- Then add “Date” to row label area
- Add “Seller” to column label area
- Add “value” to values area
- Just looking at the below report, we can answer questions 1 & 4
- To answer question 3 (most consistent seller), we have to see which sellers have invoices against maximum number of dates. Both SELLER2 & SELLER6 qualify. Since SELLER6 has higher amount, we can say she is most consistent (based on our definition of most consistent above)
- To answer question 2 (most active seller), replace “value” with “reference” in pivot table and find out the seller with maximum count
Option 2. Using formulas
While the pivot table approach works, it is ad-hoc. That means, we can’t extract the names of sellers automatically. We can use Excel formulas to answer all these questions elegantly.
Let’s assume all the data is in a table named sales
In your workbook, calculate all of these:
Now we just need a few doses of INDEX+MATCH formulas to answer the questions.
Who is the best seller?
Note: Column H has the seller names & I has the seller amounts
Who is the most active seller?
Note: Column J has invoice count
Who is the most consistent seller?
Note: Column M has weighted consistency %
Who is the seller with fewest invoices?
Download Example Workbook
Click here to download example workbook with all these calculations. Examine the formulas & pivot table to learn more.
How do you measure consistency?
I will be honest. This is the first time I calculated consistency. But I find it interesting. Consistency can be used to understand your data better & make informed decisions. Few common situations where it can really help,
- Identifying consistent customers to reward them
- Finding consistent assembly line in a set of them
- Optimizing re-ordering pattern of inventories based on how consistently orders are placed
What about you? Do you measure consistency of your data? What techniques do you use? Please share your techniques & tips in the comments section.