How would you analyze data when you have lots of it? That is the inspiration for this series.
This is the first part of 3 part series on this theme. Please use below links to access other parts.
- What do they complain about – Part 1
- Regional trends – Part 2
- My bank vs. your bank in a scorecard – Part 3
Let’s meet our data – Finance Industry Consumer Complaints
As part of open data initiatives, US government & Consumer Financial Protection Bureau maintain a list of all consumer complaints made against financial institutions (banks, credit unions etc.) You can download this data from the catalog page here. I have obtained the data on 1st of February, 2016. The download has 513,824 records. Each row contains one complaint.
The data has these columns.
Analyzing with a blank slate – how to start?
Here is how we can find things to analyze / discover / visualize / present in similar situations.
Metrics & KPIs
All industries have metrics / KPIs. Can you identify the metrics that matter in a given situation? For example, if you are the CEO of Citi Bank and you want to understand your bank’s performance in the customer service area, you could define below metrics:
- Complaints per million transactions: A high rate indicates poor customer experience.
- Average response time: Low response times indicate smooth & well defined processes and empowered support staff.
- % of disputed resolutions: A high percentage indicates customers are not happy with the solution offered by the bank.
Top x items
Knowing which products / regions / customer groups are making most complaints (or most anything) can help you identify underlying issues and address them. Let’s say you found out that identity theft is a big issue, you could launch a campaign to educate customers about safe banking, offer additional security measures like 2 step verification etc.
You could look at the trend of complaints (or any other metrics) over time to understand improvement in the performance. Let’s say you want to understand how your safe banking campaign impacted number of complaints made in the identity theft category. A trend analysis will provide the information you want.
This is a variation of top x items. If you are short on resources and want to know which items are easy pickings to improve your customer service. A Pareto analysis on the issues raised by customers can tell you just that.
Knowing how things are spread on an axis can help you optimize your processes. For example, if you know that every Wednesday we get a lot of complaints, we can plan our contact center staffing in a such a way to reduce the average waiting time.
A blank canvas (in this case, raw data) offers many possibilities. As an analyst, our job is to dig out the information that helps our company succeed.
Ways to analyze this complaint data
As there are many options, I just picked three to showcase what we can do. Feel free to download the data and do your own analysis.
- What do they complain about – Part 1
- Regional trends – Part 2
- My bank vs. your bank in a scorecard – Part 3
What do they complain about?
There are 11 types of products & 48 sub-products spread across 3,496 companies. There are 95 types of issues faced by customers, further categorized in to 68 sub issues. And there is complaint narrative (only available on 50,902 complaints or roughly 10%).
Making sense of all this text can be hard and complex.
This is where pruning comes in to picture. If we can understand just top 25 issues and give user the power to drill down to any product / state / time frame, then users can discover issues that matter most.
Analyzing top 25 issues
The first step is simple. We just make a pivot table from the data (make sure you add the pivot table to data model to capitalize on new Excel features) and drop the issue in rows area and complaint ID in to values area.
We get this:
Now, add below filters
- Timeline on the Date Received
- Slicer on Product
- Slicer on State
Sort the pivot table by descending order of Complaint Count.
Although the sorted pivot table (first 25 rows) provides the answer we want, it is a tough nut to chew. So let’s jazz it up.
Visualizing top issues – Treemap
For the visualization, let’s use Tree map chart introduced in Excel 2016. If you are using Excel 2013 or earlier, don’t worry. We will learn another way to visualize the data further down this page.
If you select the pivot data and try to insert a treemap chart, you get a message booing you.
No worries, we will trick Treemap to work with pivot data. Just copy the first 25 rows of the pivot and paste them as link in a blank sheet.
Now that the data is in a regular Excel range, we can create a Treemap from it. Before we do that, let’s just add 26th row to the data, with the issue label [Other], which is the total of any issues outside top 25.
Select all this data and insert a treemap from Insert ribbon.
Polishing the visualization
Let’s move the chart, slicers & timeline to a separate sheet. Just cut and paste them.
Polish the slicers using styles and customization features (disabling the header, adjusting number of columns etc.) For more on slicer customization, read Introduction to slicers.
Adjust timeline granularity to Quarters.
Our polished slicers
Set up descriptive chart title & captions by harvesting the data from slicers, timeline & pivot. You may refer to download workbook for the actual formulas. Note, these work only in Excel 2013 or above.
The final visualization looks like this, click on it to enlarge.
Demo of the visualization
Check out the video demo of this interactive visualization & analysis.
Alternative Visualization for Excel 2010
As Treemap is not available in Excel 2013 or earlier, we can use other visualizations like bar charts or conditional formatting heatmaps to visualize this data. Here is an attempt, using bar charts. Click to enlarge the image.
Download Example Workbook
Please click here to download the consumer complaints analysis – part 1 workbook. Play with the visualizations to learn more.
Note: this file is 100 + mb. So give it a few minutes to download.
Next part – Regional trends & analysis
The 2nd part of this case study is now ready. Please click here to read regional trends & analysis of customer complaints data.
How would you analyze the complaint data?
Go ahead and play with this data yourself. How would you analyze it? Please share your ideas, analysis & charts in the comments section. If you wish to publish a chart, email it to firstname.lastname@example.org with the subject “Half a million complaints”. I will add your chart at the end of this post.
Learn more about data analysis & reporting
I cover this topic and 50 other case studies in my online class 50 ways to analyze data. If you are keen to learn advanced analytics and data science thru Excel, I highly recommend 50 ways course. Check it out here.
For something less intense, but still as much fun and detailed, check out Excel School program. This course teaches intermediate to advanced Excel with specific focus on Dashboard Reporting. Check out Excel School here.
16 Responses to “Analyzing half a million consumer complaints [Part 1 of 3]”
The only disappointments foreseen that keep me from enrolling in these courses (at my own personal expense, of course, no way my employer would fund it) are: 1) how long the course material is accessible, considering the price (it IS an online course, after all, so overhead can't be that high) 2) especially considering the price, it's a shame to see how much time I'll spend covering subjects I'm already familiar with, and 3) a lot of the stuff I've been seeing requires Excel later than 2007, which I currently have (sorry, but I got tired of repeatedly forking out hundreds of dollars at every new version, usually to find myself stuck having to relearn the new steps for old functions (the ribbons took a while, and finding macros was tedious, then the custom toolbar Ï had in 2003 for those functions I use all the time was suddenly impossible)
Hi John. Like you, my (now former) employer would not have paid for these courses, and I was stuck at the 2007 version for similar reasons. One solution for upgrading Excel at minimal up-front expense is to get a Microsoft 365 Pro Plus subscription. It's about $12 a month, and you can get the latest updates from now until you decide to stop subscribing.
Custom toolbars are possible in later versions of Excel, too...this may be worth your while.
I greatly feared becoming obsolete, and I wanted to learn Power Pivot and Power Query, neither of which exist in 2007. Subscribing (yes, at my own expense) was the best choice. Chandoo's courses have also helped me in my quest to stay up-to-date. Yes, his courses are pricey, but (1) they are great courses, and (2) this is how Chandoo puts food on the table, and I don't blame him for not giving them away. He puts lots of effort into the content, and if I was him I would charge top dollar too.
Hi John. Thanks for your comments. We offer full downloads of all lesson videos in our programs. These do not expire. It is something very few online training providers do, but I have made a conscious choice to give the downloadable videos so that users can refer to the content anytime. The online access is valid for an year but you can extend for 1 more by paying a fraction of the fee (usually $50 per 6 months and $100 per year). Although our cost for giving access is low, our costs on admin & tech support are high. As long as you remain a student in the classroom, you can post questions and doubts which we answer.
Please note that there is no compulsion to join any of our courses. You can read all the free content, download example files, listen to podcasts and watch free videos I post on YouTube without even paying me a penny. The courses enable me to I support my family and work on my mission of making people awesome.
All the best.
sorry to say this but you are too negative in your comments and focus too much on obstacles (and excuses).
Decide what you want and find ways to get it.
Focus your energy on overcoming obstacles rather than complaining.
Enjoy all the free stuff available online.
When you can afford it buy content that is relevant for you.
FYI. The pivot table is not working if the file is opened with Excel 2010... 🙁
Here's the error message for your information:
This workbook contains an Excel data model that is created in a newer version of Excel. You can open this workbook in an older version of Excel, but you will be unable to load or work with PowerPivot when it coexists with a Excel data model.
Thanks for this. Gave me a good working example of the cube functions with slicers.
When I add the data to the data model to use the cube functions to identify slicer selections, the number formatting doesn't carry over to the slicer from the data table. Is that what you have found? Any ideas on why that is? If it's not part of the data model, whatever number format I set in my table shows up in the slicer buttons, no problem.
Your other readers might be interested to know that I too had an issue with being stuck in the dark ages of Excel 2007 (both at home and at work) until I discovered Office 2013 on eBay. There are plenty of reputable sellers offering valid licence keys taken from scrapped PCs which work well with downloaded versions of the software from Microsoft. At the end of last year I paid £66 (about $100) for 2 licence keys which both activated perfectly. Seems to me a much better deal than $12 per month, indefinitely, for Office 365.
How to determine the reputable sellers offering valid licensing keys on eBay? Would like to do this too. I cant follow along with Excel 2010 so I need to upgrade....
I've got Office Standard 2016 and when I opened it in Excel 2016 I still got the broken chart message. Not sure if it is a a 2016 update issue, as that is controlled by the Admins and it's only pushed out typically monthly if not a critical issue. I had been copying the steps so I had a working chart with some variances but was able to figure out my problems from looking at the download. Liked this part and moving on to 2 and 3 as my work allows me to work at 'em.
Hmm.. I use standalone installation of Office 2016, but I am under the impression that O365 2016 versions are even more up to date (with newer features added frequently), so may be it was something else that caused the error.
I'm new to excel Pivot/BI. I have created the pivot table and slicers. what is meant
Timeline on the Date Received -
Thank you for your help and GREAT web site ,,,
Hi all. As an beginner/intermediate excel user I have been dipping in and out of Chandoo excel lessons and example downloads for the last year or so and have found both the lessons and examples invaluable. I cannot find any other site or group that provide such info and free examples. If your stuck and you need to get the right answer, it's the first place to go.
The negative comments here are sad but you should not be discouraged.I find your work fantastic and enriching. You do not have to apologize for any charges, it is up to everyone to decide to take it or not. Good work must be rewarded somehow and we all must make our living
Top achievements, please procceed. I am delighted every time about the great ideas and solutions, you have helped me very much.
This has been awesome to me as I am new in excel.
Unfortunately, Excel does not handle massive data particularly well. I do extensive analysis on data like this daily. It is comprised of 20+ columns or fields, and upwards of 40,000 rows (complaints). There are over 100 different parts or possible failures and I add costs and dates and locations etc. I also calculate PPM (parts per million) so I pull a population under warranty which could be 270,000+ lines. As part of the analysis, I use a number of pivots, slicers, vlookups, links to other spreadsheets, and possibly several nested if/and formulas. It can nearly bring Excel to a standstill. This is not a negative comment, I appreciate your work, just saying Excel has its limitations.
I am founder of Fusion Analytics World (http://fusionanalyticsworld.com) , your materials are just amazing. I would like to invite you to feature in our top stories for month of August. If you can pen down an article on how to do analytics in excel since most believe they need to be great in R, Python etc undermining the value of excel and use in analytics. Let me know.