Game for some charting awesomeness?
Of late, I have been doing a lot of data analysis and visualization on performance ratings, salary hikes, gender pay equality etc. Today let me share with you an awesome way to visualize massive amounts of data.
Scenario: Your organization of 3,686 people recently went through its annual performance rating and review process. At the end of it, everyone was offered some salary increase (from $0 to $24,000 per year). You have 7 business groups. How do you tell the story of all these salary hikes in one chart?
How about this one?
Ready to know how to create this in Excel? Read on.
Tutorial: Creating jittered scatter plot in Excel
That is right, what you are seeing above is a good old scatter plot with a bit of jitter (random noise added to the X values). This way, when too many dots land on a single point, we spread them apart to show more.
Let’s look at data:
Here is a sample of 3,500+ employees’ ratings and salary hikes (randomly made up), with the usual columns:
Convert rating and group names to numbers:
Since we can’t use rating and group names in an XY plot (we need numbers, not text), let’s convert these into numbers using a simple MATCH() formula.
We get two new columns, like below:
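For readers who think in code, here is a minimal Python sketch of what the MATCH() lookup does. The label lists are assumptions: only "NME", "Operations" and "Development" appear on this page, the other names are placeholders, and the order (with unrated as number 5) is chosen to agree with the Y value formula further down.

```python
# Hypothetical label lists; the real workbook's names and order may differ.
RATINGS = ["Exceeds", "Meets", "Partially Meets", "NME", "Unrated"]
GROUPS = ["Development", "Operations", "Sales", "Finance",
          "HR", "Marketing", "Support"]


def match(value, lookup_list):
    """Return the 1-based position of value, like MATCH(value, list, 0)."""
    return lookup_list.index(value) + 1


rating_number = match("NME", RATINGS)
group_number = match("Operations", GROUPS)
print(rating_number, group_number)  # prints: 4 2
```

Like MATCH() with an exact match, the lookup returns a position that starts at 1, not 0.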
Creating X & Y values from data:
Next up, we need to generate the X & Y values for our plot.
Y value: This is easy. It is the amount of salary increase with two twists:
- If an employee got a $0 hike, we want to omit them from the plot. This will remove many dots from the plot (less clutter)
- If an employee is unrated (even if they got a hike), we want to omit them too. This is because our plot has only 4 rating levels per group. There are very few unrated people and they are not the focus of this chart.
We can create Y value using a simple IF formula like below:
- =IF(OR([@[Salary Increase $]]=0,[@[Rating 17 (number)]]=5),NA(),[@[Salary Increase $]])
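The same rule can be sketched in Python. This is only an illustration of the IF formula above; it assumes NaN stands in for Excel's NA(), and the function name y_value is made up for this example.

```python
import math

UNRATED = 5  # rating number that the formula above treats as unrated


def y_value(salary_increase, rating_number):
    """Return the hike to plot, or NaN when the row should be left out."""
    if salary_increase == 0 or rating_number == UNRATED:
        return math.nan  # plays the role of NA() in the Excel formula
    return salary_increase


print(y_value(2500, 2))              # prints: 2500
print(math.isnan(y_value(0, 2)))     # prints: True  ($0 hike is omitted)
print(math.isnan(y_value(2500, 5)))  # prints: True  (unrated is omitted)
```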
X value: This is the tricky bit. Since there are 7 groups, each with 4 ratings (excluding the unrated), we have 28 possible X values. We want to space these out so dots for one group + rating combination don’t encroach on another combination.
Let’s say we give 10 units of space per group.
That means, we have 2.5 units of space per rating in that group (and total of 70 units of space).
Now, the dot needs to be plotted at the center of this 2.5 units of space (i.e. at 1.25)
The basic formula would be: =[@[Business Group]]*10+([@[Rating 17 (number)]]-1)*2.5+1.25
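To check the arithmetic, here is a small Python sketch that computes the center for every group + rating combination. It assumes groups are numbered 1 to 7 and ratings 1 to 4, as in the formula above; the constant and function names are made up for this example.

```python
GROUP_WIDTH = 10       # units of X axis space per business group
RATINGS_PER_GROUP = 4  # unrated people are left out of the plot
SLOT_WIDTH = GROUP_WIDTH / RATINGS_PER_GROUP  # 2.5 units per rating


def x_center(group_number, rating_number):
    """Center of the slot for one group + rating combination."""
    return (group_number * GROUP_WIDTH
            + (rating_number - 1) * SLOT_WIDTH
            + SLOT_WIDTH / 2)


centers = [x_center(g, r) for g in range(1, 8) for r in range(1, 5)]
print(len(centers), min(centers), max(centers))  # prints: 28 11.25 78.75
```

So the 28 dot towers sit between 11.25 and 78.75, each 2.5 units away from its neighbor.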
But what about the jitter?
Aah, right. We need to add random noise to the X value. Since each rating has 2.5 units of space, how about noise between -0.7 and 0.7? This still leaves 0.55 units free on both ends, thus keeping the plot clear.
We can use below formula to generate the noise.
=RANDBETWEEN(-700,700)/1000
The final formula for X value goes like this:
=[@[Business Group]]*10+([@[Rating 17 (number)]]-1)*2.5+1.25+[@Noise]
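Here is the same X value logic as a Python sketch, with random.randint standing in for RANDBETWEEN. The function names and the seed are assumptions added only to make the example repeatable.

```python
import random


def noise():
    """Random offset from -0.7 to 0.7, like RANDBETWEEN(-700,700)/1000."""
    return random.randint(-700, 700) / 1000


def x_value(group_number, rating_number):
    """Slot center plus jitter, mirroring the final formula above."""
    center = group_number * 10 + (rating_number - 1) * 2.5 + 1.25
    return center + noise()


random.seed(1)  # only so the sketch gives the same dots on every run
xs = [x_value(3, 2) for _ in range(1000)]

# Group 3, rating 2 is centered at 33.75; every dot stays within 0.7 of it.
print(all(abs(x - 33.75) <= 0.7 + 1e-9 for x in xs))  # prints: True
```

Because the slot is 2.5 units wide and the jitter uses only 1.4 of it, neighboring towers never touch.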
Here is how our X,Y looks at this stage:
Data prep done, let’s move to the plot.
Creating jittered scatter plot
- Select both X & Y values and insert XY plot. We get this.
- Set X axis limits and remove title: As all our dots are between 10 and 80, let’s set these as the limits for the X axis. Also, let’s remove the chart title.
- Add vertical gridlines: Although our dot towers are separated from each other, adding grid lines makes it easy to read the chart.
- Format the markers: Set fill to solid color and 25% transparency. This makes the dots look nice and shows the density when there are too many people at some co-ords.
- Set Y axis limit: Cap the Y axis at $10,000 so that we can focus on people getting a salary increase up to that amount. This zooms the chart to the meaty part while still showing plenty of outliers. We get this:
- Last step: Remove plot and chart borders, so we can add extra info, labels etc.
Ok, now our chart is almost ready. Next step, making it a story.
Create a wireframe in 10 column area, as shown below:
Next, place the chart inside the red box. Adjust the plot area size so it fits into 7 columns. Hold the ALT key while adjusting so the chart’s plot area fits into the 7 columns. You need to repeat this step every time you fiddle with the chart, so do it last.
Add extra story points:
- A clear and descriptive title
- A sub-title explaining what is going on and how to read the chart.
- Group names and rating names. You can use the below trick to align the rating labels inside cell nicely.
- Show some more stats like median hike, median new pay (if you have it), head counts and unrated counts.
- Add any footers, disclaimers (about excluded people in the plot etc.)
- Add a border around this entire wire frame so it all looks like one piece.
- Shade alternate columns in some dull color. This improves the readability. As our chart is transparent, cell fill colors will show up nicely.
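The extra stats mentioned in the list above (median hike, head counts, unrated counts) are easy to compute per group. Here is a rough Python sketch on made-up rows; whether the median should include $0 hikes and unrated people is an assumption here (this version includes everyone), so adjust it to match your footnotes.

```python
from statistics import median

# Made-up rows: (group name, rating number, salary increase in $); 5 = unrated
rows = [
    ("Operations", 1, 4000),
    ("Operations", 2, 2500),
    ("Operations", 5, 1000),
    ("Development", 3, 0),
    ("Development", 2, 3000),
    ("Development", 1, 6000),
]


def group_stats(rows, group):
    """Head count, unrated count and median hike for one business group."""
    members = [r for r in rows if r[0] == group]
    hikes = [r[2] for r in members]
    return {
        "head_count": len(members),
        "unrated_count": sum(1 for r in members if r[1] == 5),
        "median_hike": median(hikes),
    }


print(group_stats(rows, "Operations"))
# prints: {'head_count': 3, 'unrated_count': 1, 'median_hike': 2500}
```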
We are done.
Inspiration for this – R
That is right. You can create a similar plot quicker and better using R. ggplot2, an R library, has built-in support for jittering dots on XY plots. So using that, you can create the chart below with just 7 lines of code. This is what you get (yes, you can show each rating’s dots in a different color, and yes, you can order the groups by the number of people in them).
Here is the R script if you want to experiment.
Download Excel Chart
Click here to download the workbook containing this chart, tutorial and raw data. Try re-creating it in Excel (or your favorite visualization tool) to learn more.
How do you like this chart?
I had lots of fun making and tweaking this chart. It shows some interesting patterns about how salary hikes are distributed across groups and where everyone is.
How do you like this? Do you plan to add some jitter to your busy scatter plots? Please share your thoughts in the comments section. And if you want some inspiration, check out more such charts.
Jittery about charts?
If you love storytelling and beautiful visualizations but are not sure how to get there, consider enrolling in our Excel School or 50 ways to Analyze Data programs. In these powerful courses, I teach you all about awesome data analysis and visualization techniques.
19 Responses to “Awesome chart to visualize Salary Increases for 3,500+ people [Tutorial]”
Integration with R is really awesome chandoo! Looking forward for more articles on Excel+R.
Thanks
Hi Chandoo,
I am getting following error in R script:
Error: Faceting variables must have at least one value
Please tell where i am missing something.
Thanks
Hello Again! I think there is problem in CSV file what i made from Excel.
Could you please share CSV file which were used in article.
The CSV file needs below columns with names as indicated.
Emp Num, Performance Rating, Group Name, Salary Increase $
...data...
Here is a link to the CSV I used. You can create this from the data in Excel.
http://chandoo.org/wp/wp-content/uploads/2017/08/rem-data-jitter.csv
Thanks Chandoo!
It works like a charm 🙂
Very nice. But unfortunately this is too advanced for me. I'd like to learn about the basics, eg. the IF formula. Do you already have a post that explains that? Many thanks!
Chandoo,
An excellent example of charting...especially the jitters !
This will find its way into my portfolio
Thank you
Too bad that tidyverse isn't supported right now in PowerBI to publish online.
I'm still very new to R... But from my understanding, tidyverse is collection of individual packages... so I guess I can load individual library(s) and should work. Now to go and find where each function belongs 🙂
Success!
Minor modification: You need these 4 libraries
library(ggplot2)
library(magrittr)
library(dplyr)
library(readr)
And add all columns into Values field and change rem_data line to...
rem_data <- dataset
is there a way to color a particular data record red?
Thinking of trying to use this in a dashboard to automate the last 30 days data where the red is "Today"
You can use another series to separate today (or any other criteria records). If you are not sure, send me the dummy data set @chandoo.d@gmail.com and I will show in a future blog post.
I was playing a bit more with the jitter plot and found issue where R visual returns error when slicer is used on it.
With sample data when slicer is used to filter for "Operations", the visual returns error. While it works fine on "Development".
Looks like the issue is when there is no record (visible on chart or not) that belongs to particular rating (Ex: NME).
I circumvented this issue by adding dummy data for each rating category to Group Name (I added 0 increase since these are filtered from the chart in script).
Will report back if I find more elegant solution within R script.
Thank you so much. The download works great and the tutorial is excellent. I customized it like crazy and didn't manage to break it. You are awesome!
Great post. I'm unable to understand the X axis formula setup? Can someone please take the time out to explain why it is so. I tried reading the article twice but not getting it. Thanks.
=[@[Business Group]]*10+([@[Rating 17 (number)]]-1)*2.5+1.25+[@Noise]
Excellent article, thanks! What should be the formula for placing the points so that they are not distributed in random order, but evenly from the center? I distributed wage values for 150 ranges. And in each of them I want the points to be uniformly distributed from the center to the right and to the left within [-0.7, +0.7]. Something like beeswarm method Center in R (http://www.cbs.dtu.dk/~eklund/beeswarm/).
[…] before I started building this visual, I’d fortunately been reading an article by my good friend Chandoo, in which he “jittered” some dots in an Excel scatter […]
Hi Chandoo
I need to create a program to print stagewise fare chart. The chart will be printed based on fare & stagewise Kms. Example -
Km Stage
0 Source
5 A
10 B
14 C
Fare Chart will be printed as -
Km Stage
0 Siliguri 00
5 Fulbari 10
10 Fatapukur 20 10
15 Jalpaiguri 30 20 10
Can you please help me.
Hi Chandoo,
Thank you for the opportunity and for all the awesome excel, you are amazing
Really love them all,
Best wishes