Game for some charting awesomeness?
Of late, I have been doing a lot of data analysis and visualization on performance ratings, salary hikes, gender pay equality etc. Today let me share with you an awesome way to visualize massive amounts of data.
Scenario: Your organization of 3,686 people recently went through its annual performance ratings & review process. At the end of it, everyone was offered some salary increase (from $0 to $24,000 per year). You have 7 business groups. How do you tell the story of all these salary hikes in one chart?
How about this one?
Ready to know how to create this in Excel? Read on.
Tutorial: Creating jittered scatter plot in Excel
That is right, what you are seeing above is a good old scatter plot with a bit of jitter (random noise added to X values). This way, when too many dots sit at a single point, we spread them apart to show more of them.
Let’s look at data:
Here is a sample of 3,500+ employees’ ratings and salary hikes (randomly made up), with the usual columns:
Convert rating and group names to numbers:
Since we can’t use rating and group names in an XY plot (we need numbers, not text), let’s convert these into numbers using a simple MATCH() formula.
We get two new columns, like below:
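For readers who want to check the logic outside Excel, here is a minimal Python sketch of what an exact-match MATCH() lookup does: it returns the 1-based position of a name in a list. The rating and group names below are hypothetical placeholders, not the ones in the workbook. The only detail taken from the article is that "unrated" ends up as number 5.

```python
# Hypothetical labels; the workbook's real names and ordering may differ.
RATINGS = ["Rating 1", "Rating 2", "Rating 3", "Rating 4", "Unrated"]
GROUPS = ["Group A", "Group B", "Group C", "Group D",
          "Group E", "Group F", "Group G"]


def match(value, lookup):
    """Return the 1-based position of value in lookup.

    Mirrors Excel's MATCH(value, lookup, 0) for exact matches, except that
    this sketch is case-sensitive and raises ValueError when nothing matches.
    """
    return lookup.index(value) + 1


rating_number = match("Unrated", RATINGS)  # position 5 in the list above
group_number = match("Group C", GROUPS)    # position 3 in the list above
```

Keeping "Unrated" last in the list is what lets the later formulas test for the number 5.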
Creating X & Y values from data:
Next up, we need to generate the X & Y values for our plot.
Y value: This is easy. It is the amount of salary increase with two twists:
- If an employee got a $0 hike, we want to omit them from the plot. This removes many dots from the plot (less clutter).
- If an employee is unrated (even if they got a hike), we want to omit them too. This is because our plot has only 4 rating levels per group. There are very few unrated people and they are not the focus of this chart.
We can create Y value using a simple IF formula like below:
- =IF(OR([@[Salary Increase $]]=0,[@[Rating 17 (number)]]=5),NA(),[@[Salary Increase $]])
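The same rule can be sketched in Python to make the two exclusions explicit. This is only an illustration of the IF/OR logic above. NaN stands in for Excel's NA(), and the function and constant names are made up for this example.

```python
import math

UNRATED = 5  # the rating number the formula above treats as "unrated"


def y_value(salary_increase, rating_number):
    """Return the hike to plot, or NaN when the point should be skipped.

    NaN plays the role of NA() in the Excel formula: the row stays in the
    data but the dot is left out of the chart.
    """
    if salary_increase == 0 or rating_number == UNRATED:
        return math.nan
    return salary_increase


plotted = y_value(2500, 2)       # rated, non-zero hike: kept
skipped_zero = y_value(0, 2)     # $0 hike: omitted
skipped_unrated = y_value(3000, 5)  # unrated, even with a hike: omitted
```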
X value: This is the tricky bit. Since there are 7 groups, each with 4 ratings (excluding the unrated), we have 28 possible X values. We want to space these out so the dots for one group + rating combination don’t encroach on another combination.
Let’s say we give 10 units of space per group.
That means we have 2.5 units of space per rating in that group (and a total of 70 units of space).
Now, the dot needs to be plotted at the center of this 2.5 units of space (i.e. at 1.25).
The basic formula would be: =[@[Business Group]]*10+([@[Rating 17 (number)]]-1)*2.5+1.25
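To sanity-check the arithmetic, here is a short Python sketch that applies the same formula to every group and rating combination. It assumes groups are numbered 1 to 7 and ratings 1 to 4, as described above. The names are illustrative only.

```python
SPACE_PER_GROUP = 10
RATINGS_PER_GROUP = 4
SPACE_PER_RATING = SPACE_PER_GROUP / RATINGS_PER_GROUP  # 2.5 units


def base_x(group_number, rating_number):
    """Center of the slot reserved for one group + rating combination."""
    return (group_number * SPACE_PER_GROUP
            + (rating_number - 1) * SPACE_PER_RATING
            + SPACE_PER_RATING / 2)


# All 28 slot centers, ordered by group and then by rating.
centers = [base_x(g, r) for g in range(1, 8) for r in range(1, 5)]

first_center = centers[0]   # group 1, rating 1: 10 + 0 + 1.25 = 11.25
last_center = centers[-1]   # group 7, rating 4: 70 + 7.5 + 1.25 = 78.75
```

The centers run from 11.25 to 78.75 in even steps of 2.5, so all 28 slots fit between 10 and 80 on the X axis.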
But what about the jitter?
Aah, right. We need to add random noise to the X value. Since each rating has 2.5 units of space, how about noise between -0.7 and 0.7? This still leaves plenty of space on both ends (0.55 units on either side), thus keeping the plot clear.
We can use the formula below to generate the noise.
The final formula for X value goes like this:
=[@[Business Group]]*10+([@[Rating 17 (number)]]-1)*2.5+1.25+[@Noise]
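As a rough check that the jitter stays inside its slot, here is a Python sketch of the final X value. It assumes the noise is drawn uniformly between -0.7 and 0.7. In Excel, something like RAND()*1.4-0.7 would give a comparable spread, but that is an assumption, not necessarily the formula used in the workbook. The fixed seed is there only to make the sketch repeatable.

```python
import random

MAX_NOISE = 0.7    # jitter range from the text: -0.7 to 0.7
HALF_SLOT = 1.25   # half of the 2.5 units reserved per rating


def jittered_x(group_number, rating_number, rng=random):
    """Slot center plus uniform random noise (assumed distribution)."""
    center = group_number * 10 + (rating_number - 1) * 2.5 + 1.25
    return center + rng.uniform(-MAX_NOISE, MAX_NOISE)


rng = random.Random(42)  # fixed seed so repeated runs give the same dots
xs = [jittered_x(3, 2, rng) for _ in range(1000)]  # group 3, rating 2

# Every dot stays within 0.7 of the slot center (33.75), which leaves a
# gap of 0.55 units before the edge of the slot on either side.
spread_ok = all(abs(x - 33.75) <= MAX_NOISE + 1e-9 for x in xs)
```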
Here is how our X and Y values look at this stage:
Data prep done, let’s move to the plot.
Creating jittered scatter plot
- Select both X & Y values and insert XY plot. We get this.
- Set X axis limits and remove title: As all our dots are between 10 and 80, let’s set these as the limits for the X axis. Also, let’s remove the chart title.
- Add vertical gridlines: Although our dot towers are separated from each other, adding grid lines makes it easy to read the chart.
- Format the markers: Set fill to solid color and 25% transparency. This makes the dots look nice and shows the density when there are too many people at the same coordinates.
- Set Y axis limit: Cap the axis so that we can focus on people getting a salary increase of up to $10,000. This zooms the chart into the meaty part while still showing plenty of outliers. We get this:
- Last step: Remove plot and chart borders, so we can add extra info, labels etc.
Ok, now our chart is almost ready. Next step, making it a story.
Create a wireframe in a 10-column area, as shown below:
Next, place the chart inside the red box. Adjust the plot area size so it fits into 7 columns, holding the ALT key while resizing so the plot area snaps to the cell borders. You need to repeat this step every time you fiddle with the chart, so do it last.
Add extra story points:
- A clear and descriptive title
- A sub-title explaining what is going on and how to read the chart.
- Group names and rating names. You can use the below trick to align the rating labels inside cell nicely.
- Show some more stats like median hike, median new pay (if you have it), head counts and unrated counts.
- Add any footers, disclaimers (about excluded people in the plot etc.)
- Add a border around this entire wire frame so it all looks like one piece.
- Shade alternate columns in some dull color. This improves the readability. As our chart is transparent, cell fill colors will show up nicely.
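The extra stats mentioned in the list above (median hike, head counts and unrated counts) are simple per-group summaries. Here is a minimal Python sketch of how they could be computed. The rows are made up, and including $0 hikes and unrated people in the median is an assumption. Adjust the filter if your story calls for something else.

```python
from collections import defaultdict
from statistics import median

# Made-up rows: (group, rating number, salary increase in $); 5 = unrated.
rows = [
    ("Group A", 1, 4000),
    ("Group A", 3, 1500),
    ("Group A", 5, 0),
    ("Group B", 2, 2500),
    ("Group B", 2, 3500),
    ("Group B", 4, 0),
]


def group_stats(rows, unrated=5):
    """Head count, unrated count and median hike for each group."""
    by_group = defaultdict(list)
    for group, rating, hike in rows:
        by_group[group].append((rating, hike))

    stats = {}
    for group, members in by_group.items():
        stats[group] = {
            "head_count": len(members),
            "unrated_count": sum(1 for rating, _ in members
                                 if rating == unrated),
            "median_hike": median(hike for _, hike in members),
        }
    return stats


stats = group_stats(rows)
```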
We are done.
Inspiration for this – R
That is right. You can create a similar plot quicker and better using R. ggplot2, an R library, has built-in support for jittering dots on XY plots. Using that, you can create the chart below with just 7 lines of code. This is what you get (yes, you can show each rating’s dots in a different color, and yes, you can order the groups by the number of people in them).
Download Excel Chart
Click here to download the workbook containing this chart, tutorial and raw data. Try re-creating it in Excel (or your favorite visualization tool) to learn more.
How do you like this chart?
I had lots of fun making and tweaking this chart. It shows some interesting patterns about how salary hikes are distributed across groups and where everyone is.
How do you like this? Do you plan to add some jitter to your busy scatter plots? Please share your thoughts in the comments section. And if you want some inspiration, check out more such charts.
Jittery about charts?
If you love storytelling and beautiful visualizations but are not sure how to get there, consider enrolling in our Excel School or 50 ways to Analyze Data programs. In these powerful courses, I teach you all about awesome data analysis and visualization techniques.