This post is from GuestBuster Jeff Weir in our Chart Busters series.
Note: This post is slightly longer than usual, but worth every word. Just grab a cup of coffee and soak in this visualization goodness. (Also, click on any image to see its full version.)
Over at the FlowingData blog, they’ve been talking about this pretty slick looking Choropleth Map that shows how Medicare returns vary across the United States:
The above shows total Medicare reimbursements in 2006, either by Hospital Referral Region or by State, depending on the radio button. Using the dropdown box, you can change it to this:
…which is how the data looks if you overlay it on a Giraffe. Oops, I forgot to rotate it before saying that. Bear with me a moment…
There. See the Giraffe now? Good.
A picture is worth a thousand words, or so they say. But is a Choropleth worth the many line charts and clowns that you could squeeze into the same valuable screen real estate? Let’s find out, by evaluating what this particular chart does well, and what it does poorly, and whether other charting methods might better convey its information.
Words and music.
Right off the bat, there’s a simple way that the authors could improve this chart. While they include a description below the chart to point out what the data is, and where it came from, they miss something just as important…what they concluded from all of this. So before we consider adding – say – bullet graphs, let’s consider adding some bullet points. A few sentences can tell readers important stuff that would otherwise remain hidden in an undownloaded PDF report. Insights like:
- Care is often better in low-cost areas.
- Growth in returns is only partly explained by advancing technology, and
- Differences in growth rates across regions seem largely due to discretionary decisions by physicians that are influenced by the local availability of hospital beds, imaging centres and other resources, and by a payment system that rewards growth and higher utilization.
Straight off the bat, this would make the graph a better graph…without even messing with its form.
But mess we must…
…because lurking below the chlorophyll green of this Choropleth Map are a few serious charting oversights. Ready? Let’s check ’em out.
Scale? Fail!
First, check out the legend.
Crikey…its bands are as discrete as Bruno. Its scale is about as even as my temperament. It varies about as much as =RANDBETWEEN(PaydayBankBalance, UsualOverdraft).
If you fire up Excel and look at the spread covered by each range, you see just how arbitrary the different price bands are:
Whoa…the spread of that $9k to $16k band is nearly 15 times larger than two of the other bands. That can’t be good, can it?
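If you'd rather check the band spreads outside Excel, here is a minimal Python sketch. Only the $9k to $16k band comes from the legend discussed above; the lower band edges are made-up placeholders, so treat the output as an illustration, not as the map's real figures.

```python
# Band edges in dollars. Only the top band ($9k to $16k) is taken from the post;
# the lower edges are hypothetical placeholders for illustration.
BAND_EDGES = [5000, 6500, 7000, 7500, 9000, 16000]


def band_widths(edges):
    """Return the width of each band defined by consecutive edges."""
    return [upper - lower for lower, upper in zip(edges, edges[1:])]


widths = band_widths(BAND_EDGES)
ratio = max(widths) / min(widths)

for (lower, upper), width in zip(zip(BAND_EDGES, BAND_EDGES[1:]), widths):
    print(f"${lower:,} to ${upper:,}: spread of ${width:,}")
print(f"Widest band is {ratio:.0f}x the narrowest")
```

With evenly sized bands that ratio would be 1, which makes it a quick sanity check for any legend.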
Nice profile
If you were to graph the financial spread of each group against the aggregated number of Hospital Referral Regions that fall within each spread, you get something like a histogram. The sizes of these bands are about as different as the number of performers on stage at a Bob Dylan concert in 1964 compared with 1974. See for yourself:
Oops, wrong graphic. Try this:
Normally histograms have equal widths for each band, but here I want to highlight just how unequal the bands used are. Plus, this lets us regroup the data into evenly spread $1k bands, and overlay it on the first distribution, to see how it compares. Here’s one that I prepared earlier, with the red line as the regrouped data…
Vastly different picture, isn’t it? The red is kinda like Data Pig’s heart rate before he eats chocolate covered bacon on Saturdays, and the blue is how his ECG would look when he’s in the ambulance, on the way to the hospital.
This makes it very hard to answer that important question “…compared to what?” With such different sized bands, how can we compare one to another? How can we be sure that the distributions within each band will even allow us to?
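The regrouping into even $1k bands can also be sketched in a few lines of Python. The figures below are hypothetical stand-ins, not the real Medicare data, and the function name is my own.

```python
from collections import Counter


def regroup(values, band_width=1000):
    """Count how many values fall in each equal-width band.

    Keys are the lower edge of each band, so 8000 means $8,000 to $8,999.
    """
    counts = Counter((int(v) // band_width) * band_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical per-region figures in dollars, not the real Medicare data.
sample = [5200, 6100, 6900, 7400, 7800, 8100, 8300, 8800, 9400, 15000]

for lower, count in regroup(sample).items():
    print(f"${lower:,} to ${lower + 999:,}: {'#' * count}")
```

Note that empty bands simply don't appear in the result, so a gap between keys is itself a hint that an outlier is lurking.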
For instance, take the highest band spread of $9k to $16k: without any further information to go on, we might assume that the median (i.e. middle) value for districts in this category is midway between the $9k to $16k boundaries, like this:
But that’s like assuming that Simon and Chartjunkle (oops, Garfunkel) have equal talent. We’d be wrong. Very wrong. In actual fact, there are only three data points to the right of our guessed median line. And as for the 55 hospital regions in Group Five that fall to the left of it…well, they all get tarred with the same brush as those worst three performers. The actual median for this group is a lot further left, as shown below:
This means that over half the data in this 5th band actually falls much closer to the far left of the graph than to the far right of the same group it’s been placed in.
You can see this better if you add a one-dimensional strip plot above the graph, which gives an idea of where the 300-odd values fall within the entire range:
Whoa…looks like we’ve got a few outliers to contend with.
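To see how far a band's midpoint can drift from its real median, here is a small Python sketch. The values are invented to mimic the shape described above (most regions just over $9k, one way out at $15k); they are not the actual Group Five data.

```python
from statistics import median


def midpoint_vs_median(values, lower, upper):
    """Compare a band's midpoint with the median of the values inside it."""
    in_band = [v for v in values if lower <= v < upper]
    return (lower + upper) / 2, median(in_band)


# Hypothetical figures in dollars: most sit just above $9k, one sits far out.
top_band = [9100, 9200, 9300, 9400, 9600, 9900, 15000]

mid, med = midpoint_vs_median(top_band, 9000, 16000)
print(f"Band midpoint: ${mid:,.0f}, actual median: ${med:,.0f}")
```

The wider the band, the more room there is for the midpoint and the median to disagree, which is exactly the trouble with that top band.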
What a State we’re in…
This seemingly arbitrary ‘bucketing’ effect is exacerbated when aggregating the different hospital regions into State-wide totals. Except this time regions are being penalised by arbitrary geographical boundaries, as well as the arbitrary financial ones above.
Take Texas for example. Aggregating everything up to the State level, Texas appears in that highest band. Yet at the Hospital Referral Region level, one third of its 22 different hospital regions fall below the national average, and the median for the whole State is around $8,800. So we’d better be careful making assumptions from a State-wide view, because the Choropleth averages some very diverse costs over some very large chunks of real estate.
To see just how diverse, let’s rank the entire US values from smallest to largest, and highlight where the Texas readings fall within that range:
What can we tell from this? Firstly, nearly all regions nationwide fall between $5k and $10k. Secondly, there are a few outliers that really skew the picture at the high end. Thirdly, in the Texas case, the State average is boosted somewhat by 3 Texan districts that happen to be among the worst 10 culprits nationwide – one of which is clearly an outlier at $15k. Unfortunately for the lower cost Texan regions, they’re guilty by geographical association…kinda like being kidnapped and held for a zillion dollar ransom, just because you happen to live in the same State as Bill Gates.
So what do we get by aggregating to State boundaries? Probably more blurring than insight. After all, what good would a weather report be to Texans if it only reported the average weather they could expect as a State? Instead, it’s better to keep the aggregation at the Hospital Referral Region level. That way, we can look at this:
…and ask things like “Wow, why such a difference between Waco and the surrounding bits of Texas?” and “What the hell is Alaska doing there?”
Legends in the making…
What’s far worse than this, though, is that when looking at the State-wide map, the legend is now really, really wrong.
Here’s the legend next to the actual State-wide figures, for comparison:
Whoops…the graph title has changed to reflect we’re now looking at Medicare spending per beneficiary per State; i.e. State averages. The legend is still looking at Hospital Referral Region averages, which have a much greater spread. For instance, the Choropleth shows six States as being dark green regions, and the legend says they fall somewhere within $9k to $16k. But the actual data shows they fall in a $9.4k to $9.6k range. Oops! Slight misrepresentation, there.
How to fix it
Obviously this graph really should use a quantitative scale with equal increments; one that changes to reflect the selection that users make. What’s more, colors should have just enough variation so as to highlight any important differences, without being overwhelming or mistaken for camouflage.
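As a rough sketch of what "a scale with equal increments that changes with the selection" could look like, here is some Python that builds legend edges from whichever data set is currently selected. The function name, the $1k step and the sample figures are all my own assumptions, not anything taken from the original map.

```python
import math


def legend_edges(values, step=1000):
    """Build equal-width legend edges that cover the selected data."""
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    if high == low:  # all values sit exactly on one edge
        high = low + step
    return list(range(low, high + step, step))


# Hypothetical figures in dollars, standing in for the two views of the map.
region_view = [5300, 7200, 8800, 9400, 15000]
state_view = [6100, 7400, 8200, 9400, 9600]

print(legend_edges(region_view))
print(legend_edges(state_view))
```

Because the edges are recomputed from the data, switching from regions to States would shrink the legend to match the narrower State-wide spread instead of leaving the old bands in place.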
But is a Choropleth Map the best way to present this data in the first place? If you want something for people to play with online, then maybe…but if you want to compare things very closely to other things, then maybe not.
For sure, a Choropleth Map looks cool, and it has what Tushar Mehta calls “natural context”. But from an analytical perspective, a Choropleth only really reports how one thing changes with regards to geography. If geography is a major determinant – or if you want to show people how things look in their own back yard compared to others – then perhaps this is the piece of kit you need. But if there are other factors that have much more sway on your data than geography, then perhaps not. For instance, we might want to see whether population density plays a significant part in Medicare returns, given the likely economies of scale from providing healthcare to densely populated regions vs. rural regions. Now’s the time to break out a scatter plot:
Hmmm…looks promising. (Note: I’ve used State-wide data for the above…ran out of time to track down densities in the different Hospital Referral Regions, which is what I’d prefer to do.)
Or we might want to zoom in on the best or worst offenders, and see just how different they are to each other, and to the median value:
Conclusion
I think a better, fairer Choropleth Map at the Hospital Referral Region level would be interesting. But I don’t think it would be enough. To quote from Stephen Few’s latest book Now You See It: “Color is good at drawing your attention to something if used sparingly, but is one of the ‘pre-attentive attributes’ that is not quantitatively perceived in and of themselves”.
Lines and 2D position, on the other hand, are very precise ways to encode quantitative values.
So when it comes to answering the ‘Compared to what’ question, I don’t think you can beat this:
Choropleth Maps in Excel
For information on the implementation of Choropleth Maps in Excel, check out Tushar Mehta’s excellent resources.
For more information on the pros and cons of Choropleth Maps, check out the Clearly and Simply blog, where Robert has built on Tushar’s excellent approach to produce some great downloadable templates. He also offers advice on potential drawbacks of Choropleth Maps, such as:
- No visualization of development over time
- No information on exact values (unless you are implementing tooltips including the data)
- Very limited direct comparability of the regions
- Possible perception problems with regards to the size of regions (e.g. Rhode Island on a US map)
- Possible misinterpretation because the size of a region may have a greater impact on the user’s visual perception than the intensity of the fill color
- Requirement of real estate on a dashboard
His recommendation: carefully consider whether or not a Choropleth Map is the best visualization for your purposes. Check out his dashboard of Lithuania at a glance to see how he mitigates some of the potential problems by incorporating other graphs into the display.
I used Robert’s template to produce this State-wide Choropleth Map of total Medicare spending per enrollee, 2006 using the same Medicare ranges as the Choropleth that’s the subject of this post:
…
…then I replotted the graph using data that had been regrouped into $1k bands:
While I don’t advocate this approach, it’s interesting that even though this is aggregated to State-wide totals, you can see significant differences between the graphs.
Right, that’s it. I’m off to the Hospital to see someone about my writer’s cramp…
About the Author
Jeff is a Business Analyst from Wellington, New Zealand who has recently discovered a strong interest in Data Visualization. He swears by Edward Tufte and Stephen Few as much as he swears at Excel 2007. He’s so new to advanced Excel, that 2 years ago he had to ask a work friend what the dollar signs in $A$1 meant. Now that he knows that, he’s trying to find out what the dollar signs in $A$2 mean.
Note from PHD:
Thank you Jeff. Your passion and knowledge are truly outstanding. I have a whole pack of donuts waiting for you.
56 Responses to “Creating in-cell bar charts / histograms in excel”
Ay jhakkas!!!
Man, you're on a roll. A true-blue Excel innovator. What you're writing makes me think - why didn't anyone else think of this before?
Now that I've showered all the praises on you, it won't hurt to have a few comments on my blaag 😉
PS. I meant the innovator part.
@Amit ... thanks, I was also curious why this one was not explored, but then again, I haven't really searched a lot to ensure that I am not posting the same ideas again. My intent is to help a few people benefit from this; if that happens I would be happy...
btw, posted a comment on your blaag... hope you are happy now 😀
Don't worry about repeating the ideas in the online world. As long as you are not copying it off anyone else and it is helpful for the readers, it's fine.
PS. the comment does not count.
The idea actually is not a new one :).
Check out MicroCharts
http://www.bonavistasystems.com/
to see how far you can get with font based in-cell charting
Where is the file? I can't seem to locate it. I want to download it. Thanks Chandoo!
Found it.
Great job, Chandoo. Love the site - and the fact that you provide downloads to help us (me) learn your secrets faster. I downloaded the font but can't figure out how to add it to my font library... Any hints? Thanks! Keep up the fantastic work.
@Mahqooi: Thank you and welcome to PHD 🙂
This is how you can install a font in a windows machine:
unzip the font files (if needed)
select and copy the font file to clip board by pressing ctrl+c
go to control panel > fonts
paste the file by pressing ctrl +v
repeat this procedure for other font files if any
if you are using mac, just right click on the font file and select install option.
let me know if you have some issues with this.
Hi Chandoo,
is there any mirrors for the bargraph font?
it seems that fontstruct.com is down for maintenance.
thanks!
@Cybsych: I am not sure if they have any mirrors. I will look in to my backup to see if a copy of the font can be located and ping you back. Thanks.
hi Chandoo, fontstruct is back online 😉
BTW, I am wondering about this in-cell chart.
How do I apply an automated conditional formatting to only a bar/point?
For example, the first image in this post, whereby RED = highest, BLUE=lowest.
Chandoo,
I guess these bars only work with positive numbers? Say you have a list of costs per month, but one month you have a negative cost, meaning income due to, let's say, vendor credits. Could this in-cell bar depict the month with a negative number, or not?
hi Chandoo, guess that you missed out my query 😀
is there a way to highlight the MAX and MIN bar based on the actual data (not the normalized)?
@Pedro, for that you need to have another set of characters (maybe A-J for 0-9 and K-S for -1 to -9) and then use them to show the bars. It is a bit tricky, but achievable.
@Cybpsych: The highlighting was done manually (As you can see, there is probably no easy way to highlight / change colors of a portion of a cell using Conditional formatting etc.). I am sorry, but you need to use some other sparkline technique to achieve this (or write your own macro)
http://chandoo.org/wp/2008/09/05/microcharting-excel-howto/
thanks chandoo!
I love this simple and quick way of visualization results. I would like to learn more about normalizing values (i.e. the use of linear normalization). Can someone kindly point me in a good direction for this beginner? Much thanks to everyone (especially Chandoo) for the wealth of information provided. Long live the internet age!
@Jason: you can use simple Excel formulas to normalize a set of values. If the list of values is in, say, A1:A10 and you want them scaled so that the largest becomes 100, you can do that with a formula like: =A1/MAX($A$1:$A$10)*100. Also, you can use the RANK formula to find each value's position in the list (or PERCENTRANK for its percentile).
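For anyone who wants to see the arithmetic outside a spreadsheet, here is a small Python sketch of the same idea. The first function mirrors the divide-by-MAX formula; the second is a common min-max variant that I've added as an assumption for cases where the smallest value should map to 1 and the largest to 100.

```python
def normalize_to_max(values, scale=100):
    """Scale each value relative to the largest, like =A1/MAX($A$1:$A$10)*100."""
    largest = max(values)
    return [v * scale / largest for v in values]


def normalize_min_max(values, low=1, high=100):
    """Stretch values linearly so the smallest maps to low and the largest to high.

    Assumes the values are not all identical (otherwise the span is zero).
    """
    smallest, largest = min(values), max(values)
    span = largest - smallest
    return [low + (v - smallest) / span * (high - low) for v in values]


data = [20, 50, 80, 200]  # hypothetical values
print(normalize_to_max(data))
print(normalize_min_max(data))
```

The divide-by-MAX version keeps bars proportional to the raw numbers, while the min-max version exaggerates differences, so which one suits a chart depends on what you want readers to compare.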
Nifty way to normalize the data....I'll have to take that into account when working with my charts.
One thing I'd like to add, you can eliminate the need for custom fonts with the bar charts by using a REPT function and using a small "g" set to the Webdings font. It's more likely anybody opening the file will have access to that font than the custom one you've provided. (More portability is a good thing 🙂 )
Portability is great.
I don't quite see how the REPT formula and the webding fonts can combine to solve the portability issue.
Mind you, I see that +REPT("g",1) will give you a bar, but we would need several bars of unequal length.
Can you elaborate?
Thank you
@Matt: I almost forgot about this comment. Thanks to Pedro for the bump.
As he points out, portability is a good idea, but we will not be able to get bars of variable height using the Webdings font.
We can of course use that along with text rotation and char(10) to create pseudo in-cell bars. Here is a tutorial: http://chandoo.org/wp/2008/07/15/incell-bar-charts-revisited/
@Chandoo: Yep, that's exactly what I meant, use your text rotation and char(10) trick with REPT("G",) (then set the font to Webdings) to get your string of bars with variable height.
@Pedro: REPT("g",1) will give you one "g" (or in Webdings a bar of 1 height).
REPT("g",B2) will repeat for the value in B2... 🙂 Use that with Chandoo's take on linear normalizing, and yer all set.
Wingdings with an "n" character would be even more portable, but just doesn't look quite as cool...but pretty much everybody has that font, so it'd be portable.
You may have to adjust the font size in order to get all the bars to show correctly, perhaps some sizing of the row heights as well...
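To make the REPT idea concrete, here is a rough Python imitation. It draws horizontal bars, one per value, using a block character in place of the Webdings "g"; the vertical in-cell version still relies on the text rotation and char(10) trick from the tutorial linked above. The function names and the sample numbers are my own.

```python
def rept(text, count):
    """Mimic Excel's REPT: repeat text count times (fractions are truncated)."""
    return text * int(count)


def in_cell_bars(values, max_length=10, char="█"):
    """Return one text bar per value, scaled so the largest gets max_length characters."""
    largest = max(values)
    return [rept(char, round(v * max_length / largest)) for v in values]


# Hypothetical monthly figures.
monthly = [12, 30, 48, 60]
for value, bar in zip(monthly, in_cell_bars(monthly)):
    print(f"{value:>3} {bar}")
```

Scaling to a fixed maximum length first is what keeps a large value from producing a bar hundreds of characters wide.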
You can fake an incell line chart by using:
REPT(" ",B2-1)&REPT("n",B2)
where B2 is the value in the cell you want as a data point.
Wow, the formatting was horrid, let's elaborate a bit more...
REPT("",-1)&REPT("n",) - would give you a line graph, where could be a reference to each cell you'd like as a data point.
REPT just repeats a text string a number of times, it can be either a hard number (like Pedro's example), or a reference to a value in another cell (more handy). I believe Webdings is a common font in the MS Office suites I'm familiar with (2000 thru 2003), but I'm not sure of 2007's suite.
@Matt A: I am sorry for the formatting mishap. I am afraid of using too many plug ins, but I guess a simple HTML based comment box seems like a good idea now that lot more commenters are typing formulas and vba code in the comment box.
Coming to the formula.. thanks for sharing it. And yes, you are right, webdings is common to Office 2007 too. But even better solution would be to use good old pipe | symbol. When the font is Arial, the pipe character spacing looks optimum and subtle enough to look like an incell histogram / column chart.
After some searching through the character maps in Arial I noticed that there's a box symbol --> █ (created by holding ALT then typing 5595 on the numpad) that would work perfectly as another character to use for column charts. It looks just like the Webdings "g" character.
Is there a way to change the colour of the bars based upon the data. eg. 1-5 = red, 6-7 = amber, 8-10 = Green
@Ben... you can change the color of all bars in a cell using conditional formatting. But selectively changing color of bars inside cell is not possible unless you do it manually or through VBA.
Does this work only for numbers, or will it work for % data as well? I tried to do the same for % data, but I didn't get it to work. Please let me know the formula for % data.
Hello Chandoo,
I really like this, but I have Office for Mac 2011 and for the life of me I cannot figure out how to see the bargraph as an available font.
I have followed all the instructions for adding a font, but it does not appear. Do you have any suggestions?
Thanks
prb
Thanks. This one was cool and helpful. Can we experiment the same with "in cell" line graph as well? 🙂
Chandoo,
How do you "manually" change the color of the last bar in the series?
Lawrence
@Lawrence
Select the chart
Select the series
Select the last point/column of the series
Ctrl 1 or right click Format Point
Select a color
Hui,
Thanks!
I should have been more descriptive. What I meant to ask was about the in-cell bar graph created with the REPT function described above. How do I get the last REPT (the last bar) to be a different color than the rest?
Lawrence
@Lawrence
You cannot change colors in a cell using formula
You can use either VBA code or do it manually
Select the cell
Copy and paste it as values
Edit the cell F2
using the arrows move to the character you want to color
Hold Shift and select the character by arrow keying over it
with the character selected
Ctrl 1 (Format Cells)
Change the Font Color to suit
It won't be a color change per se...but you can set an IF statement in your REPT formulas for different characters to show as the bars. The characters "c" and "g" in Webdings are both boxes, one is a solid block, the other an outline.
For example, say I wanted to highlight the highest bar in my REPT formulas...my formula to translate the numeric cells A2:A15 to characters would be:
IF(A7=MAX($A$2:$A$15),REPT("c",B7),REPT("g",B7))
so if the cell I'm checking (here it happened to be A7), is the highest number...its bar would display differently further along down in the concatenations...
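Here is the same IF-plus-REPT logic as a small Python sketch, in case it helps to see it applied to a whole column at once. The letters "c" and "g" only look like boxes once the Webdings font is applied; in plain text they are just letters. The function name and the sample values are hypothetical.

```python
def highlighted_bars(values, lengths, normal="g", highlight="c"):
    """Mimic IF(A7=MAX(...),REPT("c",B7),REPT("g",B7)) for every row."""
    largest = max(values)
    return [
        (highlight if value == largest else normal) * length
        for value, length in zip(values, lengths)
    ]


raw_values = [40, 90, 65]   # like column A: the actual data
bar_lengths = [4, 9, 6]     # like column B: the normalized bar lengths
print(highlighted_bars(raw_values, bar_lengths))
```

Comparing against the raw values while drawing from the normalized lengths is what keeps the highlight tied to the actual data.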
@Hui...THANKS!
@ Matt A... Very cool idea. What formatting do you recommend for the cell? The Webdings "c" hollow box is very faded and hard to read even if bolded and bigger font size is used. If I could just punch it up a bit it would be perfect with 5 "c" columns followed by a single solid "g" column...as in showing the trend in the trailing 6 months of data.
Lawrence
@ Lawrence
Good question...lately I've been using █ (which you get from holding ALT then typing 5595 on the numeric keypad) for most of my bars. Unfortunately the character map doesn't lead me to a differently "shaded" box of the same size. Reason I use this nowadays...it's part of the Arial font...just a special char map character I can rapidly input w/o any formatting nonsense.
I'll check to see if I can replicate another box of same size that may have different shading using the same method...no luck as of yet.
I've just built the in cell bargraph and was trying to create a pop up window which would display the Monthly Sales for Last 12 months when they click on any of the bargraph cells
Hi, there is a problem with the Bargraph font. On my win7 machine it works perfectly but when I try to install it on my boss's mac it returns an error called " 'Name' Table Structure"
I tried to install on two different macs and the same error resulted. As a result the font does not show up as an option in any program.
Just an FYI. I don't use macs but I know some people do.
Thanks Chandoo for the font!! It works great once installed on my machine, but is there any way (besides printing and scanning the doc) that I can get the graphs to show up on other peoples' machines without going through the font install process? My file has to be sent out to clients that don't have that font installed.
Sarah, Excel doesn't allow embedding of fonts (aside from a workaround using a macro). The font will need to be sent to all who want to view the file. I went through the same question with my boss. I ultimately just installed the font on her computer.
If the data is only to be viewed, and not modified, moved, etc. you can save the file as a pdf. The font can be viewed that way.
Hello everyone, I have a problem: I need a summary formula that updates automatically by picking data from other sheets. Please give me a sample file, and also a summary sheet format that updates automatically.
@Joesali
I'd suggest asking this type of question at the Chandoo.org Forums
I'd suggest uploading a sample file also
Hi chandu,
Apart from Excel, I need the formula to find bar graph height dynamically when using a log scale. For example, for a linear graph I would scale the maximum value to the height of the panel as
(value divided by maxvalue) * height.
Now I am using a logarithmic graph; can you tell me the right formula which fits perfectly?
Thanks in advance
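One common way to handle the log-scale case, offered as a sketch and not as something from the original thread: pick a positive value for the bottom of the axis, then take the ratio of logs instead of the ratio of values. In the Python below, min_value (the value drawn at zero height) is an assumption you would need to choose yourself, since a log axis cannot start at zero.

```python
import math


def log_bar_height(value, max_value, panel_height, min_value=1.0):
    """Height of a bar on a log axis running from min_value to max_value."""
    if value <= 0 or min_value <= 0 or max_value <= min_value:
        raise ValueError("log scale needs 0 < min_value < max_value and value > 0")
    # Keep the bar inside the panel if the value falls outside the axis range.
    clamped = min(max(value, min_value), max_value)
    fraction = math.log(clamped / min_value) / math.log(max_value / min_value)
    return fraction * panel_height


# Hypothetical panel: 300 units tall, axis from 1 to 1000.
for v in (1, 10, 100, 1000):
    print(v, log_bar_height(v, 1000, 300))
```

Each tenfold increase adds the same amount of height, which is the behaviour you would expect from a log axis.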
Nice info... Thanks... very helpful... 🙂
The font does not seem to be available at fontshop. Is there somewhere else to download the bargraph font?
@Amber
Try doing a Google search for Bargraph Font
it returns several possibilities
Is there a way to do this without using bar graph font? We have a financial report to be published to stakeholders and they will not have this font installed, so probably will not be able to view the bar chart as well.