A good chart tells a story. It is as simple as that.
Here is a fantastic example of what a good chart is. See the "Time spent eating vs. National obesity rate" chart below.
It takes maybe 5 seconds to understand what the chart shows, and then you know the story. What is more interesting is that it piques the reader's curiosity, prompting questions about the data. For example:
- Q: Why is obesity high in countries where people spend less time eating?
- Maybe because of fast food.
- Q: What are the Turkish people eating for 160 minutes a day?
- Maybe they like their turkey cold. Okay, bad joke.
The chart itself is very simple. But it brilliantly juxtaposes two pieces of data, national obesity rates and time spent eating per day, to tell a story.
Are your charts telling a story?
Hat tip to Marginal Revolution for the chart.
More charting principles: Why KISS is important when it comes to charts | Vizooalization – 5 Lessons from zoo on visualization
25 Responses to “A Good Chart is a Story [Charting Principles]”
Here's my question: Could you have drawn that line if it wasn't there? My conclusion from this graph is that there is no statistically significant correlation.
@Dick: I wouldn't have drawn the trend line, as it is not a very strong trend. But then maybe the country list isn't exhaustive.
With or without the line, I am sure the chart is still a very good story. It brings two rather simple pieces of data together very beautifully.
Dick - great point. You could just as easily have drawn the trend line through New Zealand, Germany and Norway to reverse the message.
It's worth reading the comments on the original source site - there are a number of interesting points there about different interpretations of the data. For example, the countries also seem to fall into 4 distinct horizontal bands - 1) USA and Mexico; 2) UK and former British Commonwealth countries; 3) Europe; 4) Far East.
And what about the enormous populations that are not shown at all - China, India, the whole of Africa, the whole of South America, and so on.
This might be a nice chart but I'm afraid it's a case of seeing what you want to see.
The r value for a linear regression on this data set (18 points) is 0.450. At 95% confidence (two-tailed test, 16 degrees of freedom), |r| has to be > 0.468. This means you can't reject the null hypothesis that the "correlation" is due to random error: the apparent trend could plausibly be produced by chance alone.
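The 0.468 threshold can be reproduced with the usual t-test for a Pearson correlation coefficient, which has n - 2 degrees of freedom. This is a sketch of the arithmetic under one stated assumption: the critical value t ≈ 2.120 (two-tailed, α = 0.05, 16 degrees of freedom) is read from a standard t table rather than derived here.

```latex
t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad
r_{\text{crit}} = \frac{t_{0.975,\,n-2}}{\sqrt{t_{0.975,\,n-2}^{2} + (n-2)}}

n = 18,\; t_{0.975,\,16} \approx 2.120
\;\Rightarrow\; r_{\text{crit}} \approx \frac{2.120}{\sqrt{4.494 + 16}} \approx 0.468

|r| = 0.450 \;\Rightarrow\; |t| \approx 0.450\sqrt{\frac{16}{1 - 0.2025}} \approx 2.02 < 2.120
```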
If there were some physical reason why another type of regression (polynomial, exponential) would fit, then another type of fit might be more appropriate, but without a theory to suggest what type of line to fit, linear is as good as anything.
Here's the data I used (extracted from the original source).
Country        Eating (min/day)   Obesity (%)
Australia      89                 21.7
Belgium        109                12.7
Canada         69                 18
Finland        81                 14.3
France         135                10.5
Germany        105                13.6
Italy          114                10.2
Japan          117                3.9
Korea          96                 3.5
Mexico         66                 30
New Zealand    130                20.9
Norway         82                 9
Poland         94                 12.5
Spain          106                14.9
Sweden         94                 10.7
Turkey         162                12
UK             85                 24
US             74                 34.3

Regression summary: Rsq 0.202515323, Count 18, R 0.450017025.
Eating is the time spent eating (minutes per day); Obesity is the obesity rate (% of population with BMI above 30).
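The figures above can be checked with a short, stdlib-only Python sketch. It recomputes Pearson's r from the 18 rows and compares it with the critical value. The sign comes out negative (matching the downward trend line); the 0.450 quoted above is its magnitude. The critical t value is hardcoded from a t table, which is an assumption of this sketch, not something computed here.

```python
import math

# Data as posted in the comment above:
# (country, minutes spent eating per day, % of population with BMI > 30)
DATA = [
    ("Australia", 89, 21.7), ("Belgium", 109, 12.7), ("Canada", 69, 18.0),
    ("Finland", 81, 14.3), ("France", 135, 10.5), ("Germany", 105, 13.6),
    ("Italy", 114, 10.2), ("Japan", 117, 3.9), ("Korea", 96, 3.5),
    ("Mexico", 66, 30.0), ("New Zealand", 130, 20.9), ("Norway", 82, 9.0),
    ("Poland", 94, 12.5), ("Spain", 106, 14.9), ("Sweden", 94, 10.7),
    ("Turkey", 162, 12.0), ("UK", 85, 24.0), ("US", 74, 34.3),
]

# Two-tailed critical t for alpha = 0.05 with 16 degrees of freedom.
# Assumption: hardcoded from a standard t table rather than computed.
T_CRIT_DF16 = 2.120


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


eating = [row[1] for row in DATA]
obesity = [row[2] for row in DATA]

n = len(DATA)
df = n - 2  # degrees of freedom for testing a correlation coefficient
r = pearson_r(eating, obesity)
# t statistic for H0: no linear correlation
t_stat = r * math.sqrt(df / (1 - r ** 2))
# Smallest |r| that would be significant, implied by the critical t value
r_crit = T_CRIT_DF16 / math.sqrt(T_CRIT_DF16 ** 2 + df)

print(f"n = {n}, r = {r:.3f}, r^2 = {r ** 2:.3f}")
print(f"t = {t_stat:.2f}, critical |r| = {r_crit:.3f}")
print("significant at the 5% level:", abs(r) > r_crit)
```

Because |r| stays below the critical value, the sketch agrees with the comment: the null hypothesis of no correlation cannot be rejected at the 5% level.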
I would say this is a fun chart taking two easy concepts and looking at their correlation (but not necessarily their relationship, since correlation does not establish causality).
Answering your question...no it does not tell a story, but it certainly stirs a debate 😉
The most important thing I learned from my Econometrics studies in University many moons ago is this: with a few exceptions, we should use data to prove or disprove a theory we have formed, rather than trying to form theory around the data.
If we're going to plot one thing against something else, we better know why we choose the specific variables we did, and not some other ones. If we're trying to analyse a multidimensional problem, we should look at the relationships between many variables, not just 2 that might play an unspecified part.
We should definitely steer clear of a 2-dimensional chart.
If you have not correctly specified your model, the errors in it become so large that they swamp the statistical significance of your findings.
If your problem is multidimensional, use a multidimensional medium (such as a well-constructed OLS model). Charts are simply not multidimensional enough for many of the issues we might want to examine to draw strong enough conclusions from. Unfortunately people can't read multidimensional charts, so we had better NOT use a chart, and instead use some econometrics, or a whole series of charts, or both.
The econometric models I studied at university many years ago used 'Least Squares Regression', just as Excel does when it fits a trend line. But you would never trust an econometric model with just 2 variables in a complicated case like this, for good reason. So we shouldn't trust Excel either.
A more robust model might include info about how much they eat per sitting, the type of food they eat, the amount of work/leisure time they enjoy, and so on. Any of these things may tell a far better story.
Just looking at this chart, it looks unhelpful to me. Why is New Zealand (where I live) so radically different from Australia (a very similar culture in a very similar part of the world)?
If you plot GDP growth against cumulative beer consumption then you get a much stronger looking correlation. But a nonsense one.
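The beer-consumption example is the classic spurious-correlation trap: a cumulative series only ever grows, so it will correlate with almost anything else that trends over time. A minimal sketch with made-up, deterministic series (not real GDP or beer data; the series names are hypothetical) shows the effect.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


years = range(50)
# Hypothetical series: each is a time trend plus its own wiggle;
# neither is computed from the other.
cumulative_beer = [t + 3 * math.sin(t) for t in years]
gdp_like = [0.5 * t + 2 * math.cos(1.7 * t) for t in years]

r = pearson_r(cumulative_beer, gdp_like)
print(f"r = {r:.2f}")
```

The high r here comes entirely from the shared time trend, which is why trending series are usually detrended or differenced before their correlation is taken seriously.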
Maybe it's not so much about the correlation between time spent eating and obesity. Maybe it has more to do with the lifestyle that supports spending more time eating: more relaxed and leisurely, at a slower pace.
[...] I was reading an article today that showed what was said to be a great example of a good chart. By looking at it I have to completely disagree. The chart is supposed to demonstrate the [...]
Very good discussion. I am 100% with the point that correlation is not causation.
But I still stand by my point that this chart is a good chart, simply because it manages to connect with you and me and tell a story (good, bad, ugly or pretty is a different story :P). A chart becomes good when it opens the conversation.
The other important thing is that, other than the trend line (poorly constructed maybe, but I guess it is the best fit for the given data), there is really no element of the chart that is trying to prove any hypothesis.
It is what it is supposed to be: a "juxtaposition of 2 mundane pieces of data to raise curiosity".
Any chart, even a great one, is like a knife. It is up to the user to decide what to do with it. A poor storyteller (or reader) may use this chart to conclude that obese people spend less time eating. A great storyteller (or reader) will ask questions (and eat a cookie or two).
I agree that the chart opens conversation, but it opens the wrong conversation. The conversation it opens in most people's minds is "People who spend less time eating are overweight!" The conversation it should open is "This is a misrepresentation of the data!". If I believe the chart, then if I spend zero time eating I should be incredibly overweight, since the chart shows an inverse correlation between time spent eating and BMI.
This chart shows the dangers of taking multi-dimensional data and attempting to analyze it in a two-dimensional manner as Jeff points out.
Unfortunately many people look at this chart and try to use it to justify their preconceived opinions, e.g. "Americans eat too much fast food". Some charts have a large "lie" factor in that they misrepresent the numbers underlying the data. This chart has a large lie factor in that it misrepresents the relationship between the data.
I have not read the original document, nor the Times article. I hope they do not try to justify the correlation shown in the plot.
As always, I appreciate the Excel tips and information, and the thoughts that your blog stimulates for me. Keep up the good work!
Chandoo - to say the chart tells a story, you've really got to spell out what you think that story is.
The trouble with stories is that we can only handle one at a time, and time (or attention) is a very limited resource. This means a story that is not great may take up valuable time that would have been better devoted to a story that was better, stronger, and more needed.
I think the only story the chart can say is that there is no meaningful relationship between these two variables in isolation. So for me, unless that's the specific story I'm trying to get across, then the chart itself is chart junk, because the chart is really nothing but redundant data ink. But if that was the specific story, then it's a waste, because I could have used the space to tell a much better story or 'parable' that could help change our understanding for the better.
This has been a great conversation, one that is definitely worth having. Keep up the great work.
I agree that it's interesting, even if there is no correlation. I think what makes it interesting, as a chart type, is twofold: almost everyone can identify with one data point, and it's easy to identify the extremes.
I once wrote about ego charts - charts of data by state or country - where the reader invariably finds his home state. Making that connection makes the chart fun to read.
And the extremes are always fun. I wouldn't have guessed Australia and NZ to be that high up. And I think if Turkish people would slaughter the lamb first, maybe it wouldn't take three hours to eat.
@Matt: I do not agree with your point that this chart is opening up the “People who spend less time eating are overweight!” conversation. You cannot conclude from this chart that if you eat for zero minutes you will be alarmingly obese. Correlation is NOT causation. Also, the chart never talks about individual people; it clearly says % of population with BMI above 30, at the country level. As you can clearly see, it is poorly correlated with the eating times (but still correlated, and maybe the hypothesis that this is due to random error cannot be rejected). Data and charts should not be blamed for people's preconceived notions; what this chart shows is a certain set of facts juxtaposed in a certain way.
@Jeff: You have a very good point that stories should be used carefully since attention is scarce. I particularly liked and shared this chart because it manages to raise curiosity, connects with the reader (see Dick's comment above about ego charts) and in some way challenges conventional wisdom (by providing some facts on inverse correlation of eating time with obesity, whereas it is common to think that people who spend a lot of time eating are obese).
Just look at the amount of conversation this post alone generated simply because the chart is featured. Nobody seemed to disagree with the point that "a good chart should tell a story", but everyone is talking about whether this chart is right / correct. To be honest, not many charts even come close to generating this kind of discussion, simply because they are poor stories.
@Dick: You are bang on with the ego charts. I use this technique often to connect with my audience. After all, we are all storytellers.
Hi Chandoo. This discussion really cuts to the heart of what a chart is, what it isn't, what it should achieve, and what it can't achieve.
As chart makers, we have a duty of care to ensure that when we choose to use a chart, we bear these distinctions in mind.
A chart is often NOT the best way to tell or illustrate a very complicated story, such as those told in social policy settings. This chart is a very poor way of illustrating the 'story' of the causes behind obesity, because the truth behind the causes of obesity is very, very complicated.
Charts are however often the best way to illustrate more simple stories, such as told in business settings.
I'm not saying charts should only be used in business settings. I'm saying that charts are best used when the models we're trying to describe are not too complicated.
You said the chart challenges conventional wisdom by 'providing some facts on inverse correlation of eating time with obesity', as if there IS a proven inverse correlation. Are you saying there IS an inverse correlation, and here's a chart that illustrates it? Or are you saying "this data appears to show a correlation between eating time and obesity"? Or are you saying something else?
You also say "As you can clearly see, it is poorly correlated with the eating times (but still correlated, and maybe the hypothesis that this is due to random error cannot be rejected)".
To my eyes, it doesn't say these things are poorly correlated. I think it says these things are probably NOT correlated under our simple hypothesis that:
% of population with BMI > 30 = X × (minutes spent eating per day) + natural variation
This chart does NOT give enough information to prove there is a statistically significant correlation between these factors. Even if there appeared to be a strong correlation, it would only hold IF you had specified your model correctly, so that it captured all the pertinent factors (i.e., any other factors not in the model are left out because they are immaterial to the impact that time spent eating has on BMI).
Any correlation we can infer from a line on a chart is only correct IF the model described by the chart is correct. No chart tells us whether we should have used the data as is, or graphed the log of one or the other series, or whether we should have used a linear trend line or a polynomial one. We have to use logic to construct the model, then use a regression to test the hypothesis whether our model is correctly described or not.
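To make the point about model choice concrete, here is a rough Python sketch on the same 18 data points posted earlier. It assumes three candidate models: the formula above read literally (no intercept term), the usual least-squares line with an intercept, and the same linear correlation after taking the log of the obesity rate. The helper names are illustrative, not taken from any library.

```python
import math

# Eating time (minutes/day) and obesity rate (%), in the same country order
# as the data table posted earlier in the comments.
EATING = [89, 109, 69, 81, 135, 105, 114, 117, 96,
          66, 130, 82, 94, 106, 94, 162, 85, 74]
OBESITY = [21.7, 12.7, 18.0, 14.3, 10.5, 13.6, 10.2, 3.9, 3.5,
           30.0, 20.9, 9.0, 12.5, 14.9, 10.7, 12.0, 24.0, 34.3]


def mean(values):
    return sum(values) / len(values)


def slope_through_origin(xs, ys):
    """Least-squares slope b for y = b * x + e (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def slope_with_intercept(xs, ys):
    """Least-squares slope b for y = a + b * x + e."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


b_origin = slope_through_origin(EATING, OBESITY)
b_ols = slope_with_intercept(EATING, OBESITY)
r_raw = pearson_r(EATING, OBESITY)
r_log = pearson_r(EATING, [math.log(y) for y in OBESITY])

print(f"slope, no intercept:   {b_origin:+.3f}")
print(f"slope, with intercept: {b_ols:+.3f}")
print(f"r (raw obesity):       {r_raw:+.3f}")
print(f"r (log obesity):       {r_log:+.3f}")
```

On this data the no-intercept model gives a positive slope while the model with an intercept gives a negative one, and taking logs changes the strength of the correlation: the same 18 points tell different stories depending on which model is drawn.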
Most people who read charts are not aware of these subtleties. If you place the line somewhere on the chart, you're telling these people that 'these things are correlated, and here's how'. Place the line somewhere else, and you're telling a slightly different (or perhaps completely different) story. Omit the line and you're telling yet another story... one that says 'I don't think these 2 things have ANY correlation.'
Use a linear trendline, and get one truth. Use a polynomial trendline and get another truth. Omit one variable and choose another, and get another truth. When it comes to 2-dimensional charts, truth (or an awareness of a lack of truth) lies in the eye of the beholder, which is strongly influenced by the decisions of the chart creator.
I believe that because of the choice to use a trendline in this particular chart, and the choice of what kind of trendline to use, the chart IS in fact opening up the “People who spend less time eating are overweight!” story, as Matt says above.
If the idea was to have more readers comment on an article just for the sake of it...well...that's not really a very bright idea, is it. And if the idea was to show how good this particular chart is, then one would wish that you had picked up something better.
The way a layman would look at this: "the less time I spend eating, the fatter I will be."
The way an expert would look at this: "the author just took 2 unrelated (yet no doubt interesting) data points and combined them to make a chart. He then went a step further, explicitly trying to establish a correlation where there actually is none. And well... Chandoo picks it up and says 'Here is a fantastic example of what a good chart is'."
What's next - a chart between per capita income and color of the national flag 🙂
@Jeff & Mark: I appreciate you taking the discussion into such detail. I have always learned and improved when someone disagreed with me. Only through disagreement can one push one's mind to find the actual reasons and logic behind one's actions.
Just so that my position is clear, in my second comment I said: "@Dick: I wouldn't have drawn the trend line, as it is not a very strong trend."
If I were making this chart, I wouldn't have drawn the trend line; I would have left the data as it is and let the reader decide what to read into it.
I agree that much confusion came because of the type (and nature) of the data.
@Jeff: I am 100% with you that,
> Charts are however often the best way to illustrate more simple stories, such as told in business settings.
> Most people who read charts are not aware of these subtleties. If you place the line somewhere on the chart, you’re telling these people that ‘these things are correlated, and here’s how’.
It is our responsibility to make charts that tell the correct story.
Believe me, when I looked at that chart, I didn't even notice the trend line. I looked at the dots because they matter most, and I thought, well, this is one neatly done chart. I read the Times article, found that the author was curious to explore the relationship, and found that the chart fit that story perfectly.
For me the story wasn't "obese people eat less time"; for me it was:
"Why are countries like the US spending less time eating? Why is Turkey at the extreme? What are the average eating times in the countries that I know?"
And these are the very same things I was referring to in my post. I never said the story is "obese people eat less time". I assumed that no one would reach such a conclusion, as it is absurd.
Maybe I should have spelled out what the story is, instead of just saying there is a story.
I really like how this discussion is shaping up. I appreciate the fact that you are willing to challenge me and make me a better person. 🙂
@Mark: I am very sorry for the misunderstanding. As I said above, I wasn't really referring to the "obese people eat less time" story.
And the intention is never to get more comments; the intention is to have a conversation and share ideas.
I think most of us, when we see a bunch of numbers (like annual sales figures, customer footfalls, etc.), automatically create a chart and paste it into the PPT / report. I wanted to challenge that behavior by saying "your charts should tell a story". They should
> combine data in ways that would be difficult if you were dealing with raw numbers
> bring forward interesting patterns
> justify / challenge what people already believe
> and be fun
and I thought the chart above was a good example. But the trend line kind of played spoilsport. As I said, I would neither have drawn it myself nor used it as part of the story.
It is my mistake that this message didn't come across properly.
I really appreciate the fact that you took time to make me a better person. I love the way this discussion is happening. 🙂
I wanted to create a chart similar to this one, showing assets and net sales of clients of ours.
My problem is that I cannot figure out how to do it. I searched everywhere until I luckily found this perfect example.
Can this be done in Excel and if so, how? Is it a scatterplot or something else?
@Martin: yes, this is a normal scatter plot. You can do this in Excel. Just enter a set of X and Y coordinates and create a scatter plot. Then you can change the dot shape and size using the series options. Let me know if you face any difficulty...
Chandoo, thanks for your reply. I still cannot figure out how though. I have in column A the name of the client, in col B the assets and in col C the net sales. I tried to define my x and y co-ordinates in various ways but without success. The only way that would work is by defining each client independently (row by row if you know what I mean) but this cannot be correct. In your example above you would have 19 series for instance.
Hi Martin. Excel's scatterplot chart does not easily let you insert the names of the various points as in the example above. But there's a very cool free addin that does this: Rob Bovey's Chart Labeller, at http://www.appspro.com/Utilities/ChartLabeler.htm
Also, it's worth checking out the following:
Failing that, just search Youtube and you'll find lots of examples.
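Martin's layout (name in column A, assets in column B, net sales in column C) needs only one series: X comes from column B, Y from column C, and the names are just labels attached to each point, which is what the labeling add-in mentioned above automates. As a rough illustration of that idea outside Excel, here is a stdlib-only Python sketch that writes such a labeled scatter plot as SVG; the function name and the client rows are hypothetical.

```python
from xml.sax.saxutils import escape


def labeled_scatter_svg(rows, width=400, height=300, pad=40):
    """Render (label, x, y) rows as ONE scatter series with a text label per point."""
    xs = [row[1] for row in rows]
    ys = [row[2] for row in rows]
    # Guard against a zero range when all values are equal.
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1
    y_min, y_span = min(ys), (max(ys) - min(ys)) or 1

    def to_px(x, y):
        # Scale data into the plot area; SVG's y axis points down, so flip it.
        px = pad + (x - x_min) / x_span * (width - 2 * pad)
        py = height - pad - (y - y_min) / y_span * (height - 2 * pad)
        return px, py

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for label, x, y in rows:
        px, py = to_px(x, y)
        parts.append(f'<circle cx="{px:.1f}" cy="{py:.1f}" r="4" fill="steelblue"/>')
        parts.append(f'<text x="{px + 6:.1f}" y="{py - 6:.1f}" font-size="10">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical client rows laid out like columns A (name), B (assets), C (net sales).
clients = [("Client A", 120, 45), ("Client B", 300, 80), ("Client C", 75, 30)]
svg = labeled_scatter_svg(clients)
print(svg)
```

The design choice mirrors the Excel answer: all rows go into a single series, and the name column is used only for labeling, never as a separate series per client.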
Hi Jeff, this is perfect. Much appreciated. Thanks a lot!
[...] I like this type of chart because it clearly tells the story of what happened in the mobile handset market between 2007 and 2010. It shows how the then leader, Nokia, kept losing profit share despite a tiny loss in market share. It shows how new entrants like Apple have eroded the profit share of others. [related: good charts tell stories] [...]
[...] It tells a story. [why charts should tell a story] [...]
[...] reminds us that A Good Chart is a Story with this graphic showing that incidence of obesity increases as the time spent eating [...]