Charts and Graphs

Correlation vs. Causation [Charting Chatter]

Last updated on August 3, 2016

Chandoo

Here is a trap that is easy to fall in to. Confusing correlation as causation. As analysts, it is our job to see the data as it is rather than imply causation that doesn’t exist.

Let’s sample a chart, recently featured in Economist’s graphic detail under the title Measuring well-being.

At the end of the article they show the above chart and say, (emphasis mine)

BCG also compared financial inclusion (the percentage of individuals aged 15 or over with a bank account) against each country’s SEDA score, revealing a clear relationship.The report’s authors found that countries with higher financial inclusion generally had higher well-being than their peers at a similar income level.

Now, it is easy to misunderstand above chart and come to a conclusion that if you provide bank accounts to everyone in a country, then the country’s SEDA score (well-being) would improve.

But this would be wrong.

As an analyst leaning towards awesomeness, you want to ask questions like,

Is it a coincidence that countries with higher well-being naturally have bank accounts for everyone?
How does having a bank account impact a person’s well being?
What other factors can contribute to well being? What about access to education, healthcare and information?

So there you go. Next time you see data or visual analysis that exhibits correlations, do not confuse them as causation. Question the factors first.

More charting principles and guidelines for you

Do you often confuse correlation with causation?

Have you seen such misinterpretation of data in your work life? What was it like? Please share your thoughts in the comments section.

Share this tip with your colleagues

8 Comments
Ask a question or say something...
Tagged under charting, charting principles, correlation, economist charts, visualization principles
Category: Charts and Graphs

Want an AWESOME
Excel Class?

Overall I learned a lot and I thought you did a great job of explaining how to do things. This will definitely elevate my reporting in the future.

Rebekah S

Reporting Analyst

FREE Goodies for you...

Related Tips

auto format excel values in thousands / millions / billions

Charts and Graphs

8 Responses to “Correlation vs. Causation [Charting Chatter]”

Eric C. Lind says:

August 3, 2016 at 5:39 pm

It's the age old question, if there's a positive relationship between church and crime, that is as crime increases there is a corresponding increase in the number of churches, do we blame increases in crime on an increase in the number of churches? Or are churches an outcome of increasing crime?

Correlation and regression CANNOT tell you the answer. In fact there probably are other factors which drive this relationship. The classic example of this is, "When big ships leave harbor, small ships return", but the cause of this paired relationship is dependent upon a 3rd factor, "because the tide rises".

There are other modeling techniques which can model causality, such as Structured Equation Modeling, but as far as I know these are not reproducible in Excel; at least not without a lot of very complicated math.

Reply
Eric C. Lind says:

August 3, 2016 at 6:00 pm

While you're at it, Chandoo, you should consider doing a series of posts on the Data Analysis toolpak. Excel is somewhat limited in its statistical capability, but you can run quite a few tests with it.

Pearson Correlation (and Spearman Correlation with a data rank helper column)
Single & Multiple Regression
Covariance (correlation of variances, e.g. does the variability in one factor correspond to changes in variability in another?)
ANOVA (Difference of means between multiple factors)
F-Tests (test for differences in variances between two independent samples of, probably, the same factor)
T-Tests to compare two means.

I don't use Z tests generally, because I almost never have a sample of equal size to the population, but it's there. Generally a T-Test is sufficient.

One of the the assumptions of ALL of these data analysis techniques is that the data is normally distributed, which it very rarely is. These are referred to as parametric tests. There are non-parametric equivalent tests for each of these, but Excel doesn't support them, except for Spearman Correlations which I've mentioned above.

Best,

Eric~

Reply
- Karen E. Rosen says:
  
  November 8, 2016 at 2:51 pm
  
  Hello Eric,
  
  I need to use the Excel Data Analysis toolpack and calculate the tests you mention above (6:00 pm). Can you recommend, or do you offer, a 1:1 / remote tutoring? I'm looking for about 1 hour of assistance on a homework problem.
  
  Best,
  
  Karen
  
  Reply
Eric C. Lind says:

August 3, 2016 at 6:13 pm

One last thing, and I'll shut up. 🙂

Some good formulas outside of the Data Analysis Toolpak:

=percentile(array,n) which gives you the Nth percentile of an array.

Fire departments in the USA frequently look at the 90th percentile response time for an array of response times.

=correl(array1,array2), which gives you the Pearson correlation between two arrays or factors.

Some folks might also benefit from:

CHISQ.DIST, CHISQ.DIST.RT, CHISQ.INV to compare categorical data.

CONFIDENCE.NORM and CONFIDENCE.T to estimate the value of a population mean between two values, that is the mean of the sample is X, with a confidence interval of between X1 and X2 for the population.

LINEST for regression using formulas, but this technique ignores Beta Coefficients; key for understanding X factor behaviors.

Ok, enough of that. 🙂

Reply
hpchavaz says:

August 4, 2016 at 2:23 pm

The source of the problem is largely the graph.

When a correlation exists, the axes of x and y are normally chosen to support the correlation, X means Y

In this particular case, the axes are reversed.

Reply
Nanna says:

August 5, 2016 at 8:53 am

IMO there's a causation between the two but it's not the bank account that makes people happy. It's having enough money so that you need a bank account, i.e. not living from hand to mouth.

Reply
Eric C. Lind says:

August 5, 2016 at 2:58 pm

Yesterday I saw an article where some academics have come up with a technique (called the Additive Noise Model) to measure causation for continuous variables using observational data only.

http://www.vocativ.com/335705/correlation-causation/?utm_source=Facebook&utm_medium=CPC&utm_campaign=CK-VOC-TRF-000-FB-FBLP-FKW-ENG.W-MED-BO-15a-R28

This is a big deal, because this technique can unlock answers to a huge number of questions.

The academic paper, published in the Journal of Machine Learning Research) can be found here:

http://arxiv.org/pdf/1412.3773v3.pdf

Reply
Ana Carol says:

May 2, 2023 at 12:31 pm

Great article, Chandoo! I appreciate how you emphasize the importance of distinguishing between correlation and causation, and the dangers of assuming one implies the other. My question for you is: what advice would you give to analysts who need to communicate findings to non-technical stakeholders who may not understand the difference between correlation and causation?

Reply

Correlation vs. Causation [Charting Chatter]

Chandoo

More charting principles and guidelines for you

Do you often confuse correlation with causation?

Get FREE Excel + Power BI Tips

Welcome to Chandoo.org

How to calculate WEEKNUMBER in Month / Quarter / Year with Excel?

Automatically Format Numbers in Thousands, Millions, Billions in Excel [2 Techniques]

How to compare two Excel sheets using VLOOKUP? [FREE Template]

Sales Analysis Dashboards with Power BI – 30+ Alternatives

Announcing Power BI Dashboard Contest (win $500 prizes!)

Related Tips

Automatically Format Numbers in Thousands, Millions, Billions in Excel [2 Techniques]

Make an Impressive Interactive Map Chart in Excel

How to Create a Dynamic Excel Dashboard in Just 5 Steps

How to create a fully interactive Project Dashboard with Excel – Tutorial

9 Box grid for talent mapping – HR for Excel – Template & Explanation

6 Must Know Line Chart variations for Data Analysis

8 Responses to “Correlation vs. Causation [Charting Chatter]”

Leave a Reply

Get FREE Excel & Power-BI Newsletter

Get Started

Online Classes

About

Downloads