Correlation vs. Causation
Okay, I have been traveling for the past week, and my inbox is full of Freakonomics moments. In case you have not read the book, Freakonomics does a great job of illustrating the difference between coincidence and causality. In short, coincidence does not imply causality.
Our first moment is this interesting little story. Several blogs have linked to a story citing research from Professor Martin Schmeldon of Harvard Business School that suggests excessive Twitter use may have caused the current economic downturn. Here’s the fun chart included in the story.
Yes, this is a joke, and yes, some people are taking the story seriously. For the record, there does not appear to be a Professor Martin Schmeldon at Harvard.
Still, it is a funny illustration of the folly of confusing correlation with causation, and it should be required reading for journalists and marketers alike.
For those looking for a 10-second explanation of causality, here's a great illustration. Enjoy!
For the marketers out there, here's a slightly deeper explanation of what to look for when you are presented with data purporting to prove a point like the one in our Twitter story above.
First, these reports and analyses are usually based on observational studies, such as prospective cohort studies. In these studies, researchers look for disparities between large populations of people with different attributes. If disparities exist between the groups, researchers then try to make the case that the differences between the groups (e.g., diet, lifestyle, or advertising exposure) are the driving force behind the disparity.
The observational study demonstrates a correlation, but at this stage any claim that one factor causes the other is just a hypothesis, not a fact. So how do you test the hypothesis?
Drug companies and medical researchers can conduct clinical trials, but for marketers, regression is usually the next step. Correlation and regression are typically performed together.
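Correlation analysis measures the strength and direction of the association between two sets of quantitative data. For example, how closely are sales of product A correlated with sales of product B? The most common measure, Pearson's r, captures linear association and ranges from -1 (a perfect negative relationship) to +1 (a perfect positive relationship).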
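As a rough illustration, Pearson's r, the most common correlation measure, can be computed directly from its definition: the covariance of the two series divided by the product of their standard deviations. The sketch below assumes hypothetical weekly unit sales for products A and B. Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either series is constant (r is undefined then).
    return sxy / math.sqrt(sxx * syy)


# Hypothetical weekly unit sales for two products.
sales_a = [100, 200, 300, 400, 500]
sales_b = [20, 40, 50, 40, 50]
print(round(pearson_r(sales_a, sales_b), 3))  # 0.775
```

Note that r says nothing about why the two products move together; a shared promotion or season could drive both.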
Regression is a statistical technique for modeling and analyzing numerical data consisting of a dependent variable and one or more independent variables. Basically, regression analysis is used to explain the variation in one variable (the dependent variable) based on the variation in one or more other variables (the independent variables).
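For the simplest case, a single independent variable, ordinary least squares chooses the line that minimizes the sum of squared vertical distances between the data points and the line. This is a minimal sketch, assuming hypothetical monthly ad spend as the independent variable and units sold as the dependent variable; a real analysis would normally use a statistics library and check the model's assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one independent variable: returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return intercept, slope


# Hypothetical data: monthly ad spend (in $1,000s) and units sold (in 1,000s).
ad_spend = [1, 2, 3, 4, 5, 6]
units_sold = [12, 15, 15, 19, 21, 24]
intercept, slope = fit_line(ad_spend, units_sold)
print(f"units_sold = {intercept:.2f} + {slope:.2f} * ad_spend")

# Forecasting at ad_spend = 7 is an extrapolation beyond the observed range,
# so it assumes the straight-line relationship continues to hold there.
print(f"forecast at ad_spend = 7: {intercept + slope * 7:.2f}")
```

The slope describes how units sold tend to move with ad spend in this data; on its own it does not show that the ad spend caused the sales.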
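The goal of the analysis is often to estimate the effect of one variable upon another: the effect of a price increase upon demand, for example, or the effect of changes in the money supply upon the inflation rate. Keep in mind that regression on observational data can only account for the variables included in the model, so reading a coefficient as a causal effect assumes that no important driver has been left out. For marketers, regression is typically associated with forecasting sales from independent variables.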
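The sketch below, a hypothetical simulation rather than real data, shows why the variables fed into a regression matter. A hidden driver z (think of seasonal demand) pushes up both ad spend x and sales y, while x has no direct effect on y at all. Regressing y on x alone reports a sizable slope; adding z to the model pulls the coefficient on x back to roughly zero. The variable names, coefficients, and noise levels are all assumptions chosen for illustration.

```python
import random


def slope_one(x, y):
    """Least-squares slope from regressing y on x alone."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_two(x, z, y):
    """Least-squares coefficients (b_x, b_z) from regressing y on x and z together.

    Solves the 2x2 normal equations on mean-centered data.
    """
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    xc = [a - mx for a in x]
    zc = [a - mz for a in z]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(a * b for a, b in zip(zc, yc))
    det = sxx * szz - sxz ** 2
    b_x = (sxy * szz - szy * sxz) / det
    b_z = (szy * sxx - sxy * sxz) / det
    return b_x, b_z


rng = random.Random(7)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical world: z (say, seasonal demand) raises both ad spend x and sales y.
# By construction, x has NO direct effect on y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [2 * zi + rng.gauss(0, 0.5) for zi in z]

naive = slope_one(x, y)
adjusted_x, adjusted_z = slopes_two(x, z, y)
print(f"y on x only:  coefficient on x = {naive:.2f}")       # roughly 1.6, a spurious 'effect'
print(f"y on x and z: coefficient on x = {adjusted_x:.2f}")  # roughly 0, the true direct effect
```

In practice the hidden driver is often unmeasured, which is exactly why it pays to ask what went into a model before trusting its coefficients.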
Regression analysis is a valuable tool, but one that is often misused. It takes considerably more skill to critique a model than to fit one. In the right hands, though, it can help organizations model future sales, predict changes in sales and revenue in response to competitors' actions, and more.
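Anscombe's quartet (F. J. Anscombe, 1973) is a classic demonstration of why critiquing a model takes more than fitting it. Its four small data sets produce essentially the same regression line and correlation, yet only one of them is well suited to a straight-line fit. This sketch simply recomputes those summary statistics in plain Python.

```python
import math

# Anscombe's quartet: four data sets with nearly identical summary statistics.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (intercept, slope, r) for a simple least-squares fit of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


for name, (xs, ys) in QUARTET.items():
    intercept, slope, r = summarize(xs, ys)
    print(f"{name:>3}: y = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
```

Plotted, set I looks like a noisy straight line, set II is a smooth curve, set III is a straight line with one outlier, and set IV is a vertical cluster with a single influential point. The summary numbers alone cannot tell these apart.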
Data needs to be interpreted, understood, and shaped before it can be used in actionable (and profitable) ways. It is imperative that marketers who are not in research ask questions about the inputs used in the analysis. It is not necessary to understand the statistics the analyst or consultant used, but understanding the ingredients that went into the analysis is very important.