Forgive me for posting just to link to something, but I’ve wanted to share this for a while. It’s a blog post by Greg Laden on the constantly expressed “correlation is not causation” line. I don’t think a day goes by that I don’t hear someone online respond to data by saying “correlation does not imply causation!” as if that were an argument against any specific claim, rather than a vague and frequently unhelpful truism. Laden:
Sometimes, we have reason to believe that two things co-vary because of one or more external causes. Aridity in one region of the world is correlated with higher rainfall in another region of the world, and it turns out that both meteorological variations are caused by the effects of the Pacific El Nino. Quite often, especially in complex systems like are often dealt with in the social sciences, we can replicate correlations among various phenomena but we may have multiple ideas about what the causal structure underlying the phenomenon at hand may be. Repeated observations rule out random associations or meaninglessness in the data, but we are faced with multiple alternative models for where to put the causal arrows. In other words, we’re pretty sure there is a “causal link” somewhere, but we can’t see, or agree amongst ourselves, on what it is.
I can, for example, show you a correlation between the percentage of a given school’s students that utilize the free lunch program and that school’s performance on certain standardized tests of academic performance. Now: does whether a student eats his or her lunch for free cause him or her to perform worse on a standardized test? Of course not. We are using that as a convenient, easily-accessible and highly effective proxy for socioeconomic status. But even then, are we claiming that it’s actually the money (or lack thereof) in a parent’s bank account that causes academic deficiency? Of course not. We are assuming a line of influence from an observable and influential variable that frequently results in a given outcome, even though we are still unsure about every discrete step within that line of influence. We are pointing out that a complex, multivariate phenomenon like the relationship between parental income and student performance can nevertheless be understood as producing certain consistently observable consequences through which we can make educated decisions about our policy.
To look at such a correlation and say “poverty is definitely the sole cause of this particular student’s performance on this standardized test” or “all students from poverty will score similarly on a standardized test” would indeed be stupid; but who is suggesting such things? And it is, in my opinion, equally stupid to look at the remarkably consistent and robust correlation between socioeconomic status and educational performance and respond by saying, “well, correlation is not causation, so we just have to consider these as separate phenomena.” Like Laden says: don’t be a dumbass. Use your head; use common sense. Because frankly, if we are afraid to speak about correlation because of the true but frequently irrelevant fact that it doesn’t ensure causation, our jobs as both students of reality and citizens of the world become vastly harder.
MikeJ
Correlation does not imply causation, but it’s a damn good place to start looking.
Linda Featheringill
You know, you’re preaching to a congregation of techies that really, really like to work on the actual physical causes of physical problems.
To them [us?] correlations are like ruffles on a bikini: interesting but not essential. You can have an acceptable bikini without the ruffles but the ruffles alone don’t constitute acceptable coverage.
azrev
Just like cousins don’t have the same parents but they do share grandparents.
Southern Beale
Breaking: some anonymous hackers claim they nabbed Mitt Romney’s tax returns from a Franklin, TN accounting office and are threatening to release them to the media. God how delicious if this turns out to be true.
Grab the popcorn …
Villago Delenda Est
Freddie, you’re asking people to think things through.
This does not sell in teatard land. They’re too busy hating to take out time to think.
MattF
Right, the missing piece, to be explicit about it, is a model. Statistics will tell you the likelihood that covariation in data is due to chance, but it will not give you a model. The catch is that people who dismiss correlations also dismiss modeling– you can ask Prof. Krugman about that.
Villago Delenda Est
@Southern Beale:
Don’t threaten. DO.
Fuck OvenMitt and his entire fucking family. Roast them all alive.
Chris
@Southern Beale:
The Romney campaign will claim it’s a fake.
different-church-lady
@Chris: …and then claim it’s Obama’s failure for not preventing it.
jl
@MattF:
I agree. I would use the word ‘causal model’. Usually, arguments revolve around rival causal models that generate a data set with a certain pattern of correlations. You need to determine which causal model is most consistent with an observed pattern of correlations.
You need to write down a causal model first, then look at the data and see which causal model is favored by the data. This can be difficult and messy because usually no one has a completely correct causal model. Or, you may not be able to observe a variable that is needed to really distinguish between two causal models, so you have to find a proxy variable.
There is an old saying in statistics: no causes in, no causes out. That means, if you do not begin by writing down a causal model before you see the data, then you have no way of determining how well the data match with a given causal model, and you cannot make any conclusions about causes.
Now, you can look at the correlations first and make up causal stories from the observed correlations. But then you just end up arguing forever, and none of your statistical tests (that is, the magic 5 percent significance level) are valid.
No causes in, no causes out.
Shawn in ShowMe
@Southern Beale:
Unless they release the tax returns to Amy Goodman, this only ensures the contents will be buried forever. That horse race ain’t gonna promote itself.
Chris
@different-church-lady:
Correct!
catclub
@MikeJ: The race is not always to the swift, nor the battle to the strong, … but that’s the way the smart money bets.
Freddie deBoer
Meant to add: correlational data is often the first step in understanding important phenomena before causes can be explained. The correlation between smoking and lung cancer, for example, and between diet and heart disease, were each recognized before causal mechanisms had been observed.
ThatLeftTurnInABQ
@MikeJ:
In other words based on our empirical experience, correlation and causation have a demonstrated tendency to be, oh how does one put these things, hmm, I’m looking for a word here, oh yes: correlated with each other.
weaselone
I know your liberal sensitivities won’t let you admit it, but it is quite obvious that the free lunch is causing the poor test performance. If these children were a little hungrier and not demoralized by their dependency on the state, they would try harder on their tests and put more effort into their studies so as to secure a future free of starvation. Also, they would sell their $500 Nikes and start hedge funds and management consultant businesses.
Southern Beale
And an update on the rumored hacking of Romney’s tax files: apparently the Williamson County GOP received a flash drive with the alleged files.
different-church-lady
When smart laypeople use the expression “correlation is not causation” I think they’re using shorthand to say, “Hey, you’re comparing two different outcomes and attempting to say that one has led to the other.” In other words, they’re confusing a result with a cause.
When scientists use it, it’s beyond my pay grade. When dumb laypeople use it, I have no way of comprehending what they’re thinking.
ThatLeftTurnInABQ
@weaselone:
Don’t forget possible positive second-order effects. For example if the students are hungry enough, I mean really really hungry, then the smarter and stronger ones will kill and eat the slower and stupider students, thus raising the median score for their group as a whole by cutting off the low-value tail of the distribution.
Southern Beale
I’m thinking the Romney tax return thing is a hoax but as the last link I posted noted, the hacking of Sarah Palin’s e-mail account came out of Tennessee, too. So who knows.
Just seems weird. Maybe it’s an effort to get his tax returns back in the news? That would be a good thing, IMHO.
daverave
@Southern Beale:
…and the reporter’s name is Whitehouse? That’s a bit of a hoax red flag, amirite?
MikeJ
@Southern Beale: God that’s fucking moronic. Turn a goddamned robber baron into a victim.
Downpuppy
To answer a question with a question is good, no?
“But what’s the cause of the correlation?”
Ben Franklin
@different-church-lady:
This leak must be stopped. Whistleblowers must be stopped.
eyelessgame
Take the “correlation is not causation” ideogram to its extreme and science becomes impossible. Correlation suggests causation.
Southern Beale
Secret Service has confiscated the flash drive sent to Williamson County GOP….
Brachiator
@Freddie deBoer:
bullshit. Correlation can as often be a dead end as a first step in understanding important phenomena.
The correlation between shit on a pump handle and cholera was seen, misunderstood and denied.
The apparent correlation between stress and ulcers turned out to hide some more interesting physical causes.
Are you going to post something next on Phrenology and the correlation between head shape and criminality?
Ben Franklin
@Brachiator:
Phrenology is a much misunderestimated science.
Jim Kakalios
It does not imply causation, but it does not rule it out either. And if you are going to find an underlying cause, the first place to look is patterns of phenomena.
Greg is very smart and an excellent writer (I’ve known him personally for years). Glad to see him showing up at BJ.
RP
“Correlation is not causation” is the phrase most commonly used, but it’s misleading. A more accurate version is “correlation does not prove causation.” IOW, correlation by itself is not sufficient evidence for causation. But correlation certainly suggests causation.
Shinobi
I think there are smart, scientific, useful ways to look at correlations. They can point you in the right directions, for sure, and modeling is really all about finding relationships that aren’t necessarily causal, but still exist.
I think the danger of correlations comes when we have situations where the news media and the health police are telling people how to live their lives based on a correlation between one thing and another. I have a friend who told me I should NEVER SLEEP WITH A LIGHT ON because it will make me near sighted because of a debunked study based on correlations that failed to take into account that near sighted parents are more likely to use a nightlight and to have nearsighted children.
News media and politicians take correlative health information as fact, they treat risk factors as certainties. One only needs to look at the moral panic surrounding obesity to see just where overstating the power of correlations can lead. Children are being shamed into having eating disorders and hating fat people at a very young age now. And people are talking about not hiring fat people or charging them more for health insurance, all because of some correlations that in this particular case are very prone to being confounded.
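For anyone who wants to see the night-light confound in numbers, here is a minimal sketch. Everything in it is an assumption made for illustration: the probabilities are invented, the function names are hypothetical, and the simulated world is deliberately built so that night-lights have no effect at all on a child's eyesight. Only the parent's nearsightedness matters.

```python
import random

def simulate_children(n=20000, seed=0):
    """Simulate a world where night-lights have NO effect on myopia.

    All probabilities are made-up illustrative numbers, not study data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        parent_myopic = rng.random() < 0.3
        # Nearsighted parents are more likely to leave a light on.
        nightlight = rng.random() < (0.6 if parent_myopic else 0.2)
        # Child myopia depends only on the parent, never on the light.
        child_myopic = rng.random() < (0.5 if parent_myopic else 0.1)
        rows.append((parent_myopic, nightlight, child_myopic))
    return rows

def myopia_rate(rows, nightlight, parent_myopic=None):
    """Share of myopic children in a group, optionally within one parent stratum."""
    group = [child for parent, light, child in rows
             if light == nightlight
             and (parent_myopic is None or parent == parent_myopic)]
    return sum(group) / len(group)

rows = simulate_children()

# Pooled comparison: night-light children look noticeably more myopic.
crude_gap = myopia_rate(rows, True) - myopia_rate(rows, False)

# Stratified comparison: within each parent group the gap nearly vanishes.
gap_myopic_parents = myopia_rate(rows, True, True) - myopia_rate(rows, False, True)
gap_other_parents = myopia_rate(rows, True, False) - myopia_rate(rows, False, False)

print(f"crude gap:               {crude_gap:.3f}")
print(f"gap, myopic parents:     {gap_myopic_parents:.3f}")
print(f"gap, non-myopic parents: {gap_other_parents:.3f}")
```

The pooled gap is real in the sense that it will replicate, but it goes away once you compare like with like. That is the step the scary headline skips.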
J
http://xkcd.com/552/
Brachiator
@RP:
Nope. Not even that.
And I suppose one of the main things here, separate from any of Freddie’s points, is that too often lay people (and even some supposed experts) simply stop at correlation and either invoke some conventional or folk wisdom that “establishes” the cause, or simply stop thinking or investigating a subject and go “case closed.”
The media often feed the beast by printing a story that mentions the initial correlation, but then get bored and barely mention the follow-up that totally demolishes the earlier report, correlation and all.
Belafon (formerly anonevent)
@Brachiator: Yes, and there was that major fail of the correlation between black body radiation and frequency. Look how studying that fucked up the last 100 years of science.
jl
@Brachiator:
“The correlation between smoking and lung cancer, for example, and between diet and heart disease, were each recognized before causal mechanisms had been observed.”
Glad you spotted that. I missed it. Actually, there were obvious causal theories for the rise in lung cancer. Most of them based on air pollution that contained carcinogens. Docs in the UK had a list of things, and increased industrial pollution and local air pollution from traffic were at the top of the list. Cigarette smoking, somewhat further down. Turned out that the causal mechanism most consistent with the observed correlations pointed to cigarette smoking.
Maybe the observed correlation between snuff and pipe smoking and head and neck cancers, which goes back to the 18th century is an example of a relationship suggested by correlation with no clear causal mechanism.
Moving from observed correlations to a specific causal mechanism is always a suggestive and exploratory analysis. You need to try out the causal theory on new data for a formal statistical analysis. For confirmation and formal evaluation of evidence, you have to have the causal mechanism written out and the pattern of correlations predicted before you look at the data.
jl
May be a problem with semantics here. When Laden says that a high observed correlation always suggests some causal relationship, in some broad sense that is probably true. I might go with that.
Problem is that there are a large number of causal relationships that can give rise to the same observed correlation. So, broadly, I guess you can say that a systematic pattern that is very unlikely to have been produced by chance points to some causal relationship underlying the systematic effect.
But that is such a broad statement it doesn’t mean much, and almost never answers the question most people are interested in: what specific causal relationship produced the observed correlations? Most policy debates are about which of two or more rival causal relationships produced the data, not the existence of some causal relationship that produced a systematic pattern.
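To make the rival-models point concrete, here is a rough sketch under stated assumptions. The two models, their coefficients, and the function names are all hypothetical; they are chosen so that both produce a correlation of about 0.5 between X and Y. In the first, X causes Y. In the second, an unobserved Z drives both and X does nothing to Y. The observed correlation cannot tell them apart, but forcing X to a fixed value (an intervention) does.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def direct_model(rng, n, force_x=None):
    """Model A: X causes Y. Passing force_x simulates an intervention on X."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1) if force_x is None else force_x
        y = 0.5 * x + math.sqrt(0.75) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys

def common_cause_model(rng, n, force_x=None):
    """Model B: an unobserved Z drives both X and Y; X has no effect on Y."""
    w = math.sqrt(0.5)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        if force_x is None:
            x = w * z + w * rng.gauss(0, 1)
        else:
            x = force_x  # intervention: X no longer listens to Z
        y = w * z + w * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(1)
n = 20000

# Observational data: both models give roughly the same correlation.
r_direct = pearson(*direct_model(rng, n))
r_common = pearson(*common_cause_model(rng, n))

# Intervention: set X = 2 for everyone and look at the average Y.
_, ys_direct = direct_model(rng, n, force_x=2.0)
_, ys_common = common_cause_model(rng, n, force_x=2.0)
mean_y_direct = sum(ys_direct) / n
mean_y_common = sum(ys_common) / n

print(f"correlation, X causes Y:    {r_direct:.2f}")
print(f"correlation, common cause:  {r_common:.2f}")
print(f"mean Y after forcing X=2, X causes Y:   {mean_y_direct:.2f}")
print(f"mean Y after forcing X=2, common cause: {mean_y_common:.2f}")
```

In the policy setting you usually cannot run the intervention, which is why the argument ends up being about which written-down causal model fits the rest of the evidence.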
Brachiator
@jl:
Hell, chimney sweeps and testicular cancer.
RE: policy debates. There was a NY Times blog post a while back wondering why there were not more scientists, or science literate people, in elected office. I think this came up again recently on the British program “More or Less: Behind the Stats.” Ah, the NY Times blog made this sobering observation:
Sadly, there does appear to be a correlation between willful ignorance and bad policy.
Allen
Wow, this Greg Laden guy is an idiot. I don’t think I’ve read something this dumb that was science related since reading a creationist’s blog (but the creationist’s blog wasn’t science related; oh wait, neither was Laden’s column).
Much like the creationist spouting, “It’s just a theory,” Laden completely misinterprets what “imply” means to scientists. To put it in layman’s terms, it would be more precise to say that correlation does not prove causation, but when that is pointed out to Laden in the comments, all he can do is spout gibberish.
It sounds like some pet theory of Laden’s–almost certainly an incorrect one if his column is any indication of his “thinking” ability–was pooh-poohed, and now he has hurt fee-fees.
jimmiraybob
Try “correlation does not necessarily imply causation!”
Fixed.
MattR
@Allen: You know it is garbage when he responds to a comment about the stock market always going up on someone’s birthday by arguing that there is a causal relationship between the two – “Time goes in one direction.” If we want to get that abstract then you can define a causal relationship between any two events by saying “The universe exists” (Of course both of those bastardize the definition of causal relationship.)
burnspbesq
@eyelessgame:
Correlation suggests causation.
Fair enough. I’ve said “correlation is not causation” often enough myself, as shorthand for “slow down there, bunky, there’s more than one potential cause for the effect you’re observing — show me a high r-squared for the variable you’re claiming is the cause, and then we can talk.”
Both Sides Do It
I was all set to get my hate on but the crazies Laden and deBoer are arguing against are too far out there.
In a theoretical research sense of course correlation does not equal causation, and of course that’s something that needs to be in the forefront of the minds of everyone who conducts or reads a study utilizing correlation techniques.
But even more so, in a public policy setting where things have to get done and decisions need to get made, in other words where making no decision is itself a decision, you go with what you’ve got. And something with as high a correlation as school lunches and test scores with such a clear common-sense link is of course actionable intelligence.
I seriously have trouble believing there are people who think otherwise, but of course there are. Ugh.
Nice piece Freddie.
El Cid
I’ve mostly encountered “correlation is not causation” in the context of “yes, you’ve just made a shitty ridiculous argument and tied two or more sets of data points together, but it doesn’t mean you’ve yet shown that one seems to cause the other, and the ridiculousness of your starting point suggests to me that no such thing is happening.”
mclaren
Correlation very often does imply causation. In fact, in everyday life, almost always.
Brachiator
@mclaren:
No, it doesn’t, but thanks for playing.
Or more accurately, “implication” is not meaningful. Not even in everyday life.
Bill Murray
@burnspbesq:
Probably you want more than one data set with a high r-squared for the test of the variable. One common bad-science tactic is to compute a large number of correlations from one data set and then tout the ones that pass the significance test as being proved. If you test 20 independent things that have no real effect and use a significance threshold of 0.05, at least 1 variable will pass about 64% of the time (1 - 0.95^20), so something like 2/3.
edited for bad English, in theory it’s better now.
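The arithmetic behind the 20-tests figure can be checked with a short sketch. It assumes the tests are independent and that none of the 20 variables has any real effect, so each p-value is uniform between 0 and 1; the function names are made up for this example.

```python
import random

def chance_of_false_positive(num_tests=20, alpha=0.05):
    """Probability that at least one of num_tests independent null tests passes."""
    return 1 - (1 - alpha) ** num_tests

def simulate(num_tests=20, alpha=0.05, trials=20000, seed=0):
    """Monte Carlo check: draw null p-values and count studies with any 'hit'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis each p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials

exact = chance_of_false_positive()
simulated = simulate()
print(f"exact:     {exact:.4f}")
print(f"simulated: {simulated:.4f}")
```

Dividing the threshold by the number of tests, `chance_of_false_positive(alpha=0.05 / 20)`, brings the chance of any false positive back to just under 5%, which is the usual Bonferroni-style fix.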