Things that screw up your causal inference

Selection bias. Probably most self-made billionaires took a lot of crazy-seeming risks to get there. Does that mean that taking crazy-seeming risks is the best way to make money? No, it just has higher variance. The distribution of wealth for “people who took crazy risks” doesn’t have a higher mean—there are just more people at both ends, and you only looked at one end.
Solution: make inferences only about the population you sampled from; sample only from the population you want to make inferences about. If that’s too hard, there are fancier ways of compensating, which for instance Andrew Gelman has studied, though of course they require more assumptions and are less robust. (In the linked article: inferring US election outcomes from polling data of Xbox owners, yes, you read that right.)
Omitted variable bias. If you look at Berkeley’s graduate school acceptance rate by gender, it looks like they’re sexist: their admission rate is much lower for women than men. But if you control for department, the situation reverses: almost every department admits more women than men. It just happens that women tend to apply more to the most competitive departments with the lowest overall acceptance rates, which drags down the total acceptance rate for women.
Solution: control for all common causes or intermediate causes between the treatment and the response variable. For instance, the applicant’s gender causally affects which department they apply to, and that causally affects their chance of acceptance, so this indirect effect needs to be teased out from the direct effect.
Residual confounding. Even if you control for the potential confounder, you have to make sure you got the shape of the association correct. For instance, suppose you’re trying to study the effect of income on mortality rates, controlling for age (which is known to increase both income and mortality rate). If you code age as a categorical variable and control for the category (e.g. “less than 30”, “30-50”, “50+”), then within each category there will still be an (unmeasured) “age within category” variable that confounds the income-mortality relationship.
Solution: I’m not aware of any airtight theoretical solution to this problem (for the practical case, where you don’t know the functional form of the confounding relationship). But in practice it will definitely help to use a model that doesn’t assume linearity, such as a generalized additive model. Some simulation evidence suggests that GAMs can be effective in reducing residual confounding.
It’s also worth noting that residual confounding is more likely to create spurious associations than spurious non-associations. So it’s safer to interpret null results (with tight confidence intervals) than positive ones in the presence of possible residual confounding.
Collider bias. Not only do you have to control for all common causes of the potentially associated variables (see: omitted variable bias), you can’t control for any common effects of those variables. This is important and people don’t talk about it enough! For example this random implausible-sounding study (which later failed to replicate in a different population) found that frequent blood donors have about 20% of the heart disease risk of non-donors. Here’s one (among many) plausible explanations for this:
The study controlled for physical activity. That means their result should be interpreted as, “given the same level of physical activity, frequent blood donors have lower risk.” Now, blood donation probably reduces physical activity (because they tell you not to do anything strenuous for a while afterwards), and having a weak heart probably also reduces physical activity. So if you have an average level of physical activity, and donate blood very often, then you must have a really strong heart to bring you back up to an average level of physical activity. Presto! We’ve created an association out of nothing.
Solution: don’t control for any common effects of your treatment and control (or common effects of things associated with your treatment and control).

Related

Absence of evidence and confidence intervals

What happened to all the non-programmers?

Moral arguments

Comments