Pearl Causality Thoughts
I recently came upon a 2001 paper by causality researcher Judea Pearl, in which he gives his views on the distinction between “Bayesian” and “causal” models of reality. Surprisingly, I disagreed with many things in the paper; unlike Pearl, I still think that Bayesianism is a basically useful and accurate way to create predictive models.
Pearl’s issue with the exclusive use of probabilistic models is simple:
To illustrate, the syntax of probability calculus does not permit us to express the simple fact that “symptoms do not cause diseases”, let alone draw mathematical conclusions from such facts. All we can say is that two events are dependent—meaning that if we find one, we can expect to encounter the other, but we cannot distinguish statistical dependence, quantified by the conditional probability P(disease | symptom) from causal dependence, for which we have no expression in standard probability calculus.
In other words, correlation between jointly distributed variables only establishes that some causal relationship exists, without telling us anything about the direction or nature of causation. This is insufficient to draw any significant inferences.
What I believe Pearl ignores, though, is that causal inferences are generally based on time differences – and with the inclusion of time, causation becomes perfectly expressible in terms of probability!
Consider the statement P(symptom | disease) > P(symptom), backed by evidence. Can we conclude from this that diseases cause symptoms? According to Pearl, no, because P(disease | symptom) > P(disease) holds as well. (The two inequalities are equivalent: each says that P(symptom, disease) > P(symptom) P(disease).) Lacking any additional information, we can’t draw a causal arrow.
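To make the symmetry concrete, here is a minimal sketch in Python. The joint distribution is entirely made up for illustration (the numbers come from neither Pearl's paper nor any real data), and the helper name `marginal` is my own. It only shows that, for a joint distribution like this one, probability-raising runs in both directions at once.

```python
from fractions import Fraction

# Hypothetical joint distribution over (disease, symptom).
# The numbers are invented for illustration and sum to 1.
joint = {
    (True, True): Fraction(8, 100),
    (True, False): Fraction(2, 100),
    (False, True): Fraction(10, 100),
    (False, False): Fraction(80, 100),
}


def marginal(index):
    """P(variable at position `index` is True)."""
    return sum(p for outcome, p in joint.items() if outcome[index])


p_disease = marginal(0)
p_symptom = marginal(1)
p_both = joint[(True, True)]

# Conditional probabilities via the definition P(A | B) = P(A, B) / P(B).
p_symptom_given_disease = p_both / p_disease
p_disease_given_symptom = p_both / p_symptom

# Both comparisons reduce to P(A, B) > P(A) * P(B), so they agree.
raises_forward = p_symptom_given_disease > p_symptom
raises_backward = p_disease_given_symptom > p_disease

print(raises_forward, raises_backward)
```

Nothing in these numbers says which variable comes first, which is exactly the gap the rest of this post is about.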
But in fact we do have more information, namely that diseases always precede symptoms, and symptoms never precede diseases! Then the real statement is P(symptoms will increase in the future | disease was observed in the past) > P(symptoms will increase in the future). This is perfectly cogent, but P(disease was observed in the past | symptoms will increase in the future) > P(disease was observed in the past) is a meaningless statement, because it assumes knowledge of the future!
In fact, Pearl uses essentially this same principle to draw his own conclusions. His definition of causality:
Given these two mathematical objects, the definition of “cause” is clear and crisp; variable X is a probabilistic-cause of variable Y if P(y | do(x)) != P(y) for some values x and y.
(Here do(x) stands for some mechanical intervention that sets X to the value x.)
But of course the only reason he can call this behavior “causal” is that interventions in X always precede changes in Y. If the reverse were known to happen, nobody would conclude that X caused Y at all!
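For readers who have not seen the do() notation before, here is a rough sketch of how the quoted definition plays out on a toy model. Everything here is an assumption of mine rather than anything from Pearl's paper: the two-variable model, the probabilities, and the function names `joint` and `prob` are all hypothetical. An intervention is modeled by replacing the forced variable's mechanism with a constant while leaving the other mechanism alone.

```python
from fractions import Fraction

# Hypothetical structural model (numbers invented for illustration):
#   disease := Bernoulli(P_DISEASE)
#   symptom := Bernoulli(P_SYMPTOM_GIVEN[disease])
P_DISEASE = Fraction(1, 10)
P_SYMPTOM_GIVEN = {True: Fraction(8, 10), False: Fraction(1, 10)}


def joint(do_disease=None, do_symptom=None):
    """Joint distribution over (disease, symptom), optionally under do().

    A forced variable gets probability 1 on the forced value; the other
    variable keeps its original mechanism.
    """
    dist = {}
    for disease in (True, False):
        if do_disease is None:
            p_d = P_DISEASE if disease else 1 - P_DISEASE
        else:
            p_d = Fraction(int(disease == do_disease))
        for symptom in (True, False):
            if do_symptom is None:
                p_s = P_SYMPTOM_GIVEN[disease]
                p_s = p_s if symptom else 1 - p_s
            else:
                p_s = Fraction(int(symptom == do_symptom))
            dist[(disease, symptom)] = p_d * p_s
    return dist


def prob(dist, index):
    """P(variable at position `index` is True) under `dist`."""
    return sum(p for outcome, p in dist.items() if outcome[index])


observational = joint()
p_disease = prob(observational, 0)
p_symptom = prob(observational, 1)

# Forcing the disease changes the symptom probability...
p_symptom_do_disease = prob(joint(do_disease=True), 1)
# ...but forcing the symptom leaves the disease probability alone,
p_disease_do_symptom = prob(joint(do_symptom=True), 0)
# even though merely observing the symptom does shift it.
p_disease_given_symptom = observational[(True, True)] / p_symptom

print(p_symptom, p_symptom_do_disease)
print(p_disease, p_disease_do_symptom, p_disease_given_symptom)
```

Under the quoted definition, disease counts as a probabilistic cause of symptom in this toy model and not the other way around. Note that the asymmetry is built into the order in which the model generates its variables.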
Overall I think Pearl’s view of what probability theory can express is far too parochial. Yes, in cases like sampling of traits from a large population, it may not be possible to show the influence of time, and hence causality. But that’s not because probability theory can’t deal with time – it’s just because such sampling deliberately ignores time data! In general, statements involving time are perfectly meaningful within the framework of probability theory.