Friday, May 28, 2010

Expressing the problem using a population distribution rather than a probability distribution has an additional advantage: it forces us to be explicit about the data-generating process.

Consider the disease-test example. The key assumption is that everybody (or, equivalently, a random sample of people) is tested. Or, to put it another way, we're assuming that the 10% base rate applies to the population of people who get tested. If, for example, you get tested only if you think it's likely you have the disease, then the above simplified model won't work.
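Here is a rough sketch of why selective testing breaks the simple calculation. The only number taken from the post is the 10% base rate. The 90% sensitivity and 90% specificity are hypothetical, and so are the testing probabilities (sick people get tested 80% of the time, healthy people 20%). The helper names `posterior_positive` and `base_rate_among_tested` are illustrative, not from any library.

```python
from fractions import Fraction


def posterior_positive(base_rate, sensitivity, specificity):
    """P(disease | positive test) by Bayes' rule, given the disease rate among those tested."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def base_rate_among_tested(base_rate, p_test_if_sick, p_test_if_well):
    """Disease rate among the people who actually get tested, when testing is selective."""
    sick_tested = base_rate * p_test_if_sick
    well_tested = (1 - base_rate) * p_test_if_well
    return sick_tested / (sick_tested + well_tested)


BASE = Fraction(1, 10)  # the post's 10% base rate
SENS = Fraction(9, 10)  # hypothetical sensitivity
SPEC = Fraction(9, 10)  # hypothetical specificity

# Everyone tested: the 10% base rate applies to the tested group.
everyone = posterior_positive(BASE, SENS, SPEC)

# Hypothetical self-selection: sick people are four times as likely to get tested.
selected_rate = base_rate_among_tested(BASE, Fraction(8, 10), Fraction(2, 10))
selective = posterior_positive(selected_rate, SENS, SPEC)

print(everyone, selected_rate, selective)  # 1/2 4/13 4/5
```

With these made-up numbers, a positive result means a 50% chance of disease when everyone is tested. Under self-selection the rate among the tested rises from 1/10 to 4/13, and the same positive result then means 80%. Plugging the raw 10% into the formula would silently give the wrong answer.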

This condition is a bit hidden in the probability model, but it jumps out (at least, to me) in the "population distribution" formulation. The key phrases above: "Of the 10 with the disease . . . Of the 90 without the disease . . . " We're explicitly assuming that all 100 people will get tested.
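The "population distribution" version can also be written as explicit counting over 100 people. In code, the testing assumption becomes a line you can see and change. This is a minimal sketch: the split of 10 sick and 90 healthy follows the post, but the test outcomes are hypothetical (9 of the 10 sick and 9 of the 90 healthy test positive). The `Person` class is an illustrative name only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    sick: bool
    positive: bool


def build_population():
    # 10 of 100 have the disease (the post's 10% base rate).
    # Hypothetical test outcomes, not from the post: 9 of the 10 sick
    # test positive, and 9 of the 90 healthy test positive (false positives).
    sick = [Person(sick=True, positive=i < 9) for i in range(10)]
    well = [Person(sick=False, positive=i < 9) for i in range(90)]
    return sick + well


people = build_population()

# The modelling assumption, made explicit: all 100 people get tested.
tested = people

positives = [p for p in tested if p.positive]
sick_and_positive = [p for p in positives if p.sick]
print(len(positives), len(sick_and_positive))  # 18 9
```

Here 9 of the 18 positives are sick. If only some people sought testing, `tested` would have to be a filtered subset of `people`, and that change is hard to miss in this form.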

Andrew Gelman on assumptions underlying calculations of conditional probability.


William said...

Hi -- I am wondering whether you've ever looked at my friend the late Dick Jeffrey's book on Subjective Probability? Also his classic book on The Logic of Decision?

Helen DeWitt said...

No, I never have - but thanks for the tip.