2-2.7 Bayes' theorem.

Bayes' theorem is a way of allowing conditional probability to look backward. It states that the a posteriori probability of M at the first stage (a hidden probability, i.e., an explanatory variable), given Ai at the second stage, is P(M | Ai) = P(Ai and M) ÷ P(Ai), which, interpreted according to the diagram in figure 4, is P(M | Ai) = #1 ÷ (#1 + #3), where #1 and #3 are the branches that end in agreement for matched and unmatched pairs, respectively.

Suppose we have a sample of records and that 0.5% of all the record comparisons we can possibly make are matched (M). This is a way of looking at the duplication rate. Suppose further that we choose agreement in a particular field as the response variable and that, in observing the values in the matched record pairs, we find that some of them, say 2% of the pairs, actually disagree in that field. We say that the reliability of the field is 98%. Moreover, suppose that in looking at all the comparisons we note that among those that are not matched pairs, 3% nevertheless agree in the field; the data values agree by coincidence. What if the field agrees in a particular pair? To what degree is it safe to conclude that the comparison is matched? Assigning the probabilities to the branches in figure 4, we find that the total probability of agreement is:

P(Ai) = #1 + #3 = (0.005) × (0.98) + (0.995) × (0.03) = 0.0049 + 0.02985 ≈ 0.035

By Bayes' theorem the (hidden) probability that the pair is also matched is:

P(M | Ai) = #1 ÷ (#1 + #3) = (0.005) × (0.98) ÷ 0.035 ≈ 0.14
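The arithmetic above can be checked with a minimal Python sketch. The function name posterior_match and its parameter names are illustrative assumptions, not part of the original text; the three inputs are the duplication rate (0.5%), the field reliability (98%), and the coincidental agreement rate among unmatched pairs (3%).

```python
def posterior_match(p_match, p_agree_given_match, p_agree_given_nonmatch):
    """Return (P(Ai), P(M | Ai)) from the three branch probabilities.

    Hypothetical helper for illustration; the names are assumptions.
    """
    # Branch #1: the pair is matched and the field agrees.
    branch_1 = p_match * p_agree_given_match
    # Branch #3: the pair is not matched but the field agrees by coincidence.
    branch_3 = (1.0 - p_match) * p_agree_given_nonmatch
    # Total probability of agreement, then Bayes' theorem.
    p_agree = branch_1 + branch_3
    return p_agree, branch_1 / p_agree


p_agree, p_match_given_agree = posterior_match(0.005, 0.98, 0.03)
print(f"P(Ai)     = {p_agree:.5f}")
print(f"P(M | Ai) = {p_match_given_agree:.3f}")
```

Without intermediate rounding the two values are about 0.03475 and 0.141, which round to the 0.035 and 0.14 used in the text.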

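So even though the test for agreement on this one field may seem fairly discriminating of matched pairs, with 98% (reliability) and 97% (non-coincidental) success rates, if the field agrees in a particular comparison, we can be only about 14% sure that the comparison is a match. The reason is that matched pairs are so rare (0.5% of comparisons) that coincidental agreements among the far more numerous unmatched pairs (#3 ≈ 0.030) outnumber the true agreements (#1 ≈ 0.005).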
