Suppose we have a sample of records and that 0.5% of all the record comparisons we could possibly make are matched (M). This is one way of expressing the duplication rate. Suppose further that we choose agreement in a particular field as the response variable and that, on examining the values in the matched record pairs, we find that some of them, say 2%, actually disagree in that field. We say that the reliability of the field is 98%. Moreover, suppose that among the comparisons that are not matched pairs, 3% nevertheless agree in the field; the data values agree by coincidence. What if the field agrees in a particular pair? To what degree is it safe to conclude that the comparison is matched? Assigning these probabilities to the branches in figure 4, we find that the total probability of agreement is:
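P(Ai) = #1 + #3 = (0.005) × (0.98) + (0.995) × (0.03) = 0.03475 ≈ 0.035
By Bayes' rule, the probability that the pair is matched, given that the field agrees, is therefore:
P(M|Ai) = #1 ÷ (#1 + #3) = (0.005) × (0.98) ÷ 0.035 = 0.14
Agreement in this single field thus leaves only about a 14% chance that the comparison is matched, so on its own it is weak evidence of a match.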
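The arithmetic above can be checked with a short script. This is only a sketch: the function name and parameter names are hypothetical labels chosen for illustration, and the inputs are the figures given in the text (0.5% matched comparisons, 98% reliability, 3% coincidental agreement among unmatched comparisons).

```python
def posterior_match_given_agreement(p_match, reliability, p_coincidence):
    """Return (P(A), P(M|A)) for agreement in a single field.

    p_match       -- share of all comparisons that are matched, P(M)
    reliability   -- share of matched pairs that agree in the field, P(A|M)
    p_coincidence -- share of unmatched pairs that agree by chance, P(A|U)
    (All three names are illustrative, not taken from the source.)
    """
    agree_and_matched = p_match * reliability                # branch #1
    agree_and_unmatched = (1.0 - p_match) * p_coincidence    # branch #3
    p_agree = agree_and_matched + agree_and_unmatched        # total probability
    return p_agree, agree_and_matched / p_agree              # Bayes' rule


p_agree, p_match_given_agree = posterior_match_given_agreement(0.005, 0.98, 0.03)
print(round(p_agree, 3), round(p_match_given_agree, 2))  # prints: 0.035 0.14
```

Varying the third argument shows how sensitive the result is to coincidental agreement: because unmatched comparisons outnumber matched ones by about 200 to 1 here, even a small chance-agreement rate dominates the total probability of agreement.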