Chapter 6: CALCULATING OUTCOME PROBABILITIES



We first derive the equations needed to calculate the probability that a comparison weight is matched and the probability that it is unmatched, and we review the adjustments required. These probability calculations must take account of presence dependence. Each comparison weight comes from a particular combination of field weights, with each field in agreement, in disagreement, or missing. Finally, we consider the number of calculations required.
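To give a feel for the number of calculations involved, the following minimal sketch enumerates every combination of field states for three fields. It rests on assumptions not stated in the text: the field names and weight values are hypothetical, the comparison weight is taken to be the sum of the field weights, and a missing field is given a weight of zero.

```python
from itertools import product

STATES = ("agree", "disagree", "missing")

# Hypothetical field weights, for illustration only (not taken from the text).
FIELD_WEIGHTS = {
    "day":   {"agree": 4.0, "disagree": -3.0, "missing": 0.0},
    "month": {"agree": 3.0, "disagree": -2.0, "missing": 0.0},
    "year":  {"agree": 6.0, "disagree": -5.0, "missing": 0.0},
}


def enumerate_combinations(field_weights):
    """Yield (combination, comparison_weight) for every assignment of states.

    Assumption: the comparison weight is the sum of the field weights.
    """
    fields = list(field_weights)
    for states in product(STATES, repeat=len(fields)):
        combination = dict(zip(fields, states))
        weight = sum(field_weights[f][s] for f, s in combination.items())
        yield combination, weight


combinations = list(enumerate_combinations(FIELD_WEIGHTS))
print(len(combinations))  # 27 when there are three fields
```

With n fields and three states per field there are 3^n combinations, so the count grows quickly as fields are added. Several combinations may also share the same comparison weight.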

Probabilities for a comparison weight.   A particular comparison weight has two probabilities of occurring: one when the comparison is between matched records, and one when it is between unmatched records. Each is conditional on the category of the comparison. To remove that condition, we multiply each probability by the probability that a comparison falls into the corresponding category, matched or unmatched.
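The multiplication described here can be sketched as follows. The function and parameter names are hypothetical, and the numbers in the usage lines are invented for illustration. The second function applies Bayes' rule to the two products; it is offered as one reading of "the probability of a weight being matched", not as the text's own equation.

```python
def joint_probabilities(p_weight_given_matched, p_weight_given_unmatched, p_matched):
    """Return (P(weight and matched), P(weight and unmatched)).

    p_matched is the probability that a comparison is between matched
    records; the unmatched category takes the remainder, 1 - p_matched.
    """
    if not 0.0 <= p_matched <= 1.0:
        raise ValueError("p_matched must lie between 0 and 1")
    joint_matched = p_weight_given_matched * p_matched
    joint_unmatched = p_weight_given_unmatched * (1.0 - p_matched)
    return joint_matched, joint_unmatched


def probability_matched_given_weight(p_weight_given_matched,
                                     p_weight_given_unmatched,
                                     p_matched):
    """Probability that a comparison with this weight is matched (Bayes' rule)."""
    joint_matched, joint_unmatched = joint_probabilities(
        p_weight_given_matched, p_weight_given_unmatched, p_matched
    )
    total = joint_matched + joint_unmatched
    if total == 0.0:
        raise ValueError("this weight never occurs under the given inputs")
    return joint_matched / total


# Invented inputs: the weight occurs in 20% of matched comparisons and 0.1% of
# unmatched comparisons, and 1% of all comparisons are matched.
joint_m, joint_u = joint_probabilities(0.2, 0.001, 0.01)
posterior = probability_matched_given_weight(0.2, 0.001, 0.01)
```

The two products can be added to give the overall probability that the weight occurs, which is why the category probabilities are needed before the matched and unmatched figures can be compared.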



Calculations for death date example.   It is instructive to take our example of the fields in the death date and calculate the various weights and their probabilities of occurrence. This set of fields is not entirely typical, since its low presence values result in very small probabilities. However, if we used the birth date instead, we could not illustrate all the combinations, because the birth year is always present. We will adjust presence, but use reliability and coincidence as measured. Value dependence involves a rather straightforward adjustment to these values, which we cover in §6-3.
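A calculation of this kind might be sketched as below. Everything in it is an assumption made for illustration: the statistics are invented, `both_present` stands for the (already adjusted) probability that a field is present on both records, reliability is read as the chance that a present field agrees in a matched pair, coincidence as the chance that it agrees in an unmatched pair, and the fields are treated as independent of one another, which ignores the dependence adjustments discussed in this chapter.

```python
from itertools import product

STATES = ("agree", "disagree", "missing")

# Invented statistics for the three death date fields (illustration only).
FIELDS = {
    "day":   {"both_present": 0.10, "reliability": 0.90, "coincidence": 0.03},
    "month": {"both_present": 0.12, "reliability": 0.95, "coincidence": 0.08},
    "year":  {"both_present": 0.20, "reliability": 0.98, "coincidence": 0.01},
}


def state_probabilities(stats, agree_rate_key):
    """Probabilities of agree / disagree / missing for one field.

    agree_rate_key is "reliability" for matched pairs and "coincidence"
    for unmatched pairs.
    """
    present = stats["both_present"]
    agree_rate = stats[agree_rate_key]
    return {
        "agree": present * agree_rate,
        "disagree": present * (1.0 - agree_rate),
        "missing": 1.0 - present,
    }


def combination_probabilities(fields, agree_rate_key):
    """Probability of every combination of states, assuming independent fields."""
    per_field = [state_probabilities(s, agree_rate_key) for s in fields.values()]
    result = {}
    for states in product(STATES, repeat=len(per_field)):
        probability = 1.0
        for probs, state in zip(per_field, states):
            probability *= probs[state]
        result[states] = probability
    return result


matched = combination_probabilities(FIELDS, "reliability")
unmatched = combination_probabilities(FIELDS, "coincidence")
```

Each dictionary is keyed by a tuple of states in the order day, month, year, and its values sum to one. With presence values this low, the all-missing combination dominates and every combination involving agreement is small, which is the effect noted above.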



Adjustments for value dependence.   The calculations of the last section do not illustrate the effects of value dependence. Certain dependence chains require adjustments to both blocking and weighting statistics, since both reliability and coincidence are affected. To illustrate, we proceed as we did in ¶ 6-2.1 when we calculated the preliminary values in table 1. Just as with the presence dependence adjustments, a dependence chain of three fields has four kinds of comparisons that make sense. If all three fields are blocking fields, only one combination applies: they must all be present and agree. When some are blocking fields and others are weighting fields, the combinations that apply are those in which the blocking fields are present and agree and the weighting fields are present.