Chapter 2: PROBABILITY AND STATISTICS IN RECORD LINKAGE



Mathematics provides many tools for modeling the principles and processes involved in record linkage. Basic among these are the tools of probability and statistics. The principles of statistics tell us to analyze the problem so as to identify the variables and set up appropriate measures to quantify them. The principles of probability help us clarify and define the interrelationships between these measures. The goal is to state the relationships between these quantities mathematically, as effectively and precisely as possible.

Explaining Matches by Measuring Comparison Attributes.   Statisticians require a clear distinction between the variables that they are trying to explain and those that are measured in the attempt. In probabilistic record linkage we concentrate on three parameters of an identifying field, which we measure in order to get at the explanatory variable: whether the records compared represent the same individual. We use statistical probability to do so.

Fundamental Theorem of Probabilistic Record Linkage


  1. Explanatory variables
  2. Response variables
  3. Measuring agreement and disagreement
  4. Various strengths of data
  5. Availability of data
  6. Relating the three attributes of a comparison

Using Probability to Measure Three Key Variables.   The previous section outlined rough definitions for three logical attributes of a record comparison as they concern our study: 1) matched vs. unmatched records (¶ 2-1.1); 2) agreement vs. disagreement in the data values in the fields of the records (¶ 2-1.3 and ¶ 2-1.4); and 3) presence vs. absence of data in the fields of the records (¶ 2-1.5). Our final goal in this section is to define comparisons so as to get a handle on the first attribute, that is, to distinguish those that are matched from those that are unmatched. To do this accurately we must study the necessary relationships between these three attributes and obtain measures of the other two attributes: agreement/disagreement and presence/absence.


  1. Probability
  2. Two basic relationships between attributes
  3. Example of conditional probability
  4. Calculating conditional probabilities
  5. Independence of events
  6. The theorem of total probability
  7. Bayes’ theorem
  8. The strategy of probabilistic record linkage
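
As a rough illustration of how the three attributes relate, the following minimal Python sketch estimates conditional probabilities from a small set of comparisons on a single field. The data, the tuple layout (matched, present, agrees), and the helper name conditional are all hypothetical and chosen only for illustration. The sketch treats each probability as a relative frequency and then checks that Bayes' theorem recovers the probability of a match given agreement.

```python
from fractions import Fraction

# Hypothetical toy data (not from the text): one tuple per record comparison
# on a single field, laid out as (matched, present, agrees).
# "agrees" is only meaningful when data is present in the field.
comparisons = [
    (True, True, True),
    (True, True, True),
    (True, True, True),
    (True, True, False),
    (True, False, False),
    (False, True, True),
    (False, True, False),
    (False, True, False),
    (False, True, False),
    (False, False, False),
]


def conditional(event, given, data):
    """Estimate P(event | given) as a relative frequency over data."""
    conditioned = [row for row in data if given(row)]
    if not conditioned:
        raise ValueError("conditioning event never occurs in the data")
    hits = sum(1 for row in conditioned if event(row))
    return Fraction(hits, len(conditioned))


def matched(row):
    return row[0]


def present(row):
    return row[1]


def agrees(row):
    # Agreement can only be observed when the field holds data.
    return row[1] and row[2]


# P(agree | matched, data present) and P(agree | unmatched, data present)
p_agree_given_match = conditional(
    agrees, lambda r: matched(r) and present(r), comparisons)
p_agree_given_unmatch = conditional(
    agrees, lambda r: (not matched(r)) and present(r), comparisons)

# Prior probability of a match among comparisons where data is present
p_match = conditional(matched, present, comparisons)

# P(match | agree), counted directly from the data
p_match_given_agree = conditional(matched, agrees, comparisons)

# The same quantity via Bayes' theorem, with the theorem of total
# probability supplying the denominator P(agree | data present)
p_agree = (p_agree_given_match * p_match
           + p_agree_given_unmatch * (1 - p_match))
p_match_given_agree_bayes = p_agree_given_match * p_match / p_agree

print(p_agree_given_match, p_agree_given_unmatch, p_match_given_agree_bayes)
```

In practice the match status of a comparison is exactly what is unknown, so a labeled set like this one would not be available directly. The sketch only shows how the measured attributes (agreement and presence) connect to the attribute we are trying to determine.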