2-1.6 Relating the three attributes of a comparison.

Figure 2: Attributes as Sets

Figure 2 diagrams the three categories of record comparisons described above as sets: matched (M), present (P), and agreeing (A). The circles are sets of points, intended to help fix in the mind the relationships among the categories. The sets are drawn so that every category contains some comparisons (points), but not every combination of categories does.

A comparison may represent the same entity (matched, M) and yet have data missing in one or both records (absent, ¬P). Such a comparison is represented by a point inside the circle marked M but outside P. When a field's data is missing in one or both records, that field cannot contribute to the decision about whether the records are matched.

The diagram also contains points that lie outside A (non-agreement, ¬A) and outside P as well. The points outside A that concern us most are those that represent disagreement. These points must also lie inside P; that is, they lie in the intersection of P with ¬A. If the data is missing (not present, ¬P), it can neither agree nor disagree, because a blank field in one or both records leaves nothing to compare. So non-agreement is not the same as disagreement.

We must therefore be careful to define agreement and disagreement only for comparisons where the data is present in both records. Hence we define A as a subset of P. Only part of the set ¬A represents disagreement, namely P ∩ ¬A; the remainder of ¬A is ¬P, where the data is missing.
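The set relationships above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `compare_field`, the sample surname pairs, the case-insensitive exact-match rule, and the membership of `M` are all hypothetical and do not come from the text. The point is to show that A is a subset of P and that disagreement (P ∩ ¬A) is smaller than non-agreement (¬A).

```python
from typing import Optional


def compare_field(a: Optional[str], b: Optional[str]) -> str:
    """Classify one field comparison as 'agree', 'disagree', or 'missing'.

    Assumption: agreement means a case-insensitive exact match after
    trimming whitespace. A real linkage system may use a different rule.
    """
    # Present (P) requires a non-blank value in BOTH records.
    if a is None or b is None or not a.strip() or not b.strip():
        return "missing"   # not P: can neither agree nor disagree
    if a.strip().lower() == b.strip().lower():
        return "agree"     # A, which lies inside P by construction
    return "disagree"      # P and not A


# Hypothetical surname comparisons, keyed by a comparison id.
pairs = {
    1: ("Smith", "smith"),
    2: ("Smith", "Smyth"),
    3: ("Smith", ""),
    4: (None, None),
}
outcome = {k: compare_field(a, b) for k, (a, b) in pairs.items()}

universe = set(pairs)
P = {k for k, v in outcome.items() if v != "missing"}
A = {k for k, v in outcome.items() if v == "agree"}

# Hypothetical match status; M is independent of whether data is present.
M = {1, 3}

non_agreement = universe - A   # not A: includes the missing-data cases
disagreement = P - A           # P and not A: true disagreement only
matched_but_absent = M - P     # inside M, outside P

print("A is a subset of P:", A <= P)
print("non-agreement:", sorted(non_agreement))
print("disagreement:", sorted(disagreement))
print("matched but absent:", sorted(matched_but_absent))
```

In this sketch, comparisons 3 and 4 fall in non-agreement without being disagreements, because a value is missing in at least one record. Comparison 3 also shows a matched pair that lies outside P.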

Probabilistic Record Linkage Principle of Intersecting Sets