2-1.2 Response variables.

Whenever we compare two records that relate to the same kind of entity and measure their similarity, the only thing we can state directly is the system's decision: the pair is either linked or unlinked. The variable that holds this decision is called a response variable. If the linked-or-not response variable is true, then with a certain probability this is explained by the matched-or-not explanatory variable also being true. Suppose the system determines that a particular linked pair is less likely to be matched than unmatched. Then the pair was probably linked in error. Similarly, the system may classify some pairs as unlinked when they are in fact matched; in that case it has missed a link. Depending on the application, one may specify an acceptable probability for each of these two kinds of error. Then, by continually adjusting and improving the measures taken on the records, duplicate detection may classify pairs to the desired degree of certainty.
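
The two kinds of error described above can be estimated from a sample of pairs whose true status has been checked by hand. The following is only a minimal sketch in Python: the names `Pair` and `error_rates`, the sample data, and the 0.30 limits are all hypothetical, chosen for illustration, and are not part of any particular duplicate detection system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    linked: bool   # response variable: the system's decision
    matched: bool  # explanatory variable: true status, known only for audited pairs


def error_rates(pairs):
    """Return (false_link_rate, missed_match_rate) for an audited sample.

    false_link_rate   = share of linked pairs that are in fact unmatched
    missed_match_rate = share of matched pairs that were left unlinked
    """
    linked = [p for p in pairs if p.linked]
    matched = [p for p in pairs if p.matched]
    false_links = sum(1 for p in linked if not p.matched)
    missed = sum(1 for p in matched if not p.linked)
    false_link_rate = false_links / len(linked) if linked else 0.0
    missed_match_rate = missed / len(matched) if matched else 0.0
    return false_link_rate, missed_match_rate


# Made-up audited sample, for illustration only.
sample = [
    Pair(linked=True, matched=True),
    Pair(linked=True, matched=True),
    Pair(linked=True, matched=True),
    Pair(linked=True, matched=False),    # linked in error
    Pair(linked=False, matched=True),    # missed link
    Pair(linked=False, matched=False),
    Pair(linked=False, matched=False),
    Pair(linked=False, matched=False),
]

# Hypothetical acceptable error probabilities; these depend on the application.
MAX_FALSE_LINK_RATE = 0.30
MAX_MISSED_MATCH_RATE = 0.30

false_link_rate, missed_match_rate = error_rates(sample)
acceptable = (false_link_rate <= MAX_FALSE_LINK_RATE
              and missed_match_rate <= MAX_MISSED_MATCH_RATE)
print(false_link_rate, missed_match_rate, acceptable)
```

If either rate exceeds its limit, the similarity measures would be adjusted and the rates estimated again on a fresh sample.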