2-1.2 Response variables.
Whenever we consider two records relating to the same kind of
entity and measure their similarity and other properties, the system can report only that the pair is either linked
or unlinked. The variable that takes this value is called the response variable. If the
linked-or-not response variable is true, then with a certain probability this is explained by the
fact that the matched-or-not explanatory variable is also true. Suppose that the system
determines that the probability of a particular linked pair being matched is less than the probability
of its being unmatched. Then the pair was probably linked in error. Similarly, the system may
classify some pairs as unlinked when they are in fact matched. In this case it has missed linking the
matched pair. Depending on the application, one may specify the acceptable probabilities of these two
kinds of error: linking an unmatched pair and failing to link a matched pair. Then, by
continually adjusting and improving the measures on the records, duplicate detection may classify
accurately to any desired degree of certainty.
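
The distinction between the linked-or-not response variable and the matched-or-not explanatory variable can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the text: the `Pair` class, the field `p_match` (an assumed system estimate of the probability that a pair is matched), and the two rate definitions (false links as a fraction of linked pairs, missed matches as a fraction of matched pairs). The true matched status is assumed to be known, as it would be only in labelled test data.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    p_match: float  # assumed: estimated probability that the pair is matched
    linked: bool    # response variable: the system's decision
    matched: bool   # explanatory variable: true status (known only in test data)


def probably_linked_in_error(pair: Pair) -> bool:
    # A linked pair whose probability of being matched is lower than its
    # probability of being unmatched, i.e. p_match < 1 - p_match.
    return pair.linked and pair.p_match < 1.0 - pair.p_match


def error_rates(pairs):
    # Returns (false-link rate, missed-match rate) under the assumed definitions.
    linked = [p for p in pairs if p.linked]
    matched = [p for p in pairs if p.matched]
    false_links = sum(1 for p in linked if not p.matched)
    missed = sum(1 for p in matched if not p.linked)
    false_link_rate = false_links / len(linked) if linked else 0.0
    missed_match_rate = missed / len(matched) if matched else 0.0
    return false_link_rate, missed_match_rate


# Illustrative data only, invented for this sketch.
pairs = [
    Pair(p_match=0.95, linked=True, matched=True),    # correct link
    Pair(p_match=0.40, linked=True, matched=False),   # probably linked in error
    Pair(p_match=0.80, linked=False, matched=True),   # missed match
    Pair(p_match=0.10, linked=False, matched=False),  # correct non-link
]

false_link_rate, missed_match_rate = error_rates(pairs)
suspect = [p for p in pairs if probably_linked_in_error(p)]
```

In an application one would compare `false_link_rate` and `missed_match_rate` against the acceptable probabilities chosen for that application, and revise the similarity measures until both fall within those limits.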