3-2.1 Two definitions for duplication rate.
There are at least two ways to look at
duplication rate. One is the probability that a record chosen at random has
at least one other record in the file that represents the same entity. We might call this the redundancy rate. In this case we divide the number
of records in duplicate groups (NG) less the number of such groups (G) by the total number of records in the file (Ntotal), as in equation 3.6:
P(RN) = (NG - G) ÷ Ntotal    (3.6)
A second way is the probability that an entity chosen at random has more than one
record representing it in the file. In this case we divide the number of duplicate groups in the file
by the number of unique linkage entities represented.
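
The two definitions can be compared on a small example. The sketch below is illustrative only. It assumes that every record already carries a label for the entity it represents, and that a "group" is any set of two or more records sharing a label. The function name duplication_rates and the sample labels are hypothetical and are not part of the original method description.

```python
from collections import Counter


def duplication_rates(entity_ids):
    """Return (redundancy_rate, entity_duplication_rate) for one file.

    entity_ids holds one entity label per record (an assumed input format).
    """
    if not entity_ids:
        raise ValueError("the file must contain at least one record")

    records_per_entity = Counter(entity_ids)
    # Assumption: a "group" is an entity represented by two or more records.
    group_sizes = [n for n in records_per_entity.values() if n > 1]

    n_total = len(entity_ids)             # Ntotal: all records in the file
    n_g = sum(group_sizes)                # NG: records that sit in groups
    g = len(group_sizes)                  # G: number of duplicate groups
    n_entities = len(records_per_entity)  # unique entities represented

    redundancy_rate = (n_g - g) / n_total      # first definition, equation 3.6
    entity_duplication_rate = g / n_entities   # second definition
    return redundancy_rate, entity_duplication_rate


# Ten records: entity "a" appears three times, "b" twice, the rest once.
sample = ["a", "a", "a", "b", "b", "c", "d", "e", "f", "g"]
redundancy, per_entity = duplication_rates(sample)
print(f"redundancy rate: {redundancy:.3f}")
print(f"entity duplication rate: {per_entity:.3f}")
```

For this sample, NG = 5, G = 2, and Ntotal = 10, so the first definition gives (5 - 2) ÷ 10 = 0.3. There are 7 unique entities, so the second gives 2 ÷ 7, or about 0.286. The two rates generally differ because one is normalized by records and the other by entities.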