3-2.1 Two definitions for duplication rate.

There are at least two ways to look at duplication rate. One way is the probability that when you choose a record at random, there will be at least one other record that represents the same entity in the file. We might call this the redundancy rate. In this case we subtract the number of groups (G) from the number of records in groups (NG) and divide the result by the total number of records in the file (Ntotal), as in equation 3.6. Subtracting G treats one record in each group as the entity's original, so the numerator counts only the surplus records.

P(RN) = (NG − G) ÷ Ntotal    (3.6)

A second way is the probability that when you choose an entity at random, there will be more than one record representing it in the file. In this case we divide the number of duplicate groups in the file (G) by the number of unique linkage entities represented (NU), as in equation 3.7.

P(DE) = G ÷ NU    (3.7)
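
To make the difference between the two rates concrete, the following is a minimal sketch in Python. It assumes that every record already carries a label identifying the entity it represents, which in practice would come from a truth set or a completed linkage. The function name `duplication_rates` and the input `entity_ids` are hypothetical and chosen only for illustration; a "group" is taken here to mean an entity represented by more than one record.

```python
from collections import Counter


def duplication_rates(entity_ids):
    """Return (redundancy rate, duplicated-entity rate) for a file.

    entity_ids: one entity label per record (hypothetical input; assumes
    the true entity behind each record is known).
    """
    if not entity_ids:
        raise ValueError("the file must contain at least one record")

    counts = Counter(entity_ids)        # records per entity
    n_total = len(entity_ids)           # Ntotal: all records in the file
    n_unique = len(counts)              # NU: unique entities represented

    # A group is an entity with more than one record.
    group_sizes = [c for c in counts.values() if c > 1]
    g = len(group_sizes)                # G: number of duplicate groups
    n_g = sum(group_sizes)              # NG: records that belong to groups

    redundancy_rate = (n_g - g) / n_total      # equation 3.6
    duplicated_entity_rate = g / n_unique      # equation 3.7
    return redundancy_rate, duplicated_entity_rate


# Eight records: entity "a" appears three times, "b" twice, the rest once.
records = ["a", "a", "a", "b", "b", "c", "d", "e"]
p_rn, p_de = duplication_rates(records)
print(p_rn, p_de)
```

In this illustrative file, Ntotal = 8, NU = 5, G = 2 and NG = 5, so equation 3.6 gives (5 − 2) ÷ 8 = 0.375 and equation 3.7 gives 2 ÷ 5 = 0.4. The two rates differ because the first is measured per record and the second per entity.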