3-2.3 Calculating an entity duplication rate.

The entity duplication rate can be calculated from numbers that are easily derived from the data. The first number we need is how many individuals (linkage entities) all the records in the file together represent (NU). We call these the uniquely significant records: the smallest number of records needed to represent every individual exactly once. To get this figure, we subtract the number of redundant records (NR) from the total number of records (Ntotal).

unique records: $N_U = N_{\mathrm{total}} - N_R$ (3.8)

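The redundant records all lie in duplicate groups. Each duplicate group holds one uniquely significant record, and the rest of its records are redundant: a pair contributes one redundant record, a triple two. We can therefore count the redundant records by subtracting one from the size of each duplicate group and adding the remainders together, i.e., we subtract the number of groups (G) from the number of records in groups (NG).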

redundant records: $N_R = N_G - G$ (3.9)
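As a worked check of equations 3.8 and 3.9, take the 1781-1794 sample from Table 1: 9772 records with 270 pairs, 17 triples and 1 quadruple. This assumes the unparenthesized counts in the table are the observed group counts; the arithmetic then runs:

```latex
\begin{align*}
N_G &= 2(270) + 3(17) + 4(1) = 595 \\
G   &= 270 + 17 + 1 = 288 \\
N_R &= N_G - G = 595 - 288 = 307 \\
N_U &= N_{\mathrm{total}} - N_R = 9772 - 307 = 9465
\end{align*}
```

Both $N_G = 595$ and $N_U = 9465$ agree with the values listed for that sample.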


With these figures, the duplication rate of equation 3.7 is found by dividing the number of duplicate groups (G) by the number of uniquely significant (non-redundant) records (NU, equation 3.8). Table 1 below shows some of these numbers for a test database from Akershus, Norway.

Years in Sample  Total   Unique  Duplicates                        Total  Duplication Rate
                 Ntotal  NU      Pairs        Triples  Quadruples  NG     P(DE)   P(RN)
1736-1755        10849   10227   563 (557)    25 (32)  3 (2)       1213   0.0578  0.0573
1781-1794        9772    9465    270 (279)    17 (9)   1 (0)       595    0.0304  0.0314
1805-1814        6465    6458    151 (154)    7 (4)                323    0.0245  0.0255
1836-1845        11249   11088   141 (149)    10 (2)               312    0.0136  0.0143
1866-1875        7198    7062    126 (128)    5 (3)                267    0.0185  0.0189
Total            45533   44142   1251 (1279)  64 (39)  4 (1)       2710   0.0299  0.0305

Table 1 — Sample Duplication
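The calculation behind a row of Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `duplication_rate` and the `group_sizes` mapping are hypothetical, and the rate is taken to be G divided by NU as described in the text. The input is the 1736-1755 sample, using the unparenthesized group counts.

```python
def duplication_rate(n_total, group_sizes):
    """Return (N_G, G, N_R, N_U, rate) for a file of n_total records.

    group_sizes maps a duplicate-group size (2 = pair, 3 = triple, ...)
    to the number of groups of that size. Hypothetical helper for
    illustration only.
    """
    # N_G: number of records that sit inside some duplicate group.
    n_in_groups = sum(size * count for size, count in group_sizes.items())
    # G: number of duplicate groups.
    n_groups = sum(group_sizes.values())
    # Equation 3.9: each group keeps one record, the rest are redundant.
    n_redundant = n_in_groups - n_groups
    # Equation 3.8: uniquely significant records.
    n_unique = n_total - n_redundant
    # Duplication rate: duplicate groups per uniquely significant record.
    rate = n_groups / n_unique
    return n_in_groups, n_groups, n_redundant, n_unique, rate


# 1736-1755 sample from Table 1: 563 pairs, 25 triples, 3 quadruples.
n_g, g, n_r, n_u, rate = duplication_rate(10849, {2: 563, 3: 25, 4: 3})
print(n_g, g, n_r, n_u, round(rate, 4))  # 1213 591 622 10227 0.0578
```

The results reproduce the NG, NU and P(DE) entries for that row. Passing the group counts as a size-to-count mapping keeps the function usable for files with groups larger than quadruples.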