Here we totaled the cases of agreement in every possible comparison in all duplicate groups. We might call this the comparison reliability,
P(Ai | MC), i.e., the probability that the field data values agree in any matched comparison taken at random.
P(Ai | MC) = Atotal ÷ Ctotal    (3.3)
When some duplicate groups are large, equation 3.3 is weighted toward the comparisons in those groups, because the number of comparisons a group contributes grows roughly with the square of its size. It may be advisable, therefore, to average over
duplicate groups so that each linkage entity counts equally. This suggests a second measure of reliability that we could call the entity reliability,
P(Ai | ME), i.e., the probability that the field data values agree in a comparison taken at random
from the duplicate group of some randomly chosen linkage entity.
In this case we compute, for each linkage entity j, the relative frequency of agreement in its duplicate group, i.e., the number of agreements (Aij)
divided by the number of comparisons (Cij). We then sum these relative frequencies and divide by the total number of groups (Gtotal).
P(Ai | ME) = [Σj (Aij ÷ Cij)] ÷ Gtotal    (3.4)
The number of comparisons depends on the size of the duplicate group (Nj). It is the number
of combinations of Nj records taken two at a time.
Cij = [Nj × (Nj – 1)] ÷ 2    (3.5)
Typically these two measures of reliability are very nearly the same, and they coincide exactly when all duplicate groups are the same size. We would expect them to diverge
only as large duplicate groups become more numerous.
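
As an illustration, the two measures can be computed side by side. The following is a minimal sketch in Python, not the procedure used in the study: the function names `pair_count` and `reliabilities`, the sample surnames, and the choice to skip single-record groups (which have no comparisons) are all assumptions made for the example.

```python
from itertools import combinations


def pair_count(n):
    """Number of pairwise comparisons in a duplicate group of size n (eq. 3.5)."""
    return n * (n - 1) // 2


def reliabilities(groups):
    """Return (comparison reliability, entity reliability) for one field.

    `groups` is a list of duplicate groups; each group is a list holding the
    value of the field in every record linked to one entity (hypothetical
    input layout, chosen for this sketch).
    """
    a_total = 0      # agreements over all comparisons (Atotal)
    c_total = 0      # comparisons over all groups (Ctotal)
    ratio_sum = 0.0  # running sum of Aij / Cij
    g_total = 0      # number of groups with at least one comparison (Gtotal)

    for values in groups:
        c = pair_count(len(values))
        if c == 0:
            continue  # assumption: a single record yields no comparison
        a = sum(1 for x, y in combinations(values, 2) if x == y)
        a_total += a
        c_total += c
        ratio_sum += a / c
        g_total += 1

    if c_total == 0:
        raise ValueError("no comparisons available")

    comparison = a_total / c_total  # eq. 3.3
    entity = ratio_sum / g_total    # eq. 3.4
    return comparison, entity


# Invented data: one group of two records, one group of four records.
groups = [
    ["Smith", "Smith"],
    ["Jones", "Jones", "Jonse", "Jones"],
]
comparison, entity = reliabilities(groups)
print(round(comparison, 4), round(entity, 4))  # 0.5714 0.75
```

In this invented data the small group agrees in its single comparison, while the group of four agrees in 3 of its 6 comparisons. The comparison reliability is therefore 4 ÷ 7, dominated by the larger group, whereas the entity reliability is (1 + 0.5) ÷ 2 = 0.75.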