Measuring Reliability

3-1.3 Measuring reliability.

The measure of reliability is simple provided there is a set of matched pairs to measure against.

(3.2)

Here we totaled the cases of agreement over every possible comparison in all duplicate groups. We might call this the comparison reliability, P(A_i | M_C), i.e., the probability that the field data values agree in a matched comparison taken at random.

P(A_i | M_C) = A_total ÷ C_total (3.3)
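
As a minimal sketch of equation 3.3, the pooled ratio could be computed as below. The function name `comparison_reliability`, the sample surnames, and the use of exact equality as the agreement test are assumptions made for illustration only; they do not come from the text.

```python
from itertools import combinations


def comparison_reliability(groups):
    """Pooled agreement rate over all pairwise comparisons (equation 3.3).

    `groups` is a list of duplicate groups; each group is a list of the
    values one field takes across the records matched to one entity.
    """
    a_total = 0  # agreeing comparisons, summed over all groups
    c_total = 0  # all comparisons, summed over all groups
    for values in groups:
        for left, right in combinations(values, 2):
            c_total += 1
            if left == right:  # assumption: agreement means exact equality
                a_total += 1
    if c_total == 0:
        raise ValueError("no group has two or more records to compare")
    return a_total / c_total


# Hypothetical surname values for three linkage entities.
groups = [
    ["Smith", "Smith", "Smyth"],  # 1 agreement in 3 comparisons
    ["Jones", "Jones"],           # 1 agreement in 1 comparison
    ["Brown", "Braun"],           # 0 agreements in 1 comparison
]
print(comparison_reliability(groups))  # 2 agreements in 5 comparisons -> 0.4
```

Every comparison carries the same weight here, so a group of three records contributes three times as much to the result as a pair does.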

When some duplicate groups are large, equation 3.3 is biased toward the comparisons in those groups, because the number of comparisons grows roughly with the square of the group size. It may be advisable, therefore, to average over the duplicate groups so that each linkage entity counts equally. This suggests a second measure of reliability that we could call the entity reliability, P(A_i | M_E), i.e., the probability that the data in two corresponding fields agree in a comparison drawn from the duplicate group of some randomly chosen linkage entity.


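In this case, for each linkage entity j we take the relative frequency of agreement in its duplicate group, that is, the number of agreeing comparisons (A_ij) divided by the number of comparisons (C_ij). We sum these relative frequencies over all groups and divide by the total number of groups (G_total).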

P(A_i | M_E) = [Σ_j (A_ij ÷ C_ij)] ÷ G_total (3.4)

The number of comparisons depends on the size of the duplicate group (N_j). It is the number of combinations of N_j records taken two at a time.

C_ij = [N_j × (N_j – 1)] ÷ 2 (3.5)
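
The entity reliability can be sketched in the same way, using equation 3.5 for the number of comparisons in each group. The function name `entity_reliability` and the sample data are hypothetical. The sketch also assumes that agreement means exact equality and that groups with a single record are left out of G_total, since they offer no comparison; the text does not say how such groups should be treated.

```python
from itertools import combinations


def entity_reliability(groups):
    """Mean of the per-group agreement rates, each entity weighted equally.

    `groups` is a list of duplicate groups; each group is a list of the
    values one field takes across the records matched to one entity.
    """
    rates = []
    for values in groups:
        n_j = len(values)
        c_ij = n_j * (n_j - 1) // 2  # equation 3.5
        if c_ij == 0:
            continue  # assumption: single-record groups are skipped
        # Count agreeing pairs; assumption: agreement means exact equality.
        a_ij = sum(1 for left, right in combinations(values, 2) if left == right)
        rates.append(a_ij / c_ij)
    if not rates:
        raise ValueError("no group has two or more records to compare")
    return sum(rates) / len(rates)  # len(rates) plays the role of G_total


# Hypothetical surname values for three linkage entities.
groups = [
    ["Smith", "Smith", "Smyth"],  # rate 1/3
    ["Jones", "Jones"],           # rate 1
    ["Brown", "Braun"],           # rate 0
]
print(round(entity_reliability(groups), 4))  # (1/3 + 1 + 0) / 3 -> 0.4444
```

For the same data, pooling all comparisons gives 2 agreements in 5 comparisons, or 0.4, so the two measures already differ slightly once one group is larger than the others.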

Typically these two measures of reliability are very nearly the same. We would expect them to diverge only as large duplicate groups become more numerous.
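
A small numerical illustration may make the divergence concrete. The sketch below works directly from group sizes and agreement counts rather than from field values. The helper names and all the counts are invented for illustration, and every group is assumed to hold at least two records.

```python
def comparisons(n_j):
    """Number of pairwise comparisons in a group of n_j records."""
    return n_j * (n_j - 1) // 2


def both_measures(groups):
    """Return (comparison reliability, entity reliability).

    `groups` is a list of (n_j, a_ij) pairs: the group size and the number
    of agreeing comparisons in that group. Assumes every n_j >= 2.
    """
    a_total = sum(a_ij for _, a_ij in groups)
    c_total = sum(comparisons(n_j) for n_j, _ in groups)
    rates = [a_ij / comparisons(n_j) for n_j, a_ij in groups]
    return a_total / c_total, sum(rates) / len(rates)


# Ten groups of two records each; the field agrees in nine of them.
pairs_only = [(2, 1)] * 9 + [(2, 0)]
print(both_measures(pairs_only))  # (0.9, 0.9)

# The same groups plus one group of ten records (45 comparisons),
# in which only 15 comparisons agree.
with_large = pairs_only + [(10, 15)]
pooled, per_entity = both_measures(with_large)
print(round(pooled, 3), round(per_entity, 3))  # 0.436 0.848
```

When every group is a pair, the two measures coincide. The single large group supplies 45 of the 55 comparisons, so it dominates the pooled figure while counting as only one of eleven entities in the per-entity average.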
