[Figure labels: Name Group; Name Code; Name Variant; English Version; Spelling; Derivation; Deviant Usage; Character Set; Status; Culture; Origin; Name Code; Category; Gender; Frequency; Similarity Function; Similarity Code]

Section 2: NAME GROUPING

Using the name administration system, the name expert may apply any of several methods to arrive at the required name group coding (standard or authority). Within this system the distinct name spellings become the entities represented by name records to be linked, i.e., declared to be different ways of designating the same name. Each method involves the record linkage of name pieces. In a manual method an expert compares two similar name spellings and judges them to be the same or different. Those judged to be the same become variants in the same duplicate name group.
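The manual method can be pictured as a series of pairwise judgments that are merged into groups. The following is a minimal sketch, assuming a union-find (disjoint-set) structure as the bookkeeping device; the class name NameGroups and its methods are hypothetical and are not part of the name administration system described here.

```python
class NameGroups:
    """Union-find over name spellings; each disjoint set is one name group.

    Hypothetical illustration only: the source does not prescribe a
    particular data structure for recording expert judgments.
    """

    def __init__(self):
        self._parent = {}

    def find(self, spelling):
        """Return the representative spelling of the group containing `spelling`."""
        # An unseen spelling starts out as its own singleton group.
        self._parent.setdefault(spelling, spelling)
        root = spelling
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited spelling directly at the root.
        while self._parent[spelling] != root:
            self._parent[spelling], spelling = root, self._parent[spelling]
        return root

    def link(self, a, b):
        """Record an expert judgment that `a` and `b` designate the same name."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def groups(self):
        """Return all groups as sorted lists of spellings."""
        members = {}
        for spelling in list(self._parent):
            members.setdefault(self.find(spelling), set()).add(spelling)
        return sorted(sorted(group) for group in members.values())
```

For example, linking "Smith" with "Smyth" and then "Smyth" with "Smythe" places all three spellings in one group, even though "Smith" and "Smythe" were never compared directly. A judgment of "different" simply records no link.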

Just as individual record linkage has been automated, so too might the process of name record linkage. Successful individual record linkage may be taken as an iterative process whose results successively refine the reliability of the weighted data. Central to the success of record linkage for names is the generation of a number of identifiers for the names. There must be enough of them to characterize each variant as formally distinct. The record linkage system may require measurements on a seed of known duplicates. It then bases its linkage decisions about new comparisons on these measurements. Provided there are appropriate measures, it is conceivable that a seed of known duplicate name records might initiate an iterative process resulting in name grouping. It is important that a set of rules be developed for each dimension of similarity. The most effective and appropriate grouping principles for automating the process have yet to be exploited.
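The seed-driven process suggested above can be sketched as follows. This is an illustration under stated assumptions, not a description of an existing system: the similarity measure is a stand-in (the character-based ratio from Python's difflib), the threshold rule (the lowest score observed among known duplicates) is one arbitrary choice, and the function names are hypothetical. A real system would need one such measure, with its own rules, for each dimension of similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in measure: character-based similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def calibrate_threshold(seed_groups):
    """Measure the seed of known duplicates; return the lowest score seen.

    Assumes at least one seed group contains two or more spellings.
    """
    scores = [
        similarity(a, b)
        for group in seed_groups
        for a, b in combinations(sorted(group), 2)
    ]
    return min(scores)


def grow_groups(seed_groups, candidates, threshold):
    """Iteratively attach candidate spellings to the seed groups.

    A candidate joins the first group that has any member scoring at or
    above the threshold. Passes repeat until one adds nothing, so a newly
    added member can pull in further candidates on a later pass.
    """
    groups = [set(group) for group in seed_groups]
    unassigned = set(candidates)
    changed = True
    while changed:
        changed = False
        for name in sorted(unassigned):
            for group in groups:
                if any(similarity(name, member) >= threshold for member in group):
                    group.add(name)
                    unassigned.discard(name)
                    changed = True
                    break
    return groups, unassigned


seed = [{"Smith", "Smyth"}]
threshold = calibrate_threshold(seed)
groups, leftover = grow_groups(seed, ["Smythe", "Jones"], threshold)
```

With this seed, "Smythe" is attached to the seed group because it is close to "Smyth", while "Jones" remains unassigned and would be left for expert review.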

Record linkage for names involves, first of all, bringing similar pieces together and on that basis assigning them to name groups of various kinds. Certain name pieces typically belong to categories according to how society uses them. These words are typically distributed around the world according to their particular culture (locality and time period). At the highest level, we compare and contrast names within a culture in terms of its language. At the next highest level a personal name piece is typically classed as a given name or as a surname, usually the name of a family. Two name pieces may belong to the same group even when they are typically found in different categories, e.g., a person may use what is typically a surname as a given name or vice versa. Guided by this philosophy we handle gender-specific names similarly. Whether a personal name is typically given to a male or to a female need not be considered in assigning it to a name group. Nor need the language be considered important: John, Jean, Johanna, Jane, Ivan, Evan, Sian, Giovanni, etc., may for some purposes all belong in the same group. There are different groups for different purposes. At perhaps the lowest level of analysis, grouping would probably bring together diminutives and nicknames with their corresponding full forms. Also grouped at this level are 1) the various inflectional forms, e.g., case variations required by the grammar of some languages, 2) abbreviations and 3) common misspellings.

Figure 1 diagrams the elements of a name group as defined for a particular culture. The universal name group is identical, except that the language designation belongs to certain specific elements of the group (shown in the figure with an asterisk) rather than to the whole group.
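The difference between the two kinds of group can be sketched as a data structure. The sketch below is hypothetical: the figure itself is not reproduced here, so the choice of the variant as the element that carries the language designation in a universal group is an assumption made for illustration, as are the class and field names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variant:
    spelling: str
    # Assumption: in a universal group the language is recorded per variant.
    language: Optional[str] = None


@dataclass
class NameGroup:
    code: str
    variants: List[Variant] = field(default_factory=list)
    # Set for a culture-specific group; None for a universal group.
    language: Optional[str] = None

    def is_universal(self) -> bool:
        return self.language is None

    def language_of(self, spelling: str) -> Optional[str]:
        """Language of a variant: the group's own, else the variant's."""
        for variant in self.variants:
            if variant.spelling == spelling:
                if self.language is not None:
                    return self.language
                return variant.language
        raise KeyError(spelling)


# A culture-specific group: one language designation for the whole group.
english = NameGroup("G1", [Variant("John"), Variant("Jon")], language="English")

# A universal group: the designation moves down to the individual elements.
universal = NameGroup(
    "U1",
    [Variant("John", "English"), Variant("Giovanni", "Italian")],
)
```

Everything else about the two groups is the same, which matches the statement that the universal name group is otherwise identical to the culture-specific one.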