Typographic misspelling

5.1 Typographic misspelling. (Q) A typographical error produces a misspelling of any of the other forms of a name and is due to faulty placement of the fingers when transcribing the letters on a QWERTY keyboard. A study of Danish patronymic affixes appearing in transcribed data revealed about a half-dozen kinds of errors that seemed best described as mistakes made at the keyboard. A careful analysis of these errors finds them falling into three grand classes with several subclasses of each, so that there are eleven main kinds of keyboard input error.

The most frequently detected typographical error affects only one letter, so that most of the misspelled name remains recognizable. These errors are prone to happen to a typist who uses the hunt-and-peck method. Knowing where the letters are located on the keyboard allows the typist to hit the desired key with the index finger of the nearest available hand. The problem arises when the typist is not careful and the finger does not strike the key squarely. Some of these are commonly called “fat-finger” errors. Those where the struck key is in the wrong row might be called “stiff-finger” errors. Holding a key down too long causes the computer to register a double hit. This could be called a “heavy-finger” error, but it can happen with touch typists as well. The so-called “light-finger” error, which simply omits a letter, is likewise independent of the typist’s method. Figure 53 illustrates six typographical misspellings using examples of minimal string transformations affecting the rendering of a single letter of the name string.
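The single-letter slips described above can be sketched in Python. This is only an illustration under stated assumptions: the function name and the mapping of labels to tests are hypothetical, "fat-finger" is taken here to mean a neighbouring key in the same row and "stiff-finger" a touching key in the row above or below, and the stagger of the physical rows is approximated rather than measured. It does not reproduce the six cases of figure 53.

```python
# Letter rows of a QWERTY keyboard (letters only).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def keys_touch(x, y):
    """True if two letter keys are neighbours (assumed stagger model)."""
    if x not in POS or y not in POS:
        return False
    (r1, c1), (r2, c2) = POS[x], POS[y]
    if r1 == r2:
        return abs(c1 - c2) == 1
    if abs(r1 - r2) == 1:
        # Each lower row sits slightly to the right, so a key at column c
        # is assumed to touch columns c-1 and c of the row below it.
        upper_c, lower_c = (c1, c2) if r1 < r2 else (c2, c1)
        return lower_c - upper_c in (-1, 0)
    return False


def classify_single_letter_error(intended, observed):
    """Return a hypothetical label for a one-letter slip, or None."""
    a, b = intended.lower(), observed.lower()
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) != 1:
            return None
        x, y = a[diffs[0]], b[diffs[0]]
        if not keys_touch(x, y):
            return None
        # Same row: sideways slip; different row: wrong-row slip.
        return "fat-finger" if POS[x][0] == POS[y][0] else "stiff-finger"
    if len(b) == len(a) + 1:
        # One letter of the intended name was registered twice.
        for i in range(len(a)):
            if a[: i + 1] + a[i] + a[i + 1:] == b:
                return "heavy-finger"
        return None
    if len(b) == len(a) - 1:
        # One letter of the intended name was dropped.
        for i in range(len(a)):
            if a[:i] + a[i + 1:] == b:
                return "light-finger"
        return None
    return None


print(classify_single_letter_error("Jon", "Job"))     # fat-finger
print(classify_single_letter_error("Fred", "Freed"))  # heavy-finger
```

The comparison is directional: the first argument is treated as the accepted spelling and the second as the suspected error.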

The transformations exemplified in figure 54 show five additional typographical misspellings, which can occur with a typist using the touch method. The fingers of the hand work in concert, so that when one finger hits the wrong key, it is probable that all the other fingers of the same hand will do so as well. The errors that result are usually regarded as garbage since so many letters may be affected. The illustrations use name strings that require but a single hand to type and so accentuate this effect, but this is not a necessary feature of these errors. The errors on the right are mental lapses that can occur whatever the typist’s method, and the last example duplicates the effect of the so-called “light-finger” error of figure 53.
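A misplaced hand can be modelled, very roughly, as every letter moving the same number of keys along its own row. The Python sketch below assumes exactly that; the function names are hypothetical, and it shifts the whole string, whereas in general only the letters typed by the displaced hand would move. The pair Fred and Dews, cited in this section as a type 9 error, is consistent with a shift of one key to the left.

```python
# Letter rows of a QWERTY keyboard (letters only).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def shift_along_row(name, offset):
    """Move every letter `offset` keys along its row; None if impossible."""
    out = []
    for ch in name.lower():
        if ch not in POS:
            return None
        r, c = POS[ch]
        c2 = c + offset
        if not 0 <= c2 < len(ROWS[r]):
            return None  # the shifted finger would leave the letter keys
        out.append(ROWS[r][c2])
    return "".join(out)


def hand_shift_offset(intended, observed):
    """Return -1 or +1 if `observed` is `intended` shifted one key, else None."""
    for offset in (-1, 1):
        if shift_along_row(intended, offset) == observed.lower():
            return offset
    return None


print(shift_along_row("Fred", -1))        # dews
print(hand_shift_offset("Fred", "Dews"))  # -1
```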

The algorithm that compares two name spellings may detect possible typographical errors by consulting a table that associates each letter with the adjacent keys of a QWERTY layout in one of the eleven ways. The number of possibilities for even short names becomes very great, and comparing every pair may be very costly. The key to finding the comparisons in which one spelling contains a typographical error is that such a misspelling tends to be a rare occurrence. Only when the misspelling serendipitously results in the correct spelling of another name would the error be likely to go unobserved. For example, the type 2 misspelling of Jon, a nickname of Jonathan, as Job strikes one as the correct spelling of a different name.

The importance of relative frequency can be seen in the case where the resultant spelling is unique, i.e. the spelling occurs only once. Suppose the above example of a type 9 error, where Fred appears as Dews, is observed. This latter spelling has no other occurrences; only the one instance has ever been seen. The score assigned to this comparison must take into account the drastic difference between the relative frequencies and particularly that the observed spelling is unique.

The principle is that a unique spelling ought to be reinterpreted as a variant of its nearest more common spelling. This does not necessarily mean that all such errors are unique, since an error can easily be duplicated when copy devices are used. For example, the family name may be mistyped, after which the input device allows the entry clerk to duplicate the same name for other persons in a family listing. However, depending on the cost, spellings that are rare but not unique could also be considered, provided they cannot be grouped by any other, more likely method.
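The principle can be sketched as a small Python routine. Everything here is an assumption made for illustration: the function names, the thresholds `max_count` and `max_distance`, the use of plain Levenshtein distance as the notion of "nearest", and the spelling counts, which are invented and not taken from any data set.

```python
def levenshtein(a, b):
    """Plain edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def reassign_rare_spellings(counts, max_count=1, max_distance=1):
    """Map each rare spelling to its nearest more common spelling."""
    result = {}
    for spelling, n in counts.items():
        if n > max_count:
            continue  # common enough to stand on its own
        candidates = [
            (levenshtein(spelling, other), -m, other)
            for other, m in counts.items()
            if m > n
        ]
        candidates = [c for c in candidates if c[0] <= max_distance]
        if candidates:
            # Nearest first; ties go to the more frequent spelling.
            result[spelling] = min(candidates)[2]
    return result


# Invented counts, for illustration only.
counts = {"jensen": 120, "jenssen": 1, "hansen": 95, "hamsen": 1}
print(reassign_rare_spellings(counts))
# {'jenssen': 'jensen', 'hamsen': 'hansen'}
```

Raising `max_count` above 1 corresponds to considering spellings that are rare but not unique, at a higher comparison cost.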

It seems that the table of letter associations could be used in conjunction with an existing edit-distance algorithm that finds comparisons of near agreements, ones containing only one or a few letter differences. Any comparisons identified as one of the eleven kinds of error could be counted as Q-matching. The uncertainty or distance of the error from the accepted spelling to which it is compared would be proportional to the likelihood that the comparison might be explained on some other dimension of analysis. In most cases, this would be very small.
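One way to combine a letter-association table with an edit-distance algorithm is to charge less for substituting a neighbouring key than for an arbitrary letter. The Python sketch below is a minimal illustration, not the method of this section: the name `q_distance`, the discount of 0.5, and the approximate adjacency model are all assumptions, and only the adjacent-key substitution is covered, not all eleven kinds of error.

```python
# Letter rows of a QWERTY keyboard (letters only).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def keys_touch(x, y):
    """True if two letter keys are neighbours (assumed stagger model)."""
    if x not in POS or y not in POS:
        return False
    (r1, c1), (r2, c2) = POS[x], POS[y]
    if r1 == r2:
        return abs(c1 - c2) == 1
    if abs(r1 - r2) == 1:
        upper_c, lower_c = (c1, c2) if r1 < r2 else (c2, c1)
        return lower_c - upper_c in (-1, 0)
    return False


def q_distance(a, b, adjacent_cost=0.5):
    """Edit distance in which adjacent-key substitutions are discounted."""
    a, b = a.lower(), b.lower()
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                sub = 0.0
            elif keys_touch(ca, cb):
                sub = adjacent_cost  # plausible slip of the finger
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1.0,       # delete ca
                           cur[j - 1] + 1.0,    # insert cb
                           prev[j - 1] + sub))  # substitute
        prev = cur
    return prev[-1]


print(q_distance("Jon", "Job"))  # 0.5: n and b are neighbouring keys
print(q_distance("Jon", "Jot"))  # 1.0: n and t are not
```

A pair whose discounted distance is lower than its plain edit distance is a candidate for Q-matching; whether it is accepted would still depend on the relative frequencies of the two spellings.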

It also seems that the safest strategy would require that any such misspellings included in name groups be marked as such. In this way the Q-distance between such groups as Job and Jonathan (including Jon) could contribute to a measure of their confusability. These two groups would be tangent at that one spelling. So long as tangential spellings are detected as rare occurrences, the distances measured on other dimensions and between other members of the group could still remain relatively large.

Works of Wonder | Genealogy