2.1 Standardization.  The following illustration shows how standardization reduces the query set. The data come from a research problem in Norway. Assigning a standard to the various spellings of a given name reduces multiple queries on that name to one. Similarly, multiple queries on the patronymic, which was used historically as a surname, may be reduced to one. If both the given name and the patronymic are needed to identify a person, and the person making the query does not know how the name appears in the record, there are many possible queries. Some of the possible ways to record one man’s name appear in the following chart. There are n possibilities for the given name: GN1, …, GNi, …, GNn. Similarly, there are m possibilities for the surname: SN1, …, SNj, …, SNm. This gives n * m possible queries in total; in the chart, n = 11 and m = 14, so there are 154. Standardization reduces this to one.

| Class | GN1 Siver | GN2 Syver | GN3 Sivert | GN4 Syvert | GN5 Sigvart | GN6 Sigvaart | GN7 Sirverdt | GN8 Syrverdt | GN9 Sigurd | GN10 Siur | GN11 Siul |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SN1 Ole | Siver Olesen | Syver Olesen | Sivert Olesen | Syvert Olesen | Sigvart Olesen | Sigvaart Olesen | Sirverdt Olesen | Syrverdt Olesen | Sigurd Olesen | Siur Olesen | Siul Olesen |
| SN2 Ola | Siver Olasen | Syver Olasen | Sivert Olasen | Syvert Olasen | Sigvart Olasen | Sigvaart Olasen | Sirverdt Olasen | Syrverdt Olasen | Sigurd Olasen | Siur Olasen | Siul Olasen |
| SN3 Oluf | Siver Olufsen | Syver Olufsen | Sivert Olufsen | Syvert Olufsen | Sigvart Olufsen | Sigvaart Olufsen | Sirverdt Olufsen | Syrverdt Olufsen | Sigurd Olufsen | Siur Olufsen | Siul Olufsen |
| SN4 Olav | Siver Olavsen | Syver Olavsen | Sivert Olavsen | Syvert Olavsen | Sigvart Olavsen | Sigvaart Olavsen | Sirverdt Olavsen | Syrverdt Olavsen | Sigurd Olavsen | Siur Olavsen | Siul Olavsen |
| SN5 Olof | Siver Olofsen | Syver Olofsen | Sivert Olofsen | Syvert Olofsen | Sigvart Olofsen | Sigvaart Olofsen | Sirverdt Olofsen | Syrverdt Olofsen | Sigurd Olofsen | Siur Olofsen | Siul Olofsen |
| SN6 Olaf | Siver Olafsen | Syver Olafsen | Sivert Olafsen | Syvert Olafsen | Sigvart Olafsen | Sigvaart Olafsen | Sirverdt Olafsen | Syrverdt Olafsen | Sigurd Olafsen | Siur Olafsen | Siul Olafsen |
| SN7 Olle | Siver Ollesen | Syver Ollesen | Sivert Ollesen | Syvert Ollesen | Sigvart Ollesen | Sigvaart Ollesen | Sirverdt Ollesen | Syrverdt Ollesen | Sigurd Ollesen | Siur Ollesen | Siul Ollesen |
| SN8 Olla | Siver Ollasen | Syver Ollasen | Sivert Ollasen | Syvert Ollasen | Sigvart Ollasen | Sigvaart Ollasen | Sirverdt Ollasen | Syrverdt Ollasen | Sigurd Ollasen | Siur Ollasen | Siul Ollasen |
| SN9 Ollof | Siver Ollofsen | Syver Ollofsen | Sivert Ollofsen | Syvert Ollofsen | Sigvart Ollofsen | Sigvaart Ollofsen | Sirverdt Ollofsen | Syrverdt Ollofsen | Sigurd Ollofsen | Siur Ollofsen | Siul Ollofsen |
| SN10 Ollov | Siver Ollovsen | Syver Ollovsen | Sivert Ollovsen | Syvert Ollovsen | Sigvart Ollovsen | Sigvaart Ollovsen | Sirverdt Ollovsen | Syrverdt Ollovsen | Sigurd Ollovsen | Siur Ollovsen | Siul Ollovsen |
| SN11 Ollav | Siver Ollavsen | Syver Ollavsen | Sivert Ollavsen | Syvert Ollavsen | Sigvart Ollavsen | Sigvaart Ollavsen | Sirverdt Ollavsen | Syrverdt Ollavsen | Sigurd Ollavsen | Siur Ollavsen | Siul Ollavsen |
| SN12 Ollaus | Siver Ollaussen | Syver Ollaussen | Sivert Ollaussen | Syvert Ollaussen | Sigvart Ollaussen | Sigvaart Ollaussen | Sirverdt Ollaussen | Syrverdt Ollaussen | Sigurd Ollaussen | Siur Ollaussen | Siul Ollaussen |
| SN13 Ollaug | Siver Ollaugsen | Syver Ollaugsen | Sivert Ollaugsen | Syvert Ollaugsen | Sigvart Ollaugsen | Sigvaart Ollaugsen | Sirverdt Ollaugsen | Syrverdt Ollaugsen | Sigurd Ollaugsen | Siur Ollaugsen | Siul Ollaugsen |
| SN14 Olaus | Siver Olaussen | Syver Olaussen | Sivert Olaussen | Syvert Olaussen | Sigvart Olaussen | Sigvaart Olaussen | Sirverdt Olaussen | Syrverdt Olaussen | Sigurd Olaussen | Siur Olaussen | Siul Olaussen |
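
The reduction shown in the chart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variant lists are copied from the chart, but the choice of "Siver" and "Olesen" as the standard forms is arbitrary, and the names standardize, raw_queries and standard_queries are hypothetical.

```python
from itertools import product

# Variant spellings copied from the chart.
GIVEN_VARIANTS = [
    "Siver", "Syver", "Sivert", "Syvert", "Sigvart", "Sigvaart",
    "Sirverdt", "Syrverdt", "Sigurd", "Siur", "Siul",
]
PATRONYMIC_VARIANTS = [
    "Olesen", "Olasen", "Olufsen", "Olavsen", "Olofsen", "Olafsen",
    "Ollesen", "Ollasen", "Ollofsen", "Ollovsen", "Ollavsen",
    "Ollaussen", "Ollaugsen", "Olaussen",
]

# Assumption: the first variant in each list serves as the standard.
# Any other choice of standard form would work equally well.
GIVEN_TO_STANDARD = {v: GIVEN_VARIANTS[0] for v in GIVEN_VARIANTS}
PATRONYMIC_TO_STANDARD = {v: PATRONYMIC_VARIANTS[0] for v in PATRONYMIC_VARIANTS}


def standardize(given, patronymic):
    """Map one recorded spelling of a name to its standard form."""
    return GIVEN_TO_STANDARD[given], PATRONYMIC_TO_STANDARD[patronymic]


# Without standardization: one query per cell of the chart (n * m).
raw_queries = [f"{g} {p}" for g, p in product(GIVEN_VARIANTS, PATRONYMIC_VARIANTS)]

# With standardization: every cell collapses to the same standard pair.
standard_queries = {
    standardize(g, p) for g, p in product(GIVEN_VARIANTS, PATRONYMIC_VARIANTS)
}

print(len(raw_queries))       # n * m = 11 * 14
print(len(standard_queries))  # a single standardized query
```

The same mapping would be applied both to the records when they are indexed and to the incoming query, so that a search on any one spelling reaches records written with any other.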

It is easy to see that the problem is much larger than this example once all variant spellings are taken into consideration. The surname “Taylor” alone has more than 150 spellings in the British Isles. The place name “Wuerttemberg” has been found spelled in over 2,500 different ways. The given name “Elizabeth” has over 4,000 variations. Such large values of n make query management very cumbersome.

In actual practice the form of the authoritative standard is irrelevant. For example, if a query comes in for “John Tailor” of “Wurtemberg,” the standard forms may be “Johannes Schneider” and “Wuerttemberg.” But the standard might equally be a set of codes, such as “00003487” for the given name, “00236349” for the surname, and “007830” for the place.
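
A code-based standard might look like the following minimal sketch. The three codes are the ones quoted above; the lookup tables, the lower-casing step, and the function name standardize_query are assumptions made for illustration, not a description of any actual system.

```python
# Hypothetical lookup tables: each known spelling maps to an opaque code.
# The codes are those quoted in the text; the table contents are assumed.
GIVEN_NAME_CODES = {"john": "00003487", "johannes": "00003487"}
SURNAME_CODES = {"tailor": "00236349", "schneider": "00236349"}
PLACE_CODES = {"wurtemberg": "007830", "wuerttemberg": "007830"}


def standardize_query(given, surname, place):
    """Replace each query term with the code of its standard.

    Raises KeyError if a spelling is not in the tables.
    """
    return (
        GIVEN_NAME_CODES[given.lower()],
        SURNAME_CODES[surname.lower()],
        PLACE_CODES[place.lower()],
    )


as_entered = standardize_query("John", "Tailor", "Wurtemberg")
as_standard = standardize_query("Johannes", "Schneider", "Wuerttemberg")

print(as_entered)
print(as_entered == as_standard)
```

Because matching is done on the codes, it makes no difference whether the standard is displayed as a readable name or never displayed at all.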