1.5 Classifier. The standardization process must assign to each token its possible class memberships. A token is an individual meaning-carrying piece of a name, or a combination of such pieces, that has an entry in the LEXICON. In some cases several tokens together form a phrase that has its own entry in the LEXICON. Hence, the classifier first takes the full phrase and then analyzes it piecemeal, considering smaller and smaller combinations as it goes.
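The piecemeal analysis described above can be read as a longest-match-first lookup. The following Python sketch shows one such reading; it is an illustration under stated assumptions, not the classifier's actual procedure. The miniature `LEXICON` dictionary, the function name `classify`, and the left-to-right greedy strategy are all assumptions made for the example.

```python
# Hypothetical miniature LEXICON: lower-cased phrase -> possible categories.
# The categories are taken from the table in this section; the dictionary
# layout itself is an assumption made for this sketch.
LEXICON = {
    "also known as": ["separator"],
    "doctor of divinity": ["postnominal initials"],
    "thomas jefferson": ["given name"],
    "thomas": ["given name", "surname"],
    "john": ["given name", "surname"],
    "smith": ["surname"],
    "doctor": ["rank"],
    "of": ["preposition"],
}


def classify(words):
    """Greedy longest-match-first classification of a list of words."""
    result = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first, then shrink it.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            categories = LEXICON.get(phrase.lower())
            if categories is not None:
                result.append((phrase, categories))
                i = j
                break
        else:
            # No entry at any length: keep the single word, unclassified.
            result.append((words[i], []))
            i += 1
    return result


tagged = classify("John Smith also known as Thomas Jefferson".split())
for phrase, categories in tagged:
    print(phrase, categories)
```

In this sketch "also known as" and "Thomas Jefferson" are each matched as a whole phrase before any of their individual words is tried, while "John" keeps both of its possible categories for the parser to resolve later.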

The following table lists a number of possible lexical entries. They illustrate the range of possibilities found in English and in related languages and cultures.

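| Token | Category | Class | Gender | Standard |
|---|---|---|---|---|
| “ ” | hyphen | deviant | | “-” |
| “,” | comma | standard | | |
| “-” | hyphen | standard | | |
| “/” | separator | abbreviation | | or |
| -son | patronymic | standard, affix | M | |
| aka | separator | standard, abbreviation | | |
| also known as | separator | full form | | aka |
| ap | particle | standard | M | |
| Archbishop | position | standard | M | |
| Bachelor of Science | postnominal initials | full form | | B.S. |
| Baker | occupation name | standard | M | |
| Baker | surname | standard | | |
| Baron | position | standard | M | |
| Baron | surname | standard | | |
| beatitude | abstraction | standard | M | |
| Bishop | position | standard | | |
| Bishop | surname | standard | | |
| Black | epithet | standard | | |
| Black | surname | standard | | |
| Brother | rank | standard | M | |
| B. S. | postnominal initials | standard, abbreviation | | |
| Buttons | occupation name | standard | M | |
| Buttons | surname | standard | | |
| Canterbury | domain name | standard | M | |
| Cardinal | cardinal | standard | M | |
| Christianson | surname | standard | | |
| Cobler | occupation name | standard | M | |
| Cobler | surname | standard | | |
| Cowper | occupation name | standard | M | |
| Cowper | surname | standard | | |
| D. D. | postnominal initials | standard, abbreviation | | |
| de | nobiliary | standard | | |
| dit | separator | standard | | |
| Doctor | rank | standard | | |
| Doctor of Divinity | postnominal initials | full form | | D. D. |
| Doctor of Philosophy | postnominal initials | interpretive form of Philosophiae Doctor | | Ph. D. |
| Donald | given name | standard | M | |
| Donald | surname | standard | | |
| eminence | abstraction | standard | M | |
| Father | rank | standard | M | |
| fifth | ordinal phrase | full form | M | V |
| geb. | separator | standard, abbreviation | | |
| genannt | separator | standard | | |
| her | possessive | standard | F | |
| his | possessive | standard | M | |
| holiness | abstraction | standard | M | |
| House of Representatives | domain name | standard | | |
| Howard | surname | standard | | |
| Howard | title | standard | M | |
| II | ordinal phrase | standard, Roman numeral | | |
| John | given name | standard | M | |
| John | surname | standard | | |
| Jones | surname | standard | | |
| Junior | comparative adjective | standard | M | |
| King | position | standard | M | |
| King | surname | standard | | |
| Little | epithet | standard | | |
| Longespee | epithet | standard | M | |
| Lord | position | standard | M | |
| Lord | surname | standard | | |
| Mac- | patronymic particle | standard, affix | | |
| Macedonia | domain name | standard | M | |
| Mary | given name | standard | F | |
| Mayor | epithet | standard | M | |
| McDonald | surname | standard | | |
| Monsignor | rank | standard | M | |
| most | quantifier | standard | | |
| Mother | rank | standard | F | |
| O’- | patronymic particle | standard, affix | | |
| of | preposition | standard | | |
| or | separator | standard | | |
| Patriarch | position | standard | M | |
| Peace | epithet | standard | | |
| Ph. D. | postnominal initials | standard, abbreviation | | |
| Philosophiae Doctor | postnominal initials | full form | M | Ph. D. |
| Pope | rank | standard | M | |
| Pope | surname | standard | | |
| reverend | attribute | standard | M | |
| right | quantifier | standard | | |
| royal | qualifier | standard | | |
| second | ordinal phrase | full form | | II |
| Senior | comparative adjective | standard | M | |
| S. C. | postnominal initials | standard, abbreviation | | |
| Sister | rank | standard | F | |
| S. J. | postnominal initials | standard, abbreviation | M | |
| Smith | surname | standard | | |
| Society of Jesus | postnominal initials | full form | M | S. J. |
| Stewart | title | standard | M | |
| Stewart | surname | standard | | |
| Superior | comparative adjective | standard | | |
| Supreme Court | postnominal initials | full form | | S. C. |
| the | determiner | standard | | |
| Thomas | given name | standard | M | |
| Thomas | surname | standard | | |
| Thomas Jefferson | given name | standard | M | |
| Thos. | given name | abbreviation | M | Thomas |
| T. J. | given name | nickname, abbreviation | M | Thomas Jefferson |
| V | ordinal phrase | standard, Roman numeral | M | |
| v. | patronymic particle | abbreviation | F | verch |
| v. | nobiliary | abbreviation | | van |
| van | nobiliary | standard | | |
| venerable | attribute | standard | M | |
| verch | patronymic particle | standard | F | |
| very | quantifier | standard | | |
| vulgo | separator | standard | M | |
| Weaver | occupation name | standard | M | |
| Weaver | surname | standard | | |
| Windsor | title | standard | | |
| Windsor | surname | standard | | |
| Xtianson | surname | abbreviation | | Christianson |
| York | surname | standard | | |
| York | title | standard | | |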
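
Because one token can have several entries (for example "v." as a patronymic particle and as a nobiliary), a lookup has to return every reading rather than a single one. The sketch below shows one way such entries might be represented and queried. The record type `LexEntry`, the helpers `readings` and `standard_forms`, and the rule that an entry with an empty Standard column is its own standard form are assumptions made for the example; the rows themselves are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexEntry:
    """One row of the table (hypothetical record layout)."""
    token: str
    category: str
    klass: str                      # "class" is a reserved word in Python
    gender: Optional[str] = None
    standard: Optional[str] = None  # empty Standard column -> None


# A few rows copied from the table above.
ENTRIES = [
    LexEntry("v.", "patronymic particle", "abbreviation", "F", "verch"),
    LexEntry("v.", "nobiliary", "abbreviation", None, "van"),
    LexEntry("van", "nobiliary", "standard"),
    LexEntry("verch", "patronymic particle", "standard", "F"),
    LexEntry("Thos.", "given name", "abbreviation", "M", "Thomas"),
    LexEntry("Thomas", "given name", "standard", "M"),
    LexEntry("Thomas", "surname", "standard"),
]


def readings(token):
    """Return every lexical entry recorded for the token."""
    return [e for e in ENTRIES if e.token == token]


def standard_forms(token):
    """Return the distinct standard forms the token may stand for.

    Assumption: an entry with no Standard value is its own standard form.
    """
    return sorted({e.standard or e.token for e in readings(token)})


print(standard_forms("v."))     # the ambiguous abbreviation
print(standard_forms("Thos."))  # an unambiguous abbreviation
```

Keeping all readings at this stage leaves the choice between them to the parser, which can use context and the other features on each entry.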

There are, of course, many other features (we have shown only gender) that may appear on a lexical entry. In addition, it is important in some cases for relative frequency information to be given. The parser requires some means of ranking ambiguous structures according to their likelihood of occurrence. For example, John is far more frequent as a given name than as a surname. When record linkage uses the personal name for identification purposes, some indication of relative frequency is also needed for calculating individual record weights.
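
As a minimal sketch of how relative frequency could be used to rank the readings of an ambiguous token, consider the following. The counts are made up purely for illustration and are not drawn from any real name corpus; the names `COUNTS` and `ranked_readings` are likewise hypothetical.

```python
# Made-up counts for illustration only; not taken from any real data.
COUNTS = {
    ("John", "given name"): 9500,
    ("John", "surname"): 500,
}


def ranked_readings(token):
    """Return (category, relative frequency) pairs, most frequent first."""
    items = [(cat, n) for (tok, cat), n in COUNTS.items() if tok == token]
    total = sum(n for _, n in items)
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return [(cat, n / total) for cat, n in ranked]


for category, share in ranked_readings("John"):
    print(f"{category}: {share:.2f}")
```

With these illustrative counts the given-name reading of John is ranked ahead of the surname reading, which is the ordering the parser would need in order to prefer the likelier structure.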