Symbolizing the Genealogical Research Process

Although there remain many areas yet unclear in the proper formalization of the structure of a genealogical source, the direction we go to achieve a full characterization depends critically on the way we choose to symbolize other steps in the research process. We demonstrate the relationship of one source to another in this process. Here is one direction we might take in symbolizing the processes of analysis and interpretation of a source.

In analyzing his sources the genealogist idealizes their structure to correspond with reality as he has come to understand it. He abstracts from this “reality” the elements which he takes as genealogically significant. He may display such elements on an ancestral chart or on a set of family group sheets. Let’s investigate the structure of these forms and compare their parts with those of our sources.

When we compare structures, we will need to say that one structure, S₁, is equivalent to another structure, S₂. This is the case only if every structure, S₃, contained in S₁ is also contained in S₂, and if every structure, S₄, contained in S₂ is also contained in S₁, i.e., S₁ = S₂. S₁ is contained in S₂ provided S₁ has the form:

and S₂ has the form:

If S₁ is contained in S₂, we will state it as: S₂(S₁). When we speak of the component of structure C_i as a structure, we will mean C_i together with all the components that define it, and all the components that define them, and so forth. So, if we say C_i = C_j, we mean the structures which branch from them are equivalent as well. Note that the order of the elements of the structures need not be the same, since equivalence is defined in terms of containment alone.
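The definition of equivalence by mutual containment can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the text: a structure is modeled as a label with a tuple of component structures, and "contained in" is read as "is the structure itself or occurs somewhere beneath it". The names `Structure`, `contained`, `is_contained` and `equivalent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Structure:
    """A labeled structure with zero or more component structures (assumed model)."""
    label: str
    parts: Tuple["Structure", ...] = ()


def canonical(s):
    """Order-insensitive fingerprint of a structure and everything that defines it."""
    return (s.label, frozenset(canonical(p) for p in s.parts))


def contained(s):
    """Fingerprints of every structure contained in s (assumed to include s itself)."""
    found = {canonical(s)}
    for part in s.parts:
        found |= contained(part)
    return found


def is_contained(s1, s2):
    """True if s1 is contained in s2, written S2(S1) in the text."""
    return canonical(s1) in contained(s2)


def equivalent(s1, s2):
    """S1 = S2: each contains exactly what the other contains."""
    return contained(s1) == contained(s2)


# The same components in a different order still give equivalent structures.
a = Structure("BTH", (Structure("DT"), Structure("PL")))
b = Structure("BTH", (Structure("PL"), Structure("DT")))
print(equivalent(a, b))
```

Because the comparison uses sets of contained structures, the order of components plays no part, which matches the remark that equivalence is defined in terms of containment alone.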

The genealogist idealizes the important data and constructs at least virtually a system of individuals related together. This may be represented in a set of families — a compilation (CMP) related to the ancestral families (ANSF_i), as many as may be necessary:

CMP → ANSF_i

Each ancestral family is composed of the father (FTH_j), the mother (MTH_j) and a number (i) of children (CHLD_ji):

ANSF_j → FTH_j + MTH_j + CHLD_ji

The compilation could be easily expanded to include families of the children, etc., by further properly formulated definitions.

Suppose the genealogist derives an ancestral chart, ANCH, from this compilation of family groups, such that:

ANCH → IND + (MANS_2i + FANS_2i+1)

distinguishing the male ancestors (MANS_i) from the female (FANS_i), so that the marriage data is recorded once, though it belongs to both:

MANS_i → PN + BTH + MG + DTH

FANS_i → PN + BTH + DTH

BTH → DT + PL

with further definitions of the structure of ANCH made by formation rules of the kind given above. How is such a structure derived? One simply copies the data from an ANSF to the proper portion of the ANCH. We are prepared to state their equivalence:

FTH_i = MANS_2i

MTH_i = FANS_2i+1

And there is a j, such that:

CHLD_ij

We state the derivation in the form of a rule of transformation. We say that S₁ is derivable from S₂ by the rule S₁ = S₂ or the rule S₂ = S₁.
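The copying step can be sketched in Python. This is only an illustration under stated assumptions: the individual IND is taken to occupy position 1 of the chart, family i is taken to be the parental family of the person at position i, and plain strings stand in for the fuller PN + BTH + … structures. The names `Family` and `derive_chart` and all the sample names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Family:
    """One ancestral family ANSF_i, with strings standing in for full person structures."""
    father: str          # FTH_i
    mother: str          # MTH_i
    children: List[str]  # CHLD_ij


def derive_chart(ind: str, families: Dict[int, Family]) -> Dict[int, str]:
    """Copy each family's parents into the chart: FTH_i = MANS_2i, MTH_i = FANS_2i+1."""
    chart = {1: ind}  # assumption: IND sits at position 1
    for i, family in families.items():
        chart[2 * i] = family.father
        chart[2 * i + 1] = family.mother
    return chart


# Hypothetical data: family 1 is the family of IND, family 2 that of IND's father.
families = {
    1: Family("John", "Mary", ["Tom"]),
    2: Family("Peter", "Ann", ["John"]),
}
print(derive_chart("Tom", families))
```

Nothing is interpreted or reconciled here; each value is carried over unchanged, which is why the text treats this derivation as an equivalence.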

The above kind of derivation is no more than copying. Genealogy involves more than copying or checking for precise imitation. A genealogist is forever resolving discrepancies, deciding on the relative reliability of records and resolving conflicts between competing compilations and sources.

A simple kind of conflict arises from records giving facts by different means of expression, such as different languages. A rule to resolve such differences may take the form:

“9bre” → “Nov”

The interpretation of this rule is that S₁(S₂) is derivable from S₁(S₃) by the rule S₃ → S₂. This sort of rule can express how we get “Smith” from “Schmidt” (or vice versa) or “Aubonne” from “Aulbonne.” Some of these translation rules will have to be sensitive to the context, be applicable only in certain time periods, or in certain localities.
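A context-sensitive translation rule of this kind might be sketched as follows. The rule pairs “9bre” → “Nov” and “Schmidt” → “Smith” come from the text. Everything else is an assumption made for illustration: the `TranslationRule` class, the `translate` function, the locality name "Exampleton" and the year range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TranslationRule:
    """A rewrite source -> target, optionally limited to a locality and a span of years."""
    source: str
    target: str
    locality: Optional[str] = None           # None: applies in any locality
    years: Optional[Tuple[int, int]] = None  # None: applies in any period

    def applies(self, value, locality=None, year=None):
        if value != self.source:
            return False
        if self.locality is not None and locality != self.locality:
            return False
        if self.years is not None:
            if year is None or not (self.years[0] <= year <= self.years[1]):
                return False
        return True


def translate(value, rules, locality=None, year=None):
    """Return the target of the first applicable rule, or the value unchanged."""
    for rule in rules:
        if rule.applies(value, locality, year):
            return rule.target
    return value


rules = [
    TranslationRule("9bre", "Nov"),
    # Hypothetical context: this locality and period are invented for the example.
    TranslationRule("Schmidt", "Smith", locality="Exampleton", years=(1750, 1800)),
]
print(translate("9bre", rules))
print(translate("Schmidt", rules, locality="Exampleton", year=1760))
```

A value that matches no rule in its context is returned unchanged, so a rule restricted to one locality never rewrites a record from another.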

There are conflicts in the records more deep-seated than what simple translation will resolve. Such a conflict is illustrated in the following example. Suppose we have a death record that gives the male ancestor’s age as one thing and an obituary notice that gives it as something else. This might have structures partly describable as:

DCT → PN + DTH + AE + …

OBT → PN + DTH + AE + …

AE → YRS + MOS + DYS

When we defined MANS above we did not allow for a birth to be expressed by an age at some dated event, but it would certainly be possible:

BTH → DT + AE + PL

It would now be possible to transform the structure of our DCT or our OBT to conform with that of MANS:

The symbol Ø is meant to signify that these structures are not further defined in this record, whereas the others are. It now makes sense to say MANS_i = DCT_j.

But the same rule given here to transform DCT could be given for OBT as here defined. So we can also say MANS_i = OBT_j and we have two possible compilations. Should we say that two compilations are the same if they are only a little different? Where would we draw the line? It would seem that both alternatives ought to be retained.
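The step from an age at a dated death to a birth date, and the retention of both alternatives, can be sketched in Python. The dates and ages below are invented for illustration, and `birth_from_age` is a hypothetical helper. It follows one convention among several for reckoning ages: step back whole years and months on the calendar, then subtract the days.

```python
import calendar
from datetime import date, timedelta


def birth_from_age(death, years, months, days):
    """Estimate a birth date from a death date and an age AE = YRS + MOS + DYS."""
    # Step back whole years and months on the calendar.
    total_months = death.year * 12 + (death.month - 1) - (years * 12 + months)
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Clamp the day where the earlier month is shorter (e.g. 31st to 28th).
    day = min(death.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) - timedelta(days=days)


# Hypothetical records: same death date, ages that differ by one year.
death = date(1850, 3, 15)
candidates = {
    "DCT": birth_from_age(death, 70, 2, 10),
    "OBT": birth_from_age(death, 71, 2, 10),
}

# Both derived births are kept as alternative compilations.
for record, birth in candidates.items():
    print(record, birth.isoformat())
```

Keeping the results in a mapping keyed by record, rather than overwriting one with the other, mirrors the suggestion that both alternatives ought to be retained.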

The genealogist can live with a certain amount of conflict. Yet usually, for one reason or another, he discards one version in favor of another: some he may assume are fabrications; others he may take as proof of misidentification.

One solution to this dilemma may be to associate with each element a factor of reliability, a weighting adjusted such that the reliability of the whole compilation may be calculated from it. With experience in a source the genealogist subconsciously ranks the different elements: How often has the source been used successfully in the past? How often have the various elements been found to be in conflict with data from other sources?
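The text does not give a formula for these weights, so the following Python sketch is only one possible reading. It assumes that an element's reliability is the fraction of past uses in which it was not found in conflict, and that the reliability of a compilation is the product of its elements' weights, which treats the elements as independent. The function names and all the figures are hypothetical.

```python
from math import prod


def element_reliability(times_used, times_in_conflict):
    """Fraction of past uses in which the element did not conflict with other sources."""
    if times_used == 0:
        return 0.5  # assumption: no experience with the element yet, so stay neutral
    return (times_used - times_in_conflict) / times_used


def compilation_reliability(weights):
    """Combine element weights by multiplication (assumes independent elements)."""
    return prod(weights.values())


# Hypothetical experience with the elements of one kind of source.
weights = {
    "DTH": element_reliability(20, 1),  # death date: rarely in conflict
    "AE": element_reliability(10, 2),   # age at death: in conflict more often
}
print(weights)
print(compilation_reliability(weights))
```

Under such a scheme, two competing compilations could be kept side by side, each with its own score, instead of one being discarded outright.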
