Symbolizing the genealogical research process.  

Although there remain many areas yet unclear in the proper formalization of the structure of a genealogical source, the direction we go to achieve a full characterization depends critically on the way we choose to symbolize other steps in the research process. We demonstrate the relationship of one source to another in this process. Here is one direction we might take in symbolizing the processes of analysis and interpretation of a source.

In analyzing his sources the genealogist idealizes their structure to correspond with reality as he has come to understand it. He abstracts from this “reality” the elements which he takes as genealogically significant. He may display such elements on an ancestral chart or on a set of family group sheets. Let’s investigate the structure of these forms and compare their parts with those of our sources.

When we compare structures, we will need to say that one structure, S1, is equivalent to another structure, S2. This is the case only if every structure, S3, contained in S1 is also contained in S2, and if every structure, S4, contained in S2 is also contained in S1, i.e., S1 = S2. S1 is contained in S2 provided S1 has the form:

and S2 has the form:

If S1 is contained in S2, we will state it as: S2(S1). When we speak of a component Ci of a structure as itself a structure, we will mean Ci together with all the components that define it, and all the components that define them, and so forth. So, if we say Ci = Cj, we mean that the structures which branch from them are equivalent as well. Note that the order of the elements of the structures need not be the same, since equivalence is defined in terms of containment alone.
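The definition of equivalence by mutual containment can be illustrated with a minimal sketch. It assumes that a structure is a labeled component with a set of defining components, and that "contained in" means "occurs somewhere within"; the exact forms of S1 and S2 are not reproduced here, so this reading is an assumption. The class `Structure`, the helper names, and the labels in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """A labeled component together with the components that define it."""
    label: str
    parts: tuple = ()  # tuple of Structure; order is ignored when comparing


def canonical(s):
    """Order-independent form: the label plus the set of canonical parts."""
    return (s.label, frozenset(canonical(p) for p in s.parts))


def contained(s):
    """Every structure contained in s: s itself and all that branches from it."""
    found = {canonical(s)}
    for p in s.parts:
        found |= contained(p)
    return found


def contains(s2, s1):
    """S2(S1), read here (an assumption) as: S1 occurs somewhere within S2."""
    return canonical(s1) in contained(s2)


def equivalent(s1, s2):
    """S1 = S2: whatever is contained in one is contained in the other."""
    return contained(s1) == contained(s2)


# Hypothetical example: the same individual with parts listed in another order.
birth = Structure("BIRTH", (Structure("DATE"), Structure("PLACE")))
a = Structure("IND", (Structure("NAME"), birth))
b = Structure("IND", (Structure("BIRTH", (Structure("PLACE"), Structure("DATE"))),
                      Structure("NAME")))
print(equivalent(a, b), contains(a, birth), equivalent(a, birth))
```

Because the comparison uses sets, repeated identical parts collapse into one; a fuller treatment would have to decide whether repetition is significant.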

The genealogist idealizes the important data and constructs at least virtually a system of individuals related together. This may be represented in a set of families — a compilation (CMP) related to the ancestral families (ANSFi), as many as may be necessary:


Each ancestral family is composed of the father (FTHj), the mother (MTHj) and a number (i) of children (CHLDji):


The compilation could be easily expanded to include families of the children, etc., by further properly formulated definitions.

Suppose the genealogist derives an ancestral chart, ANCH, from this compilation of family groups, such that:

ANCH → IND + ( MANS_{2i} + FANS_{2i+1} )

distinguishing the male ancestors (MANSi) from the female (FANSi), so that the marriage data is recorded once, though it belongs to both:


with further definitions of the structure of ANCH made by formation rules of the kind given above. How is such a structure derived? One simply copies the data from an ANSF to the proper portion of the ANCH. We are prepared to state their equivalence:


And there is a j, such that:


We state the derivation in the form of a rule of transformation. We say that S1 is derivable from S2 by the rule S1 = S2 or the rule S2 = S1.
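The copying derivation described above can be sketched in a few lines. This is only an illustration under stated assumptions: persons are identified by unique names, no person is recorded as his own ancestor, and the chart indices are read as the usual numbering in which the subject holds position 1, the father of position n holds 2n, and the mother 2n + 1. The names `Family` and `derive_chart` and the sample persons are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """One ancestral family: father, mother, and their children."""
    father: str
    mother: str
    children: tuple


def derive_chart(subject, families):
    """Copy persons from family groups onto a numbered ancestral chart."""
    # Index each family by the children it records (assumes unique names).
    family_of = {child: f for f in families for child in f.children}
    chart = {1: subject}
    pending = [1]
    while pending:
        n = pending.pop()
        family = family_of.get(chart[n])
        if family is None:
            continue  # no family group records this person as a child
        chart[2 * n] = family.father      # male ancestor at an even position
        chart[2 * n + 1] = family.mother  # female ancestor at the next odd one
        pending += [2 * n, 2 * n + 1]
    return chart


# Hypothetical compilation of two family groups.
families = [
    Family("John", "Mary", ("Ann",)),
    Family("Peter", "Ruth", ("John",)),
]
chart = derive_chart("Ann", families)
print(chart)
```

Nothing is interpreted in this step: each chart position holds exactly what the family group holds, which is why the derivation can be stated as an equivalence.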

The above kind of derivation is no more than copying. Genealogy involves more than copying or checking for precise imitation. A genealogist is forever resolving discrepancies, deciding on the relative reliability of records and resolving conflicts between competing compilations and sources.

A simple kind of conflict arises from records giving facts by different means of expression, such as different languages. A rule to resolve such differences may take the form:

“9bre” → “Nov”

The interpretation of this rule is that S1(S2) is derivable from S1(S3) by the rule S3 → S2. This sort of rule can express how we get “Smith” from “Schmidt” (or vice versa) or “Aubonne” from “Aulbonne.” Some of these translation rules will have to be sensitive to context, applying only in certain time periods or in certain localities.
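A translation rule of this kind, including its sensitivity to period and locality, might be sketched as follows. The class `TranslationRule`, the function `translate`, and the particular locality and year range attached to the “Schmidt” rule are hypothetical, chosen only to show how a context restriction could work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranslationRule:
    """Rewrite `source` as `target`, optionally only in a given context."""
    source: str
    target: str
    locality: Optional[str] = None  # None means the rule applies anywhere
    years: Optional[tuple] = None   # inclusive (first, last); None means any time

    def applies(self, value, locality, year):
        if value != self.source:
            return False
        if self.locality is not None and locality != self.locality:
            return False
        if self.years is not None:
            if year is None or not (self.years[0] <= year <= self.years[1]):
                return False
        return True


def translate(value, rules, locality=None, year=None):
    """Apply the first rule whose context matches; otherwise keep the value."""
    for rule in rules:
        if rule.applies(value, locality, year):
            return rule.target
    return value


RULES = [
    TranslationRule("9bre", "Nov"),
    TranslationRule("Aulbonne", "Aubonne"),
    # Hypothetical context: valid only in one parish during one century.
    TranslationRule("Schmidt", "Smith", locality="Parish A", years=(1700, 1800)),
]

print(translate("9bre", RULES))
print(translate("Schmidt", RULES, locality="Parish A", year=1750))
print(translate("Schmidt", RULES, locality="Parish B", year=1750))
```

Taking the first matching rule is one possible design; when several rules compete, an ordering or a ranking among them would have to be stated separately.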

There are conflicts in the records more deep-seated than what simple translation will resolve. Such a conflict is illustrated in the following example. Suppose we have a death record that gives the male ancestor’s age as one thing and an obituary notice that gives it as something else. This might have structures partly describable as:


When we defined MANS above we did not allow for a birth to be expressed by an age at some dated event, but it would certainly be possible:


It would now be possible to transform the structure of our DCT or our OBT to conform with that of MANS:

The symbol Ø is meant to signify that these structures are not further defined in this record, whereas the others are. It now makes sense to say MANSi = DCTj.

But the same rule given here to transform DCT could be given for OBT as here defined. So we can also say MANSi = OBTj and we have two possible compilations. Should we say that two compilations are the same if they are only a little different? Where would we draw the line? It would seem that both alternatives ought to be retained.
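The transformation from an age at a dated event to a birth, and the retention of both alternatives, can be illustrated with a small sketch. It assumes that ages are given in completed years and that only the year of the event is known; the function names and the sample figures for the death record (DCT) and the obituary (OBT) are hypothetical.

```python
def birth_years_from_age(event_year, age):
    """Birth years consistent with an age in completed years at a dated event."""
    # The birthday may fall after or before the event within the event year.
    return (event_year - age - 1, event_year - age)


def alternatives(records):
    """Derive one birth estimate per source and keep all of them."""
    return {source: birth_years_from_age(year, age)
            for source, (year, age) in records.items()}


def in_conflict(a, b):
    """Two estimates conflict when their ranges of years do not overlap."""
    return a[1] < b[0] or b[1] < a[0]


# Hypothetical figures: both records date the death to 1850 but disagree on age.
records = {"DCT": (1850, 72), "OBT": (1850, 74)}
births = alternatives(records)
print(births)
print(in_conflict(births["DCT"], births["OBT"]))
```

Both estimates remain in the result, so the decision between them is deferred rather than made silently during the transformation.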

The genealogist can live with a certain amount of conflict. Yet usually, for one reason or another, he discards one version in favor of another: some versions he may assume are fabrications; others he may take as proof of misidentification.

One solution to this dilemma may be to associate with each element a factor of reliability, a weighting adjusted such that the reliability of the whole compilation may be calculated from it. With experience in a source the genealogist subconsciously ranks the different elements: How often has the source been used successfully in the past? How often have the various elements been found to be in conflict with data from other sources?
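One way such a weighting might be computed is sketched below. No formula is prescribed here, so everything in the sketch is an assumption: an element's reliability is taken as the fraction of past checks in which it was not in conflict, an element with no history receives a neutral value of 0.5, and the reliability of a compilation is taken as the product of its elements' weights, which treats the elements as independent. The function names and counts are hypothetical.

```python
from math import prod


def element_reliability(times_checked, times_in_conflict):
    """Fraction of past checks in which the element was not in conflict."""
    if times_checked == 0:
        return 0.5  # assumed neutral weight when there is no experience
    return 1 - times_in_conflict / times_checked


def compilation_reliability(weights):
    """Combine element weights by multiplication (assumes independence)."""
    return prod(weights)


# Hypothetical experience with two elements of a source.
age_weight = element_reliability(times_checked=20, times_in_conflict=5)
name_weight = element_reliability(times_checked=4, times_in_conflict=2)
print(age_weight, name_weight, compilation_reliability([age_weight, name_weight]))
```

Under this scheme two competing compilations could both be retained and simply ranked by their computed reliability, with the ranking revised as experience with each source accumulates.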