GENEALOGY FORMALIZED

— Bruce D. Despain

[Preface of 1988 edition] It is interesting that the following paper, written almost ten years ago, addresses many of the concerns that are currently being expressed by the developers of GEDCOM. Back then there was no place to publish this paper very widely. I am not too sure that even today there would be anyone in either the genealogical or computer science fields interested in the concepts addressed. The recent conference on GEDCOM for software developers does, however, give hope that there might be some few now willing to listen.

Many of the problems in establishing standards for interfaces between genealogical data bases lie in the conceptual model of the genealogical data. The Family History Department has confronted some of these problems in its attempts to transfer data from the Personal Ancestral File structures to those of the Ancestral File. When two structures fail to correspond, too often the result is that data is lost. In such cases the conflicts might well be resolved with a deeper analysis of the data. Such an analysis can proceed within the conceptual framework introduced here.

Let me make a few comments as to the relevance of the approach outlined. For example, it should not be too surprising that the tags used by GEDCOM correspond to many of the components of structure symbolized in my calculus. These are basic elements of genealogical sources and compilations. What is needed is to get away from the canonization of all such tags. I am now of the opinion that we need to work toward the means of giving tag definitions hierarchically within the GEDCOM file. [cf. today's XML language] It may also be possible with GEDCOM to give a number of transformations to a sufficiently differentiated standard set. [embedded procedures]

— The Author

It is by using a symbolic language that scientists have succeeded in representing the structures and interrelationships of the concepts of their discipline in a rigorous and formal way. Physicists have long used mathematics as their language of symbols to give a very definite meaning to such terms as “mass,” “energy,” “velocity,” and “force.” Would it be worthwhile to consider seriously what the consequences might be if a sufficiently expressive symbolic system were applied to a field like genealogy? What would happen if a formally rigorous definition were given to such concepts as “vital event,” “date,” “locality,” “child,” etc.?

One advantage that would come from applying a symbolic language to genealogy would be the possibility of isolating specific problems of definition. But by far the most useful advantage is the ability to answer the question of whether a particular genealogical compilation is actually based on a particular set of records: the question is reduced to a matter of mathematical deduction. The degree to which we may safely place confidence in that compilation may itself find a definition. All such advantages depend on the descriptive power of the language constructed. A language too weak to deliver them would reduce the construction to an academic exercise, which is what we want to avoid.

This article outlines the structure of a symbolic language. The language is capable of describing structures of genealogical interest, several examples being taken from my own experience. To most readers the genealogical concepts will seem terribly elementary, but this is because the paper concentrates on the form. One of the virtues of a symbolic language is the way it forces the user to be explicit where one might be accustomed to taking things for granted. This is the experience of the computer programmer. Formal structuring tends to treat something personal and subjective in a starkly objective way by abstracting only the obviously relevant dimensions. Such an exploitation of genealogy may eventually lead to the computer taking over the tedium in our work, leaving us to pursue the personal, historical, and other more subjective dimensions.