THE GEDCOM STANDARD
DRAFT Release 5.0
25 September 1991
Prepared by the
Family History Department
The Church of Jesus Christ of Latter-day Saints
Suggestions and Correspondence:
GEDCOM Coordinator - 3T
Family History Department
50 East North Temple
Salt Lake City, UT 84150
USA
Telephone (USA) 801-240-5225.
TABLE OF CONTENTS
Introduction
  Purpose and Content of Document
  Changes in Version 5.0
  GEDCOM Product Registration
  GEDCOM Software Library
Chapter 1
Data Representation Grammar
  Concepts
  Grammar
Chapter 2
Lineage-Linked Grammar
  Introduction
  Lineage-Linked Grammar Organization
  Record Structures of the Lineage-Linked Form
  Substructures of the Lineage-Linked Form
  Primitive Elements of the Lineage-Linked Form
  Compatibility with previous GEDCOM releases
Chapter 3
GEDCOM Transmission File
  Introduction
  Headers and Trailers in the GEDCOM Transmission
  Why Headers and Trailers are used
  How to use a Header
  How to use a Trailer
  Naming your Transmission File
Chapter 4
GEDCOM Values
  Introduction
  Why Values are used
  Guidelines for using Values
  How to record Names of Individuals
  How to record Dates
  How to record Places
  How to record Events
  How to designate Time
  How to use Pointers
  How to use other Values
Chapter 5
Using Character Sets in GEDCOM
  Why various Character Sets are used
  How to change Character Sets
Appendix A
Lineage-Linked GEDCOM Tag Definition
INTRODUCTION
GEDCOM was developed by the Family History Department of the Church of Jesus Christ
of Latter-day Saints to meet the data communication requirements of the Family
History Department and other institutions wishing to exchange computerized
genealogical data. GEDCOM is an acronym for GEnealogical Data Communication.
GEDCOM is provided to foster the exchange of genealogical information and the
development of a wide range of inter-operable software products to assist
genealogists, historians, and other researchers in the exchange of genealogical
information.
Purpose and Content of This Document
This technical document is written for computer programmers, system developers, and
user specialists. It defines a flexible format for exchanging structured
genealogical information between different computer systems.
The chapters in this document detail the following specifications:
* Data Representation Grammar
* Lineage-Linked GEDCOM Grammar
* GEDCOM Transmission File
* Values
* Character Sets
This document describes GEDCOM at two different levels. The lower level defines a
general-purpose data representation language for representing any kind of structured
information on a sequential medium. This lower level is known as the GEDCOM data
format. It deals with the syntax and identification of structured information in
general, but does not deal with the semantic content of any particular kind of data.
The higher level defines specific content for a particular kind of data to be
exchanged between a group of compatible systems. GEDCOM has been used for many
different kinds of data. Each kind of data is referred to as a form of GEDCOM and
has its own definition at this level. GEDCOM forms have been defined for lineage-
linked, bibliographic, census, religious, and other kinds of data, including several
that are not related to genealogy.
The first two chapters are presented as language grammars and are technically
oriented. Chapter 1 presents basic GEDCOM concepts and then defines the lower-level
GEDCOM format. This chapter will be useful to anyone using GEDCOM. Chapter 2
defines the higher-level, specific form of GEDCOM known as the lineage-linked form.
The lineage-linked form is the form used by commercial genealogical software systems
for exchanging compiled, linked information about individuals. The other forms of
GEDCOM are not publicly exchanged at this time, and are not discussed in this
document.
Chapters three and four discuss GEDCOM in a more tutorial fashion. If
inconsistencies are present, they should be resolved in favor of the grammars
presented in the first two chapters. The fifth chapter defines GEDCOM character
sets. Appendix A defines tags used in the lineage-linked form of GEDCOM. Appendix
B contains character tables that define the GEDCOM character set.
Changes in Version 5.0
Prior official versions of The GEDCOM Standard were released in October 1987 (3.0)
and August 1989 (4.0). Versions prior to these were preliminary and were not
established as a standard.
This GEDCOM release (5.0) includes the first standard definition of the
lineage-linked form of GEDCOM, and also includes the first major expansion of the
lineage-linked form since its initial use in GEDCOM 3.0. Products based on releases
3.0 and 4.0 should be upward-compatible. These existing registered
GEDCOM-compatible systems should still be able to exchange data with newer systems
that use this version, and will still be considered GEDCOM-compatible.
There are several purposes for the 5.0 release of GEDCOM:
* Re-define the GEDCOM data representation grammar in a shorter, more
rigorous, and more precise format, for ease of understanding (see
chapter 1). The GEDCOM format remains the same, even though the
description of it is changed.
* Define the precise combinations of tags, values, and pointers allowed in
the lineage-linked form (see chapter 2). This is the form of GEDCOM
currently exchanged by commercial genealogical software systems, and it
remains unchanged, other than adding new tags and the upward-compatible
structural extensions listed below. (The lineage-linked form should not
be confused with other forms of GEDCOM, which apply the basic GEDCOM
data format with different tag, value, and pointer combinations for
other purposes.)
* Define representations for supporting information such as source
citations, submitter identification, and notes. (See chapter 2,
<<SUPPORT_INFO>> structure in the lineage-linked grammar.)
* Define user-defined EVENts.
* Define user-defined ASSOciations with INDIviduals other than direct
family relationships.
* Allow descriptive text to further qualify MARRiage, DIVorce, and other
event structures.
* Require SOURce VERSion (product version) and GEDCom VERSion information.
* Define DATE modifier (ABT, BEF, AFT, etc.) and a more rigorous regular
date format.
GEDCOM Product Registration
Developers of GEDCOM-compatible products using the lineage-linked form of GEDCOM
(see chapter 2) should register their product by submitting the following
information to the GEDCOM coordinator:
* A sharable, brief, representative sample of GEDCOM output from their
product, on diskette, for registration review and for compatibility
testing by other developers
* A proposed unique SOURce name to identify the product (not the company),
up to 40 characters long, allowing mixed upper and lower case, with no
embedded spaces (use underscore "_" instead), for inclusion in the
product's GEDCOM output HEADer record
* An optional text file containing relevant technical documentation about
the product's GEDCOM implementation.
GEDCOM Software Library
A library of public domain source code, in the C programming language, is available
to help reduce the work required to achieve GEDCOM compatibility.
Chapter 1
DATA REPRESENTATION GRAMMAR
INTRODUCTION
This chapter describes the core GEDCOM data representation language, which provides
a general purpose way to represent items of related information using a sequential
stream of characters. At this generic level, GEDCOM may be used to represent any
form of structured information, not just genealogical data.
CONCEPTS
A GEDCOM transmission represents a database in the form of a sequential stream of
related records. A record is represented as a sequence of tagged, variable-length
lines, arranged in a hierarchy. A line contains a hierarchical level number, a tag,
and a value. The GEDCOM line is terminated by a carriage return and/or line feed
character.
The tag identifies the value of a line in the same sense as a field name identifies
a field in a database record. The data is self-defining. Tags allow a field to
occur any number of times within a record, including zero, and allow the use of
different or new fields without otherwise penalizing compatibility between different
systems exchanging data.
The hierarchical relationships are indicated by a hierarchical level number.
Subordinate lines have a higher level number. The hierarchy allows a line to have
sub-lines, which in turn may have their own sub-lines, etc. A line and its sub-
lines constitute a context or enclosure, that is, a cluster of information
pertaining directly to the same thing. This hierarchical arrangement corresponds
with the natural hierarchy found in most structured information.
The beginning of a record is indicated by a line whose level number is zero.
A GEDCOM receiver system scans the input for expected information by looking for
specific tags and processing the associated values. Unrecognized tags (perhaps from
a sending system whose database contains some different information) are handled by
not processing the associated value nor its enclosed sub-lines, i.e., the entire
context is ignored. These are treated as exceptions by printing them in an
exception report or saving them in some generic way.
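The handling of unrecognized tags described above can be sketched as follows. This is a minimal illustration in Python, not part of the standard. The function name filter_known and the (level, tag, value) tuple layout are assumptions made for the example, and the test for a recognized tag is simplified to a plain set lookup that ignores the enclosing context.

```python
def filter_known(lines, known_tags):
    """Split (level, tag, value) tuples into processed lines and exceptions.

    An unrecognized tag is set aside together with all of its enclosed
    sub-lines, so the entire context is ignored.
    """
    accepted, exceptions = [], []
    skip_level = None  # level of the unrecognized line being skipped, if any
    for level, tag, value in lines:
        if skip_level is not None and level > skip_level:
            # Still inside the context of an unrecognized tag.
            exceptions.append((level, tag, value))
            continue
        skip_level = None
        if tag in known_tags:
            accepted.append((level, tag, value))
        else:
            skip_level = level
            exceptions.append((level, tag, value))
    return accepted, exceptions


# "XYZ" is a made-up tag standing in for information the receiver lacks.
record = [
    (0, "INDI", None),
    (1, "XYZ", "a"),
    (2, "DATE", "1900"),
    (1, "NAME", "John Doe"),
]
accepted, exceptions = filter_known(record, {"INDI", "NAME", "DATE"})
```

Note that the DATE line lands in the exceptions even though DATE is a recognized tag, because it sits inside the context of the unrecognized line. The exceptions list is what a receiver might print in an exception report.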
In addition to hierarchical relationships, GEDCOM defines inter-record references
which allow a record to be logically related to other records, without introducing
redundancy. These are represented by two additional, but optional, parts of a line:
a cross-reference pointer and a cross-reference identifier. The cross-reference
pointer "points at" a related record, identified by a required matching cross-
reference identifier.
GRAMMAR
The GEDCOM data format, a data representation language, is defined in the grammar in
this chapter. The grammar is a set of rules that specify what sequences of
characters constitute valid GEDCOM expressions. The rules are expressed as a set of
pattern definitions, where each pattern is defined in terms of more primitive
patterns. Pattern definitions consist of the pattern name, a separator ":=", and a
list of sub-patterns from which one alternative is selected. To read the grammar,
components of the selected pattern are substituted recursively by other patterns or
constants, continuing until all patterns are resolved into constants. Pattern names
are in bold print. Constants are the actual symbols that appear in the complete
expressions and are not bolded.
A GEDCOM transmission consists of a sequence of physical records, each of which
consists of a sequence of gedcom_lines, all contained in a sequential file or
stream of characters. The beginning of a new physical record is designated by a
line whose level number is 0. Physical records are intended to be small enough to
fit within available memory, though absolute limits are not established. A
gedcom_line has the following syntax:
gedcom_line:=
level delimtr xref_id delimtr tag delimtr line_content terminator
* The gedcom_line represents one piece of information. One
gedcom_line corresponds to one field in traditional database or file
terminology, or to a grouping of fields as in a record or subrecord.
The concepts of fields, records, and subrecords merge together in
GEDCOM.
* The total length of a GEDCOM line does not exceed 255 characters.
* Leading white space (tabs, spaces) to a GEDCOM line should be ignored by
the reading system. Systems outputting GEDCOM should not have any white
space in front of the GEDCOM line (at least for the near future).
* The xref_id and line_content are optional.
The following are examples of valid (though unrelated) GEDCOM lines:
0 @1234@ INDI
1 AGE 13
1 CHIL @1234@
The first line has a level number 0, a xref_id of @1234@, an INDI tag, and
no line_content.
The second line has a level number 1, no xref_id, an AGE tag, and a
line_content value of 13.
The third line has a level number 1, no xref_id, a CHIL tag, and a
line_content of a pointer to a xref_id named @1234@.
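The line syntax above can be sketched in code. The following Python fragment is a minimal illustration, not part of the standard. The names GEDCOM_LINE and parse_gedcom_line are hypothetical, tags are restricted to ASCII letters and digits for brevity, and the terminator is assumed to have been removed before the line is parsed.

```python
import re

# Hypothetical pattern for one gedcom_line (terminator already removed).
# Assumption: tags use ASCII letters and digits only; the GEDCOM character
# set allows more letters than this.
GEDCOM_LINE = re.compile(
    r"[ \t]*"                            # leading white space is ignored on input
    r"(?P<level>[0-9]+) "                # level, then one delimtr (a space)
    r"(?:(?P<xref_id>@[^@#][^@]*@) )?"   # optional xref_id, then a delimtr
    r"(?P<tag>[A-Za-z][A-Za-z0-9]*)"     # tag: a letter, then letters or digits
    r"(?: (?P<line_content>.*))?"        # optional delimtr and line_content
)


def parse_gedcom_line(line):
    """Return (level, xref_id, tag, line_content); absent parts are None."""
    if len(line) > 255:
        raise ValueError("a GEDCOM line does not exceed 255 characters")
    match = GEDCOM_LINE.fullmatch(line)
    if match is None:
        raise ValueError("not a valid GEDCOM line: %r" % line)
    return (int(match["level"]), match["xref_id"],
            match["tag"], match["line_content"])


first = parse_gedcom_line("0 @1234@ INDI")
second = parse_gedcom_line("1 AGE 13")
third = parse_gedcom_line("1 CHIL @1234@")
```

For the three example lines, the results carry the same parts that the text describes: the first has an xref_id and no line_content, the second has a line_content of 13, and the third has a pointer as its line_content.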
level:=
digit
digit level
The level number works the same way as the level of indentation in an
indented outline, where indented lines provide detail about the item under
which they are indented. A line at any level L is enclosed by and pertains
directly to the nearest preceding line at level L-1. From one line to the
next, the level number may increase by at most 1.
The enclosed subordinate lines at level L are said to be in the context of the
enclosing superior line at level L-1. The meaning of a tag (see tag below)
is interpreted in the context of the tags of the enclosing line(s). Take the
following record about an individual's birth and death dates, for example:
0 INDI
1 BIRT
2 DATE 12 MAY 1920
1 DEAT
2 DATE 1960
In this example, the expression DATE 12 MAY 1920 is interpreted within the
INDI BIRT context, representing the INDIvidual's BIRTh DATE. The second DATE
is in the INDI DEAT (death) context. The complete meaning of DATE depends on
the context. (Note: the above example is indented according to the level
numbers to make the concept more obvious. In the actual GEDCOM data there is
no indentation, just level numbers lined up vertically on the left margin.)
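The way a tag takes its meaning from the enclosing lines can be sketched with a small stack of tags. This Python fragment is only an illustration. The function name context_paths and the (level, tag, value) tuple layout are assumptions, not part of the standard.

```python
def context_paths(lines):
    """Return (context, value) pairs for a list of (level, tag, value) tuples.

    The context is the chain of tags from the level-0 line down to the
    current line, for example "INDI BIRT DATE".
    """
    stack = []   # tags of the currently enclosing lines, indexed by level
    result = []
    for level, tag, value in lines:
        if level > len(stack):
            # A line may be at most one level deeper than the line before it.
            raise ValueError("level increased by more than 1")
        del stack[level:]    # close enclosures at this level and deeper
        stack.append(tag)
        result.append((" ".join(stack), value))
    return result


record = [
    (0, "INDI", None),
    (1, "BIRT", None),
    (2, "DATE", "12 MAY 1920"),
    (1, "DEAT", None),
    (2, "DATE", "1960"),
]
paths = context_paths(record)
```

The two DATE lines come out with the contexts INDI BIRT DATE and INDI DEAT DATE, which is how a receiver can tell the birth date from the death date.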
xref_id:=
@xref_text@
The xref_id (cross reference identifier) is the target for pointers,
allowing physically separate records to reference each other logically. The
at signs "@" delimit the xref_text. The xref_text is unique within a
GEDCOM transmission, and is between 1 and 15 characters in length.
The purpose of the xref_text is to uniquely identify a record in the
transmission. Receiving systems typically map xref_text into their own
internal record identifiers, maintaining the relationships between records but
not necessarily the xref_text values used in the transmission. The
xref_text typically consists of some form of the internal record identifier
from the database of the sending system.
A record with a matching xref_id must exist for every pointer in the
transmission, unless the colon ":" character is present in the pointer,
indicating a network reference (future) to a permanent record that might not
be in the transmission. The colon ":" character is therefore reserved within
xref_ids and pointers. The colon ":" character restriction does not apply
to the other parts of a gedcom_line.
delimtr:=
space
The delimtr (delimiter), a single space character, terminates both the
variable-length level number and the tag. Note that space characters may
also be present in a value.
tag:=
letter
tag letter
tag digit
A tag consists of a variable length sequence of letters and digits,
beginning with a letter. The tag represents the meaning of the
line_content within the context of the enclosing lines. Specific tags are
defined in Appendix A.
Although existing tags are only three or four characters long, systems should
prepare to handle tags from 1 to 15 characters in length, terminated by a
delimtr.
Valid combinations of specific tags, values, xref_ids and pointers are
further constrained by the GEDCOM form defined for representing a given kind
of information (see Chapter 2 for the Lineage-Linked form grammar).
line_content:=
value
pointer
value pointer
escape_sequence
escape_sequence value
The line_content identifies an object within the domain of possible values
allowed in the context of the tag. The combination of the tag, the
line_content, and the line_content of the preceding superior line (level
N - 1) is a 3-tuple that defines an association between the two objects. A
pointer stands in the place of the context identified by the matching
xref_id. Theoretically, a receiving system should be prepared to follow a
pointer to find any needed value in a manner that is transparent to the
logic of the subsystem that is looking for specific tags. This highly-
flexible facility will probably be used more in the future. For the time
being, however, the use of pointers is explicitly defined within the GEDCOM
form. The value pointer combination is used to indicate what kind of
association or relationship exists between the current record and the record
being pointed to. (See chapter 2). The escape_sequence grammar is used to
specify special processing, such as switching character sets or calendars for
date interpretation, and for indicating an inclusion of a non-GEDCOM data form
into the GEDCOM structure.
terminator:=
carriage_return
line_feed
carriage_return line_feed
line_feed carriage_return
The terminator delimits the variable-length line_content and signals the
end of the line.
NOTE: Some existing systems provide an option to produce an indented GEDCOM
output for user readability, using space or tab characters between the
terminator and the level number of the next line to visibly show the
hierarchy. Also, some have suggested allowing extra blank lines to visibly
separate physical records. These features may be incorporated into the GEDCOM
standard at some future time, but for now, such a change would render some
existing systems incompatible. Therefore, we recommend that new systems be
prepared to discard extra carriage returns, line feeds, spaces and tabs
immediately preceding the level number during input, though output for now
should normally be constrained as specified above, without indentation or
blank lines.
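A tolerant reader along the lines recommended in the note above might be sketched as follows. This Python fragment is an illustration only. The names TERMINATOR and split_lines are hypothetical, and the whole transmission is assumed to be available as one string.

```python
import re

# The four terminator forms. The two-character forms are listed first so
# that a carriage return and line feed pair counts as one terminator.
TERMINATOR = re.compile(r"\r\n|\n\r|\r|\n")


def split_lines(stream_text):
    """Split a transmission into GEDCOM lines, tolerating extra white space.

    Blank lines and leading spaces or tabs are discarded on input.
    """
    lines = []
    for raw in TERMINATOR.split(stream_text):
        line = raw.lstrip(" \t")
        if line:
            lines.append(line)
    return lines


# Mixed terminators, one blank line, and one indented line.
sample = "0 HEAD\r\n1 SOUR X\n\n  0 TRLR\r"
lines = split_lines(sample)
```

Because empty results are dropped, an ambiguous run of carriage returns and line feeds does no harm: however the run is divided into terminators, it only produces blank lines, and those are discarded.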
xref_text:=
letter
digit
punctuation
space
xref_text xref_text
The xref_text is any arbitrary combination of characters, except for the at-
sign "@" and the terminator. The xref_text is not retained by the
receiving system, and may therefore be formed from any convenient combination
of identifiers from the sending system. No meaning is attributed by the
receiver to any part of the xref_text, except for its unique association
with the corresponding record. The use of the colon ":" character is
reserved. To avoid confusion with the escape sequence prefix "@#", a xref_text
must not begin with a pound sign "#".
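The constraints on xref_text stated in this chapter can be collected into one check. The Python sketch below is an illustration only. The function name xref_text_problems is hypothetical, and reporting the reserved colon as a problem is a choice made for this example, suited to a sender that does not use network references.

```python
def xref_text_problems(xref_text):
    """Return a list of reasons why xref_text is unsuitable (empty if none)."""
    problems = []
    if not 1 <= len(xref_text) <= 15:
        problems.append("length must be between 1 and 15 characters")
    if "@" in xref_text:
        problems.append("the at sign delimits the xref_text")
    if "\r" in xref_text or "\n" in xref_text:
        problems.append("the terminator may not appear")
    if xref_text.startswith("#"):
        problems.append("a leading pound sign looks like an escape sequence")
    if ":" in xref_text:
        problems.append("the colon is reserved")
    return problems


ok = xref_text_problems("1234")
bad = xref_text_problems("#A")
```

A sending system could run such a check once per record identifier before writing the transmission.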
value:=
letter
digit
punctuation
space
colon
value value
The value represents an object within the domain of the full context of the
accompanying tag. This domain is defined by a specific grammar for
representing a given GEDCOM form (see Chapter 2 for the Lineage-Linked grammar).
Values are not encoded in binary or other schemes for reducing space
requirements, and they are generally constrained to be understandable by a
typical user without decoding. This is intended to reduce the decoding burden
on the receiving software. A GEDCOM-optimized data compression standard will
be defined in the future to reduce space requirements. Meanwhile, users may
agree to compress GEDCOM files using any compression system available to both
sender and receiver.
pointer:=
xref_text
The pointer represents the association between two objects that reside in
different physical records. Complex logical records are divided into smaller
physical records to accommodate memory constraints, many to many
relationships, and independent record creation.
The pointer must match a corresponding xref_id within the transmission,
unless the colon ":" character is present (future network reference to a
permanent record). A pointer is given instead of duplicating the object
identified in the value of the line pointed at, though the logical result is
equivalent. An expanded traversal of a record tree includes following the
pointers to related records to some depth, and splicing those records
(logically) into the resultant expanded tree. Pointers may refer to either
records which have not yet appeared in the transmission (forward reference) or
to records that have already appeared earlier in the transmission (backward
reference). This arrangement usually requires a preliminary pass to construct
a lookup table to support random access by xref_id during subsequent passes.
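The preliminary pass mentioned above might look like the following Python sketch. It is an illustration under stated assumptions. The names POINTER_ONLY and index_and_check are hypothetical, lines are assumed to be already parsed into (level, xref_id, tag, line_content) tuples, and only a line_content that consists of a pointer alone is checked (the value pointer combination is not handled).

```python
import re

# A line_content that consists of a pointer only, e.g. "@1234@".
POINTER_ONLY = re.compile(r"@[^@#][^@]*@")


def index_and_check(lines):
    """Build the xref lookup table, then report pointers with no target.

    Returns (table, missing): table maps each record's xref_id to the index
    of its level-0 line; missing lists pointers with no matching record.
    """
    # Pass 1: note where every cross-referenced record begins.
    table = {}
    for index, (level, xref_id, _tag, _content) in enumerate(lines):
        if level == 0 and xref_id is not None:
            if xref_id in table:
                raise ValueError("duplicate xref_id: " + xref_id)
            table[xref_id] = index

    # Pass 2: forward and backward references resolve the same way.
    missing = []
    for _level, _xref_id, _tag, content in lines:
        if content and POINTER_ONLY.fullmatch(content):
            # A colon marks a (future) network reference, which need not
            # be present in the transmission.
            if ":" not in content and content not in table:
                missing.append(content)
    return table, missing


# Illustrative data: the FAM record points forward to @I1@ and @I2@.
lines = [
    (0, "@F1@", "FAM", None),
    (1, None, "CHIL", "@I1@"),
    (1, None, "CHIL", "@I2@"),
    (0, "@I1@", "INDI", None),
    (1, None, "FAMC", "@F1@"),
    (1, None, "NOTE", "@N:7@"),
]
table, missing = index_and_check(lines)
```

In this made-up transmission, @I2@ has no matching record and is reported, while the pointer containing a colon is left alone.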
A few GEDCOM structures allow the sending system to use either a pointer or a
value within the same context (with the same tag and enclosing tags), and
sometimes the line will contain both a value and pointer. In all cases, the
pointer will be at either the beginning or the ending of the line_content,
never in the middle. GEDCOM receiving systems should be prepared in certain
cases to follow pointers recursively and "splice" either text or compound
GEDCOM structures into the position occupied by the pointer. Assembling
pointer-based structures into large segments of text may require staging the
text in disk-based files to accommodate memory constraints. Specific
requirements are defined in lower-level GEDCOM grammars for a particular form
of GEDCOM data (see Chapter 2 for Lineage-Linked grammar).
letter:=
One of the letters A-Z, a-z, plus all diacritic and special characters from
the GEDCOM character set.
digit:=
One of the digits 0 1 2 3 4 5 6 7 8 9
punctuation:=
One of the non-letter and non-digit characters from the GEDCOM character
set, except the space and colon ":" characters. The colon ":" character may
be used within a value.
The at sign "@" is used for an escape mechanism to identify pointers,
xref_id, character set changes, calendar changes, or other shifts in
representation rules. To include the at sign "@" as data in a value, two of
them should be sent together "@@", which represent a single at sign "@" as
data.
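The doubling rule for the at sign can be sketched in a few lines of Python. The function names escape_at and unescape_at are hypothetical, and unescape_at assumes that the value it is given holds plain data only, with no pointer or escape sequence in it.

```python
def escape_at(text):
    """Prepare plain data for use as a value: each "@" is sent as "@@"."""
    return text.replace("@", "@@")


def unescape_at(value):
    """Recover plain data from a value: each "@@" stands for one "@"."""
    return value.replace("@@", "@")


sent = escape_at("meet @ noon")
received = unescape_at(sent)
```

Escaping on output and unescaping on input are inverse operations, so data containing at signs survives a round trip unchanged.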
escape_sequence:=
@#Dcalendar_type@
@#Ccharacter_set@
@#Lescape_codes_with_length@ inclusion_data
@#Ffile_reference_escape_codes@
calendar_type :=
A reserved name indicating which calendar base should be used to interpret the
associated date value (see section in chapter 4 on "How to Record Dates").
character_set :=
A reserved name indicating which character set should be used to interpret the
codes contained in the associated value.
escape_codes_with_length :=
data_type_code delimtr data_length
This escape sequence indicates that data of a non-GEDCOM structure is
enclosed within the context of the GEDCOM structure. The
data_length defines the length of the data being included (note: this
length is not constrained by the 255 character line limit adopted for standard
GEDCOM form and may be arbitrarily long). The data_type_code defines the
form of the inclusion data. The specific rules and type codes have not yet
been standardized. Reading systems should skip over the length of data
indicated if they are not prepared to process the indicated data type.
data_type_code :=
A reserved one-letter code indicating what data form is contained in the
inclusion data.
data_length :=
digit
digit digit
A decimal number in ASCII digits, indicating the count of characters contained
in the inclusion_data.
inclusion_data :=
This data becomes the line content value whose type and size are defined by
the associated escape sequence.
file_reference_escape_codes :=
file_type_code file_path_name
This escape sequence allows both GEDCOM and NON-GEDCOM data to be transmitted
in a separate file for a logical inclusion within this GEDCOM context. Note:
the data length of the file is not constrained by the 255 character line limit
adopted for standard GEDCOM form. Systems not prepared to process the
indicated type of file should skip past the filename to continue processing.
file_type_code :=
A reserved one letter code indicating what type of data is contained in the
inclusion file. Specific types have not yet been standardized.
file_path_name :=
This is the path name required for accessing the inclusion file from within
the current directory. In PC-DOS terms, this means either that a path must be
fully specified, including the root directory and optional drive, or that a
partial path refers to a file within the same directory as the GEDCOM
transmission file or within a directory subordinate to the current directory.
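Recognizing the escape_sequence forms defined in this chapter can be sketched as follows. This Python fragment is an illustration only. The names ESCAPE and split_escape are hypothetical, the calendar name EXAMPLE and the data type code X are placeholders rather than reserved names, and the text after the escape sequence is returned exactly as found, since this chapter's grammar places no delimiter between an escape sequence and the value that follows it.

```python
import re

# "@#", one of the letters D, C, L or F, a body without "@", then "@".
ESCAPE = re.compile(r"@#([DCLF])([^@]*)@")


def split_escape(line_content):
    """Return (kind, body, rest) if line_content begins with an escape
    sequence, otherwise None. The rest is returned verbatim."""
    match = ESCAPE.match(line_content)
    if match is None:
        return None
    return match.group(1), match.group(2), line_content[match.end():]


calendar = split_escape("@#DEXAMPLE@ 12 MAY 1920")
inclusion = split_escape("@#LX 5@hello")
pointer = split_escape("@1234@")
```

For the L form, the body holds the data_type_code, a delimtr, and the data_length, so a reader that does not support the data type could use the length to skip past the inclusion data.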
Chapter 2
LINEAGE-LINKED GRAMMAR
INTRODUCTION
This chapter describes specific tag, value, and pointer combinations for exchanging
lineage-linked information in the GEDCOM format. Lineage-linked data pertains to
individuals linked in family relationships across multiple generations.
This lineage-linked grammar is based on the general framework of the GEDCOM data
representation grammar defined in the previous chapter. These two layers define the
GEDCOM format used by commercial genealogical software systems to exchange data.
Other specialized GEDCOM-based grammars have been created for different purposes.
Other uses of the general-purpose GEDCOM data representation should not be confused
with this specific usage for lineage-linked genealogical data, as defined in this
chapter, which is the only form of GEDCOM exchanged by commercial genealogical
software systems at this time.
LINEAGE-LINKED GRAMMAR ORGANIZATION
This lineage-linked GEDCOM grammar is organized into three levels: structure
components, substructure patterns, and primitive elements. Structures and
substructures are indicated by enclosing the structure name within double angle
<<brackets>>. Primitive element patterns are enclosed in single angle <brackets>.
The definition of each structure consists of the structure name, a ":=" separator,
and the structure's component pattern. This pattern is an enclosure of GEDCOM lines
composed of primitive elements and/or substructure enclosures.
The record structures of the lineage-linked form are shown in the section 'RECORD
STRUCTURES OF THE LINEAGE-LINKED FORM'. The supporting substructures that
make up each of the record types are shown in alphabetic sequence in the section
'SUBSTRUCTURES OF THE LINEAGE-LINKED FORM'. The primitive elements are the lowest
level (terminal) components that are not further divided and are shown in the
section 'DEFINITION OF PRIMITIVE ELEMENTS' in alphabetic sequence.
Some elements have optional sub-pattern choices. These choices are shown by listing
the alternative sub-patterns between opening and closing square brackets "[]" and
separating each choice with a vertical bar "|", meaning that exactly one of the
alternate substitutions must be selected.
The number of occurrences of a sub-pattern allowed within a pattern is defined in an
occurrence definition in braces "{}" on each line. This number indicates the minimum
and maximum number of occurrences allowed for a pattern component in the form
{minimum:maximum}, where a maximum of M means many. Note that minimum and maximum
occurrence limits are defined relative to the enclosing superior line. The enclosing
line may prescribe zero, one, or many possible occurrences, which, in effect, are
multiplied by the occurrences allowed for any of its subordinate lines. This means
that a required line (minimum = 1) is not required if an optional enclosing line is
not given. Similarly, a line occurring only once (maximum = 1) may occur multiple
times as long as each occurs under its own multiple-occurring superior line.
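The occurrence rule can be illustrated with a short Python sketch. It is not part of the GEDCOM definition; the function names parse_occurrence and count_allowed are hypothetical, and the check is applied per enclosing line, as the paragraph above describes.

```python
import re


def parse_occurrence(spec):
    """Parse an occurrence definition such as "{0:M}" into (minimum, maximum).

    A maximum of "M" (many) is returned as None, meaning unbounded.
    """
    match = re.fullmatch(r"\{(\d+):(\d+|M)\}", spec.strip())
    if match is None:
        raise ValueError(f"not an occurrence definition: {spec!r}")
    minimum = int(match.group(1))
    maximum = None if match.group(2) == "M" else int(match.group(2))
    return minimum, maximum


def count_allowed(count, spec):
    """Return True if `count` occurrences under ONE enclosing line satisfy `spec`."""
    minimum, maximum = parse_occurrence(spec)
    return count >= minimum and (maximum is None or count <= maximum)


# Two event lines, each with its own single DATE {0:1}: both are valid,
# even though DATE appears twice in the record overall.
dates_per_event = [1, 1]
all_ok = all(count_allowed(n, "{0:1}") for n in dates_per_event)

# Two DATE lines under the SAME event line would exceed the maximum of 1.
too_many = count_allowed(2, "{0:1}")
```

Because the limits are checked once per enclosing line, a required subordinate line is never checked at all when its optional enclosing line is absent.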
The level numbers for any sub-structure are represented as "n", "+1", "+2", etc. so
that they may be used in more than one place at different starting level numbers. In
these cases, "n" equals the same level number where the pattern first appears, and
the "+1" means level n+1, "+2" means level n+2, etc.
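As an illustration only, the relative level notation can be resolved to absolute level numbers once the starting level is known. The helper name resolve_levels and the starting level of 3 are assumptions made for this sketch; the pattern lines are copied from the BURIAL_INFO substructure.

```python
def resolve_levels(pattern_lines, start_level):
    """Replace relative level markers ("n", "+1", "+2", ...) with absolute levels."""
    resolved = []
    for line in pattern_lines:
        marker, _, rest = line.strip().partition(" ")
        if marker == "n":
            level = start_level
        elif marker.startswith("+") and marker[1:].isdigit():
            level = start_level + int(marker[1:])
        else:
            raise ValueError(f"unexpected level marker: {marker!r}")
        resolved.append(f"{level} {rest}".rstrip())
    return resolved


pattern = [
    "n CEME <CEMETARY_NAME>",
    "+1 ADDR <ADDRESS_LINE>",
    "+2 CONT <ADDRESS_LINE>",
]

# Hypothetical case: the pattern is used where "n" stands for level 3.
for line in resolve_levels(pattern, 3):
    print(line)
```

The same pattern resolved with a different start_level yields the same shape shifted to that level, which is why the patterns are written with relative markers.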
Unless stated otherwise, the only ordering imposed on GEDCOM lines within an
enclosure arises when multiple opinions or other items are presented for which only
one may be expected by a receiving system. For example, a person may have been
known by more than one name, or evidence may suggest either a birth in 1840 in New
York or in 1837 in Pennsylvania. In this case, the most credible or preferred
information is listed first, followed by less credible or less preferred items of
conflicting or alternate information. Receiving systems that can only use one item
of a given kind should use the first-listed value. The same is true for output
documents that can only display one value, such as family group records, pedigree
charts, and others.
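A receiving system that can hold only one value of a kind might apply the first-listed rule roughly as follows. This is a sketch under simple assumptions: lines are already parsed into hypothetical (tag, value) pairs in file order, and the names and values shown are invented for illustration.

```python
def values_for_tag(lines, tag):
    """Collect, in file order, the values of all lines carrying the given tag."""
    return [value for line_tag, value in lines if line_tag == tag]


def first_listed(values):
    """Return the first-listed (preferred) value, or None if nothing was given."""
    return values[0] if values else None


# Hypothetical parsed lines from one individual, in the order they were sent.
lines = [
    ("NAME", "Mary Smith"),
    ("OCCU", "Weaver"),
    ("NAME", "Polly Smith"),
]

preferred_name = first_listed(values_for_tag(lines, "NAME"))
print(preferred_name)
```

A system that can store every value would keep the whole list from values_for_tag and preserve its order, so that the sender's preference is not lost.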
Conflicting dates or places of an event should be represented in separate event
structures to provide a place for the corresponding source citations, rather than
place multiple dates or multiple places under the same enclosing event line.
Even though no other ordering is defined beyond the one described above, some GEDCOM
programming tools optimize performance based on the assumption that tags generally
appear in a typical order. Performance will be better in these circumstances when a
common sequence is followed. Therefore, sending systems are encouraged to present
GEDCOM structures in the same general order as the one given in these patterns,
unless there is a sufficient reason to use a different sequence.
RECORD STRUCTURES OF THE LINEAGE-LINKED FORM
LINEAGE_LINKED_GEDCOM:=
This is a model of the lineage-linked GEDCOM structure for submitting data to
other lineage-linked GEDCOM processing systems. A header and a trailer record
are required and they enclose any number of data records. A data record is
either a SUBMitter, INDIvidual, FAMily, NOTE, or SOURce record structure.
0 <<HEADER>> {1:1}
0 <<RECORD>> {0:M}
0 TRLR {1:1}
NOTE:
There are certain subordinate structures that may occur under any tag, at any
level. These structures are contained within a structure called SUPPORT_INFO.
The SUPPORT_INFO structure should be considered as subordinate to any GEDCOM
line, even though it is not shown (for readability). The SUPPORT_INFO
structure may occur from zero to many times {0:M}. Each occurrence of the
SUPPORT_INFO structure contains exactly one logical choice of one of the
subordinate SUPPORT_INFO structure components. This means that from none to all
of the structures from SUPPORT_INFO could appear as subordinate to any GEDCOM
line. However, some may not be logical; for instance, one would not expect to
see a DATE tag as subordinate to another DATE tag.
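The top-level model, with one required header, any number of records, and one required trailer, lends itself to a simple check. The sketch below is illustrative only: the function names are hypothetical, the sample values are invented, and it assumes that a level-0 line may carry a cross-reference identifier delimited by "@" before its tag.

```python
def level0_tags(text):
    """Return the tags of all level-0 lines, in file order."""
    tags = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 2 or parts[0] != "0":
            continue
        # Skip a leading cross-reference identifier such as @I1@ if present.
        if parts[1].startswith("@") and len(parts) >= 3:
            tags.append(parts[2])
        else:
            tags.append(parts[1])
    return tags


def has_required_envelope(text):
    """Check that exactly one HEAD comes first and exactly one TRLR comes last."""
    tags = level0_tags(text)
    return (
        len(tags) >= 2
        and tags[0] == "HEAD"
        and tags[-1] == "TRLR"
        and tags.count("HEAD") == 1
        and tags.count("TRLR") == 1
    )


# Hypothetical transmission with a single individual record.
sample = "\n".join([
    "0 HEAD",
    "1 SOUR EXAMPLE_SYSTEM",
    "0 @I1@ INDI",
    "1 NAME Example Person",
    "0 TRLR",
])
print(has_required_envelope(sample))
```

The check looks only at level-0 lines, so it says nothing about whether the header's own required subordinate lines are present.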
HEADER:=
The header structure provides information for identifying the submitted data.
Specific system names are reserved to identify which system sent the data and
in some cases which system is intended to receive it. The DESTination system
name "ANSTFILE" is required for submission to the Family History Department's
Ancestral File system. For LDS temple ordinance submissions, the required
DESTination name is "TempleReady". (See chapter 3, "How To Use A Header" for
detail.)
n HEAD
+1 SOUR <SYSTEM_NAME> {1:1}
+2 VERS <VERSION_NUMBER> {1:1}
+2 DATA <NAME_OF_SOURCE_DATA> {0:1}
+3 DATE <PUBLICATION_DATE> {0:1}
+1 DEST <SYSTEM_NAME> {0:1}
+1 DATE <TRANSMISSION_DATE> {0:1}
+2 TIME <TIME_VALUE> {0:1}
+1 SUBM @XREF:SUBM@ {1:1}
+1 FILE <FILE_NAME> {0:M}
+1 COPR <COPYRIGHT_STATEMENT> {0:1}
+2 CONT <TEXT> {0:M}
+1 GEDC {1:1}
+2 VERS <VERSION_NUMBER> {1:1}
For this version of GEDCOM, the value should be "5.0"
+2 FORM <GEDCOM_FORM> {1:1}
+1 CHAR <CHARACTER_SET> {0:1}
+1 LANG <LANGUAGE_OF_TEXT> {0:1}
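To make the header pattern concrete, the following sketch emits only the lines marked {1:1} in the HEADER structure, with "n" taken as level 0. The function name minimal_header and all argument values are hypothetical. The pointer form "@...@" and the FORM value used in the usage lines are assumptions, since <GEDCOM_FORM> and the cross-reference syntax are defined elsewhere in the standard.

```python
def minimal_header(system_name, system_version, submitter_xref, gedcom_form):
    """Build the required ({1:1}) lines of the HEADER pattern, in pattern order."""
    return [
        "0 HEAD",
        f"1 SOUR {system_name}",
        f"2 VERS {system_version}",
        # Assumption: a pointer is written as the identifier between "@" signs.
        f"1 SUBM @{submitter_xref}@",
        "1 GEDC",
        # The HEADER pattern states that the GEDCOM version value is "5.0".
        "2 VERS 5.0",
        f"2 FORM {gedcom_form}",
    ]


# Hypothetical values; the FORM value shown here is an assumption.
header = minimal_header("EXAMPLE_SYSTEM", "1.0", "SUBM1", "LINEAGE-LINKED")
print("\n".join(header))
```

Optional lines such as DEST, DATE, FILE, COPR, CHAR, and LANG would be added between these lines in the order the pattern gives.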
RECORD:=
[
n <<FAMILY_RECORD>>
+1 <<SUPPORT_INFO>> {0:M}
|
n <<INDIVIDUAL_RECORD>> {0:1}
+1 <<SUPPORT_INFO>> {0:M}
|
n <<NOTE_RECORD>> {0:1}
+1 <<SUPPORT_INFO>> {0:M}
|
n <<REPOSITORY_RECORD>> {0:1}
+1 <<SUPPORT_INFO>> {0:M}
|
n <<SOURCE_EVENT_RECORD>> {0:1}
+1 <<SUPPORT_INFO>> {0:M}
|
n <<SOURCE_RECORD>> {0:1}
+1 <<SUPPORT_INFO>> {0:M}
|
n <<SUBMITTER_RECORD>> {0:1}
+1 <<SUPPORT_INFO>> {0:M}
]
FAMILY_RECORD:=
n @XREF:FAM@ FAM
+1 HUSB @XREF:INDI@ {0:M}
+1 WIFE @XREF:INDI@ {0:M}
+1 CHIL @XREF:INDI@ {0:M}
+1 <MARR_EVENT_TAG> <MARRIAGE_DESCRIPTOR> {0:M}
+2 DATE <DATE_VALUE> {0:1}
+2 PLAC <PLACE_VALUE> {0:1}
+1 <DIVORCE_EVENT_TAG> <DIVORCE_DESCRIPTOR> {0:M}
+2 DATE <DATE_VALUE> {0:1}
+2 PLAC <PLACE_VALUE> {0:1}
+1 ASSO <ASSOCIATION_DESCRIPTOR> @XREF:INDI@ {0:M}
+1 <<LDS_FAM_ORDINANCE_EVENT>> {0:M}
INDIVIDUAL_RECORD:=
n @XREF:INDI@ INDI
+1 RFN <PERMANENT_RECORD_FILE_NUMBER> {0:1}
+1 <<INDIVIDUAL>> {1:1}
+1 <<LDS_INDI_ORDINANCE_EVENT>> {0:M}
+1 ANCI @XREF:SUBM@ {0:M}
+1 DECI @XREF:SUBM@ {0:M}
+1 FAMC @XREF:FAM@ {0:M}
+2 <<CHILD_FAMILY_EVENT>> {0:M}
+1 ASSO <ASSOCIATION_DESCRIPTOR> @XREF:INDI@ {0:M}
+1 ASSO <ASSOCIATION_DESCRIPTOR> @XREF:FAM@ {0:M}
+1 ALIA @XREF:INDI@ {0:M}
+1 FAMS @XREF:FAM@ {0:M}
+1 NAMS @XREF:INDI@ {0:M}
+2 REL <RELATIONSHIP_DESCRIPTOR> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M}
+1 AFN <ANCESTRAL_FILE_NUMBER> {0:1}
NOTE_RECORD:=
n @XREF:NOTE@ NOTE [<TEXT> | <NULL> ]
+1 <<NOTE_STRUCTURE>> {0:1}
REPOSITORY_RECORD:=
n @XREF:REPO@ REPO
+1 NAME <REPOSITORY_NAME> {0:1}
+1 ADDR <ADDRESS_LINE> {0:1}
+2 CONT <ADDRESS_LINE> {0:3}
SOURCE_EVENT_RECORD:=
This structure represents event-oriented information that may be used as
evidence for submitters' opinions or conclusions expressed in INDIVIDUAL and
FAMILY records.
n @XREF:EVEN@ <EVENT_TAG> <EVENT_DESCRIPTOR>
+1 DATE <DATE_VALUE> {0:1}
+1 PLAC <PLACE_VALUE> {0:1}
+1 <ROLE_TAG> <ROLE_DESCRIPTOR> {0:M}
+2 <<INDIVIDUAL>> {0:1}
+2 AGE <AGE_VALUE> {0:1}
SOURCE_RECORD:=
n @XREF:SOUR@ SOUR [<TEXT> | <NULL> ]
+1 <<SOURCE_STRUCTURE>> {0:1}
SUBMITTER_RECORD:=
The submitter record identifies individuals or organizations that
contributed the information contained within the GEDCOM transmission.
All records are assumed to be submitted by the SUBMITTER referenced in
the HEADer, unless a SUBMitter reference inside a specific record points
at a different SUBMITTER.
n @XREF:SUBM@ SUBM
+1 NAME <NAME_VALUE> {1:1}
+1 ADDR <ADDRESS_LINE> {1:1}
+2 CONT <ADDRESS_LINE> {0:3}
+1 PHON <PHONE_NUMBER> {0:1}
+1 CHAN {0:1}
+2 DATE <CHANGE_DATE> {1:1}
SUBSTRUCTURES OF THE LINEAGE-LINKED FORM
BURIAL_INFO:=
This structure is valid only if enclosed in the PLACe of a BURIal event.
n CEME <CEMETARY_NAME> {1:1}
+1 ADDR <ADDRESS_LINE> {0:1}
+2 CONT <ADDRESS_LINE> {0:3}
CHILD_FAMILY_EVENT:=
[
n ADOP <CHILD_FAMILY_EVENT_DESCRIPTOR> {1:1}
+1 DATE <DATE_VALUE> {0:1}
+1 PLAC <PLACE_VALUE> {0:1}
|
n <<LDS_CHILD_SEALING_EVENT>> {0:1}
]
INDIVIDUAL:=
n NAME <NAME_VALUE> {0:M}
n NAMR <NAME_VALUE> {0:M}
n SEX <SEX_VALUE> {0:1}
n <INDIV_EVENT_TAG> <EVENT_DESCRIPTOR> {0:M}
+1 DATE <DATE_VALUE> {0:1}
+1 PLAC <PLACE_VALUE> {0:1}
+2 <<BURIAL_INFO>> {0:1}
n TITL <TITLE> {0:M}
+1 DATE <DATE_VALUE> {0:1}
+1 PLAC <PLACE_VALUE> {0:1}
n RELI <RELIGIOUS_AFFILIATION> {0:M}
n OCCU <OCCUPATION> {0:M}
n PROP <POSSESSIONS> {0:M}
n DSCR <PHYSICAL_DESCRIPTION> {0:M}
+1 CONT <PHYSICAL_DESCRIPTION> {0:M}
n SIGN <SIGNATURE_INFO> {0:M}
n NMR <COUNT_OF_MARRIAGES> {0:1}
n NCHI <COUNT_OF_CHILDREN> {0:1}
n NATI <NATIONALITY> {0:M}
n CAST <CASTE_NAME> {0:M}
n ADDR <ADDRESS_LINE> {0:M}
+1 CONT <ADDRESS_LINE> {0:3}
+1 DATE <DATE_VALUE> {0:1}
+1 PHON <PHONE_NUMBER> {0:1}
LDS_CHILD_SEALING_EVENT:=
n SLGC {1:1}
+1 DATE <DATE_VALUE> {0:1}
+1 TEMP <TEMPLE_VALUE> {0:1}
LDS_FAM_ORDINANCE_EVENT:=
n SLGS {1:1}
+1 DATE <DATE_VALUE> {0:1}
+1 TEMP <TEMPLE_VALUE> {0:1}
LDS_INDI_ORDINANCE_EVENT:=
n <LDS_INDI_ORD> {1:1}
+1 DATE <DATE_VALUE> {0:1}
+1 TEMP <TEMPLE_VALUE> {0:1}
NOTE_STRUCTURE:=
This structure allows comments about the information originated by the
submitter.
n CONT <SUBMITTERS_TEXT> {1:M}
n NOTE @XREF:NOTE@ {0:1}
REPOSITORY_STRUCTURE:=
n NAME <REPOSITORY_NAME> {0:1}
n ADDR <ADDRESS_LINE> {0:1}
+1 CONT <ADDRESS_LINE> {0:3}
n PHON <PHONE_NUMBER> {0:1}
n CNTC <CONTACT_PERSON> {0:M}
SOURCE_STRUCTURE:=
n TYPE <SOURCE_CLASSIFICATION_CODE> {0:1}
n EVEN <EVENT_CLASSIFICATION_CODE> {0:1}
n NAME <DESCRIPTIVE_TITLE> {0:1}
n PAGE <PAGE_DESCRIPTION> {0:1}
n EDTR <EDITED_BY_NAME> {0:1}
n CPLR <COMPILED_BY_NAME> {0:1}
n XLTR <TRANSLATED_BY_NAME> {0:1}
n INFT <INFORMANTS_NAME> {0:1}
n INTV <INTERVIEWERS_NAME> {0:1}
[
n AUTH <AUTHOR_NAME> {0:M}
|
n AUTH @XREF:SUBM@ {0:1}
]
n PERI <SOURCE_TIME_PERIOD> {0:M}
n DATE <ENTRY_RECORDING_DATE> {0:1}
n PLAC <SOURCE_JURISDICTION_PLACE> {0:1}
n TEXT <SOURCE_TEXT> {0:1}
+1 CONT <SOURCE_TEXT> {0:M}
n RECO <SOURCE_RECORDER_CODE> {0:1}
n FIDE <SOURCE_FIDELITY_CODE> {0:1}
n INDX <SOURCE_INDEXED_CODE> {0:1}
n PUBL {0:1}
+1 NAME <PUBLICATION_NAME> {0:1}
+1 DATE <PUBLICATION_DATE> {0:1}
+1 PLAC <PUBLICATION_PLACE> {0:1}
+1 PUBR <PUBLISHER_NAME> {0:1}
n SERS <SERIES_VOLUME_DESCRIPTION> {0:1}
n LCCN <LIBRARY_CONGRESS_CALL_NUMBER> {0:1}
n QUAY <QUALITY_OF_DATA> {0:1}
n REFS @XREF:SOUR@ {0:M}
n SOUR @XREF:ANY@ {0:M}
n SUBM @XREF:SUBM@ {0:M}
[
n NOTE <SUBMITTERS_TEXT> {1:1}
+1 <<NOTE_STRUCTURE>> {0:1}
|
n NOTE @XREF:NOTE@ {0:M}
]
[
n REPO {0:1}
+1 <<REPOSITORY_STRUCTURE>> {1:1}
|
n REPO @XREF:REPO@ {0:1}
]
+1 MEDI <MEDIA_TYPE_CODE> {0:M}
+1 CALN <SOURCE_CALL_NUMBER> {0:1}
+1 REFN <MANUAL_FILING_IDENTIFICATION> {0:1}
+1 FILM <SOURCE_FILM_NUMBER> {0:1}
+2 ITEM <FILM_ITEM_IDENTIFICATION> {0:1}
[
+1 NOTE @XREF:NOTE@ {0:1}
|
+1 NOTE <SUBMITTERS_TEXT> {1:1}
+2 <<NOTE_STRUCTURE>> {0:1}
]
n STAT <SEARCH_STATUS> {0:1}
+1 DATE <SEARCH_STATUS_DATE> {1:1}
SUPPORT_INFO:=
This structure provides information pertaining to the enclosing item, i.e.,
information about the information. It may occur within any context.
[
n DATE <DATE_VALUE> {0:1}
|
n PLAC <PLACE_VALUE> {0:1}
|
n SOUR [<SOURCE_TEXT> | @XREF:SOUR@]
+1 <<SOURCE_STRUCTURE>> {0:1}
|
n NOTE [ <SUBMITTERS_TEXT> | @XREF:NOTE@ ] {0:1}
+1 <<NOTE_STRUCTURE>> {1:1}
|
n SUBM @XREF:SUBM@ {0:1}
|
n QUAY <QUALITY_OF_DATA> {0:1}
|
n CHAN {0:M}
+1 DATE <CHANGE_DATE> {1:1}
+2 TIME <TIME_VALUE> {0:1}
]
PRIMITIVE ELEMENTS OF THE LINEAGE-LINKED FORM
ADDRESS_LINE:= {Size=1:40, Type=CHARACTERS}
Address information which, when combined with a NAME and CONTinuation lines,
meets postal requirements for sending communications through the mail.
AGE_VALUE:= {Size=1:30, Type=CHARACTERS}
A number indicating the age in years, months, and/or days. Any labels must
come after their corresponding number, e.g., 4 yr 8 mo 10 da. The year is
required and listed first, even if 0 (zero).
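A receiving system might read an age value along the following lines. This is a sketch, not a definition of the element: it assumes the only labels are "yr", "mo", and "da" as in the example above, and the function name parse_age is hypothetical.

```python
import re

# Year is required and comes first; months and days are optional, in that order.
AGE_PATTERN = re.compile(r"(\d+)\s*yr(?:\s+(\d+)\s*mo)?(?:\s+(\d+)\s*da)?")


def parse_age(value):
    """Return (years, months, days), or None if the value does not fit the sketch."""
    text = value.strip()
    # The element is declared with Size=1:30.
    if not 1 <= len(text) <= 30:
        return None
    match = AGE_PATTERN.fullmatch(text)
    if match is None:
        return None
    years, months, days = match.groups()
    return int(years), int(months or 0), int(days or 0)


print(parse_age("4 yr 8 mo 10 da"))
print(parse_age("0 yr 3 mo"))
print(parse_age("8 mo"))
```

The last call returns None because the year is missing, which the description above does not allow even when the age is under one year.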
ANCESTRAL_FILE_NUMBER:= {Size=1:8, Type=CHARACTERS}
This is a record number of an individual record contained in the Ancestral
File maintained by the Family History Department. This number simplifies the
matching process when submitting records that are intended to add additional
data or to change data for a specific record contained in the Ancestral File.
ASSOCIATION_DESCRIPTOR:= {Size=1:90, Type=CHARACTERS}
This is a word or phrase that describes the association between this
individual and the other individual in the context pointed to. (For example,
n ASSO great grandfather @XREF:SUBM@ would be read as: this person is a great
grandfather of the individual found in the submitter record.)
AUTHOR_NAME:= {Size=1:120, Type=CHARACTERS}
<NAME_VALUE>
This is the name of the person who authored or co-authored the referenced
material.
CASTE_NAME:= {Size=1:90, Type=CHARACTERS}
A name assigned to a particular group that this person was associated with,
such as a particular racial group, religious group, or a group with an