Early Americas Digital Archive

EADA Project Documentation Description

General Overview

The following is a general overview of procedures used throughout this project. Each element is described in the Element Descriptions.

The texts in this project are encoded with Extensible Markup Language (XML) to facilitate rendering and searching the digital text. For this reason, it is essential to mark in the digital text both the structure and the appearance of the original source. For example, if the original source comprises multiple subdivisions, these are nested in the digital text much as the subsections of an outline are nested under levels of headings. Likewise, if a portion of the original text appears in italics or underlined, that treatment is noted in the markup of the digital text.

Essentially, the more the digital text is encoded to indicate the structure and appearance of the original source, the more options the end-user has in searching and viewing the bibliographic codes of the resulting digital text.

I. Original Source

Each digital document should be prepared from a specific edition, which is identified in the header of the digital text. If appropriate, the digital text can also be proofed against an additional edition.

Omissions

The following portions of the original source are omitted from this project:

  1. All preliminaries, such as front matter and title pages, except when the preliminaries are considered authorial artifacts.
  2. Editorial comments, except those for which the author might be responsible and those that indicate significant textual variation.
  3. Catchwords.
  4. Page breaks.

II. Text Encoding Initiative (TEI) and the Document Type Definition (DTD)

Since 1987, the Text Encoding Initiative (TEI) has provided markup standards that help scholars encode various types of literary texts for online research and teaching. To mark up or encode a text in this sense means to make an interpretation of that text explicit through a metalanguage. Unlike HTML, XML markup is focused on the meaning of data, not its presentation. As TEI explains it, "With descriptive instead of procedural markup the same document can readily be processed in many different ways, using only those parts of it which are considered relevant." See TEI Consortium for more information.

For example, HTML records how a title is presented:
<center>The Title</center>

XML, by contrast, records what the string means:
<title>The Title</title>
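To illustrate the TEI point about descriptive markup, here is a minimal Python sketch (standard library only) that processes one descriptively tagged fragment in two different ways: once to list its titles, once to render them as italic HTML. The sample fragment and the function names are hypothetical and are not part of any EADA tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the markup says what each string is, not how it looks.
SAMPLE = "<p>Compare <title>The Title</title> with <title>Another Title</title>.</p>"

def list_titles(xml_text):
    """Use 1: collect the text of every <title> (e.g., for a search index)."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.iter("title")]

def titles_as_html(xml_text):
    """Use 2: present each <title> as italic HTML for display."""
    root = ET.fromstring(xml_text)
    for t in list(root.iter("title")):
        t.tag = "i"  # renaming the tag keeps its text and the text that follows it
    return ET.tostring(root, encoding="unicode")
```

Neither function needs to know anything about typography in advance; the presentation decision is made only at output time.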

The Document Type Definition (DTD) used for this project is a subset of the TEI. The DTD defines which elements are legal in the XML document. Before submission to EADA, the XML document should be validated against the DTD, because a document must conform to the DTD before it can be included in the EADA database.

For example, if you are using an XML editor such as XMetaL, you will be required to provide a "rule" document, that is, the DTD.
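Before validating in an editor, a quick well-formedness check can catch unclosed or mismatched tags. The Python sketch below uses only the standard library. It checks well-formedness only and does not validate against the EADA DTD; full DTD validation requires a validating parser (for example, the DTD class in the third-party lxml library) or the XML editor itself. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, "") if xml_text parses as XML, else (False, parser message).

    Well-formedness only: ElementTree does not apply the DTD's rules about
    which elements may appear where, so a document can pass this check and
    still fail DTD validation.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""

ok, message = check_well_formed('<lg n="1"><l n="1">A line</l></lg>')
bad, problem = check_well_formed('<lg n="1"><l n="1">A line</lg>')
```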

III. Electronic Edition

Naming Convention

Each electronic edition is named in all lowercase with the source author's last name followed by the title word or words that make that title unique. Each file name must be unique to avoid file duplication and overwriting.

For example, Anne Bradstreet's "To My Dear Children" and "To My Dear Husband" would be named "bradstreettochildren.xml" and "bradstreettohusband.xml".
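The naming convention can be expressed as a small Python function. This is a sketch: the function names are hypothetical, the caller chooses which title words make the name unique, and dropping every character other than a-z and 0-9 is an assumption about how spaces, punctuation, and accented letters should be handled.

```python
import re
from collections import Counter

def edition_filename(author_last, title_words):
    """Author's last name + distinguishing title word(s), all lowercase, + ".xml".

    Assumption: anything other than a-z and 0-9 (spaces, apostrophes,
    accented letters) is simply dropped.
    """
    stem = "".join(re.sub(r"[^a-z0-9]", "", w.lower()) for w in [author_last, *title_words])
    return stem + ".xml"

def duplicate_names(names):
    """Return any file names that occur more than once (these would overwrite)."""
    return sorted(n for n, count in Counter(names).items() if count > 1)

a = edition_filename("Bradstreet", ["To", "Children"])
b = edition_filename("Bradstreet", ["To", "Husband"])
```

Running duplicate_names over a batch of proposed names before submission is one way to catch a collision early.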


TEI Header

The TEI header for each document is structured according to pre-established guidelines. Each header contains metadata that helps identify and categorize the document, including publication information and particular editorial decisions.

A Web header example and an XML header template for downloading are provided as examples. (Note: to view the XML header template, right-click and save the file as "EADAHeader.xml".)

Document Division Structures <div>

Each document has a basic division of <div0>. Each internal division is marked with the appropriate subsection level (<div1>, <div2>, and so on). Subsections may be numbered sections, chapters, or whole poems, depending on the structure of the original source, and can often be identified by distinct titles or headings.

<text> <body>
<div0>
<head type="main" rend="all-caps">[Head Text inserted here]</head>

<div1>
<head>[Head Text for first <div1>]</head>
<p n="1">[First Paragraph of <div1>]</p>
</div1>

<div1>
<head>[Head Text for second <div1>]</head>
   <div2>
   <head>[Head Text for <div2>]</head>
      <div3>
      <head>[Head Text for <div3>]</head>
      <p n="22">[First Paragraph of <div3>]</p>
      <p n="23">[Second paragraph of <div3>]</p>
      </div3>
   </div2>
</div1>

<div1>
<head>[Head Text for third <div1>]</head>
   <div2>
   <head>[Head Text for <div2>]</head>
   <p n="24">[First paragraph of <div2>]</p>
   </div2>
</div1>

</div0>
</body></text>
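The nesting rule (each <div1> sits directly inside the <div0>, each <div2> inside a <div1>, and so on) can be checked with a short recursive walk. This Python sketch works on a bare skeleton rather than the template above, since the bracketed placeholders contain angle brackets and would not parse; the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DIV = re.compile(r"div(\d+)$")

def check_div_nesting(elem, parent_level=None, problems=None):
    """Report any <divN> whose nearest enclosing division is not <div(N-1)>.

    The outermost division is expected to be <div0>.
    """
    if problems is None:
        problems = []
    level = parent_level
    match = DIV.match(elem.tag)
    if match:
        found = int(match.group(1))
        expected = 0 if parent_level is None else parent_level + 1
        if found != expected:
            problems.append(f"<div{found}> where <div{expected}> was expected")
        level = found  # children are judged against the level actually found
    for child in elem:
        check_div_nesting(child, level, problems)
    return problems

GOOD = "<text><body><div0><head/><div1><div2><div3><p/></div3></div2></div1></div0></body></text>"
BAD = "<text><body><div0><div1><div3><p/></div3></div1></div0></body></text>"
```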


Special Characters

Characters that do not appear on the main letter and number keys of a computer keyboard are encoded as standardized numeric character references to their Unicode values. These values can be found at http://www.unicode.org/charts/.

For example, the em dash in the following lines

count my vain sighs for nought?
–For such is joy and such the price of pain.

would be encoded as:

<l>count my vain sighs for nought?</l>
<l>&#x2014;For such is joy and such the price of pain.</l>
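Converting characters to references by hand is error-prone. The Python sketch below replaces every non-ASCII character with a hexadecimal numeric character reference in the same form as the &#x2014; example. Treating "non-keyboard" as "outside ASCII" and padding to four uppercase hex digits are assumptions, and the function name is hypothetical. Python's built-in str.encode("ascii", "xmlcharrefreplace") does something similar but produces decimal references such as &#8212;.

```python
def to_char_refs(text):
    """Replace each non-ASCII character with a hex numeric character reference.

    Assumption: "non-keyboard" is approximated as "outside ASCII"; references
    are written as &#x followed by at least four uppercase hex digits.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text)

encoded = to_char_refs("<l>\u2014For such is joy and such the price of pain.</l>")
```

Because tags and attribute syntax are plain ASCII, the function can be run over a whole encoded line without disturbing the markup.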


Typographically Distinct Text

All typographic changes in the source (that is, font changes) are encoded so that these bibliographic codes can be rendered.

Font changes (for example, in titles, foreign words, and emphasized words) are recorded in the rend attribute of the affected element. The following are possible values for the rend attribute:

  1. rend="italic"
  2. rend="bold"
  3. rend="underline"
  4. rend="strike-out"
  5. rend="superscript"
  6. rend="subscript"
  7. rend="small-caps"

For example, a title that appears in italics would be tagged as <title rend="italic">Tarry shadow of my scornful treasure</title>.
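A quick audit can catch rend values outside the list above, such as a misspelled "italics". This Python sketch treats the seven listed values as the complete set, which is an assumption: the <div0> example earlier uses rend="all-caps", which it would flag, so anything it reports is a prompt for review rather than a definite error. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The seven values listed above (assumed here to be the complete set).
ALLOWED_REND = {"italic", "bold", "underline", "strike-out",
                "superscript", "subscript", "small-caps"}

def unexpected_rend(xml_text):
    """Return (tag, rend) pairs whose rend value is not in ALLOWED_REND."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get("rend")) for el in root.iter()
            if el.get("rend") is not None and el.get("rend") not in ALLOWED_REND]

found = unexpected_rend('<l><title rend="italics">Tarry shadow</title> and '
                        '<hi rend="italic">Dane</hi></l>')
```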


Numbering

All paragraphs <p> are numbered sequentially through the entire text.

Line (<l>) and line group (<lg>) numbering restarts with each new set of line groups within a text. If the whole text is a single poem, song, or similar work, all line groups and lines are numbered sequentially throughout the entire text. If the text contains several poems or songs, each is encoded as a separate <div#>, and its line groups and lines are numbered sequentially within that division; numbering then begins again with the next <div>.

Please note: If the source is numbered, the electronic file reflects that system regardless of source errors. Otherwise, numbers are entered by the editor sequentially throughout the document.

The example below represents a long poem.

<lg n="56">
. . .

<l n="325">And to conclude, I may not tedious be,</l>
<l n="326">Man at his best estate is vanity.</l>
</lg>

<lg n="57">
<head rend="italic">Old Age.</head>
<l n="327">WHAT you have been, ev'n such have I before:</l>
<l n="328">And all you say, say I, and something more.</l>
. . .
</lg>
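Where the source is not numbered, the editor's sequential numbering can be applied by script. In this Python sketch, paragraphs are numbered through the whole text, while <lg> and <l> numbering restarts for each poem. It assumes each separate poem is a <div1> (the poem_tag parameter is hypothetical and should match the level actually used; pass "div0" for a text that is one long poem), and it overwrites existing n values, so it should not be run on a source that carries its own numbering.

```python
import xml.etree.ElementTree as ET

def number_paragraphs(root):
    """Number every <p> sequentially through the entire text."""
    for i, p in enumerate(root.iter("p"), start=1):
        p.set("n", str(i))

def number_verse(root, poem_tag="div1"):
    """Restart <lg> and <l> numbering within each poem.

    Assumption: each separate poem is one <poem_tag> division (hypothetical
    default "div1"). Existing n values are overwritten.
    """
    for poem in root.iter(poem_tag):
        for i, lg in enumerate(poem.iter("lg"), start=1):
            lg.set("n", str(i))
        for i, line in enumerate(poem.iter("l"), start=1):
            line.set("n", str(i))

doc = ET.fromstring(
    "<body><div0>"
    "<div1><lg><l/><l/></lg><lg><l/></lg></div1>"
    "<div1><lg><l/></lg></div1>"
    "<p/><p/>"
    "</div0></body>"
)
number_paragraphs(doc)
number_verse(doc)
```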


Punctuation

Punctuation is usually recorded as it appears in the source. Punctuation falls within the element for larger structures such as <p> and <l>, but for smaller tagged strings it is usually placed outside the tags.

In the example below, the punctuation belongs within the <l> tags because it is part of the line of verse, but the comma after "Canutus" stays outside the <hi> tags because it is not italicized in the source.

<l n="29">By fraud or force usurp'd thy flowring crown,</l>
<l n="30">Or by tempestuous warrs thy fields trod down?</l>
<l n="31">Or hath <hi rend="italic">Canutus</hi>, that brave valiant <hi rend="italic">Dane</hi></l>
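Because punctuation stays outside <hi> unless it shares the highlighting, a script can list highlighted strings that end in punctuation so an editor can check each one against the source. This Python sketch is only a heuristic with a hypothetical name; it cannot tell by itself whether a mark is italicized in the original.

```python
import xml.etree.ElementTree as ET

PUNCTUATION = tuple(",.;:!?")

def hi_ending_in_punctuation(xml_text):
    """List <hi> strings ending in punctuation, for manual checking.

    Heuristic: the mark belongs inside <hi> only if it is highlighted in
    the source, which a script cannot see.
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for hi in root.iter("hi"):
        text = "".join(hi.itertext()).rstrip()
        if text.endswith(PUNCTUATION):
            flagged.append(text)
    return flagged

clean = hi_ending_in_punctuation('<l>Or hath <hi rend="italic">Canutus</hi>, that brave</l>')
suspect = hi_ending_in_punctuation('<l>Or hath <hi rend="italic">Canutus,</hi> that brave</l>')
```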


Spacing

Use no more than one space between words. Wherever possible, spaces should fall outside a tagged string rather than inside it.

In the example below, the spaces between elements have been removed, because the tags themselves mark the structural divisions between the strings.

<closer><salute>My Lord <lb/>Your Lordship most <lb/>Humble Servant,</salute><signed>George Alsop</signed></closer>

In the following example, strings tagged within a sentence contain no leading or trailing spaces; the spaces stay outside the tags, around the tagged string.

<l n="358">That but for shrubs they did themselves account.</l>
<l n="359">Then saw I <hi rend="italic">France</hi> and <hi rend="italic">Holland</hi>, sav'd <hi rend="italic">Cales</hi> won,</l>
<l n="360">And <hi rend="italic">Philip</hi> and <hi rend="italic">Albertus</hi> half undone.</l>
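Both spacing rules can be spot-checked with a script. This Python sketch reports runs of two or more spaces between words and inline elements whose content begins or ends with a space. Which elements count as inline (hi and title here) is an assumption, and the function name is hypothetical. Whitespace-only runs, such as the indentation in a pretty-printed file, are not reported because the pattern requires a non-space character on each side.

```python
import re
import xml.etree.ElementTree as ET

# Two or more spaces between non-space characters.
MULTIPLE_SPACES = re.compile(r"\S {2,}\S")

def spacing_problems(xml_text, inline_tags=("hi", "title")):
    """Report doubled spaces and inline elements with leading/trailing spaces."""
    root = ET.fromstring(xml_text)
    problems = []
    for el in root.iter():
        for chunk in (el.text, el.tail):
            if chunk and MULTIPLE_SPACES.search(chunk):
                problems.append(f"multiple spaces in {chunk!r}")
        if el.tag in inline_tags:
            inner = "".join(el.itertext())
            if inner != inner.strip():
                problems.append(f"<{el.tag}> has leading or trailing space: {inner!r}")
    return problems
```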


Any questions about this description should be sent to Tanya Clement, MITH Program Associate, at tclement@wam.umd.edu.