What are digital editions and digitized text corpora? This tutorial opens a discussion of the research purposes of digital editions and of how those purposes guide the structure and formation of digitized corpora.
NOTE: The links below are live. We recommend you right-click or command-click to open links in new windows, so you can keep this window open and available.
Some common Open Source digital corpora used to conduct research on late antiquity:
Each corpus is driven by different research questions, which dictate the contents and shape of the corpus:
- What texts are here, what texts aren’t here -- what is the corpus?
- What else besides “text” is in this corpus? What other information does it provide beyond text? Linguistics, manuscript information, print editions, translations, etc.
Full list of works at data.copticscriptorium.org/index/corpus
Visit ONE of these links:
What do you notice? Consider:
- Hover over the text with your cursor. Does anything pop up?
- Can you click on anything? What does it do?
- What else besides “text” is in this digital edition? What other information can I find?
Some explanations:
- The "Normalized" button will give you a digital edition of the normalized text.
- The "Analytic" button will show you an edition of an aligned normalized text, English translation (when available), and part of speech tagging for each word.
- The "Diplomatic" button will provide a digital edition of a diplomatic manuscript transcription.
- If you're looking at any of these digital editions, scroll down to see the document's metadata (or information about the document).
After you've played around, here's a review of key features:
- Normalized editions: English translation (if available) pops up on hover; words are linked to the online Coptic Dictionary ("Chapter view" for biblical books).
- Diplomatic editions: manuscript page number pops up on hover
- Analytic editions are aligned Coptic/English (if available)
- You can filter different features and information about the text using the menu.
- All of a document's metadata appears underneath the edition.
- You can search for a string of characters on any document page using your browser's find command (command-f on a Mac, Ctrl-F elsewhere).
All the editions you see are visualizations generated from text that has been encoded and annotated according to disciplinary standards. The project releases digitized and annotated text in these formats:
- Text Encoding Initiative Extensible Markup Language (TEI-XML) files
- PAULA XML files
- The online installation of the files in the search and visualization tool (or database) we use (called ANNIS)
- The raw files used in the ANNIS installation (relANNIS files)
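To make the idea of "encoded and annotated text" concrete, here is a minimal Python sketch of reading word-level annotations out of a TEI-XML file. The sample document is a simplified, hypothetical fragment written for this illustration, not an actual project file, and the function name `read_tei` is our own. The only things taken from the TEI standard are its namespace and the `w` element with its `lemma` and `pos` attributes; real files will carry far more structure and metadata.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is defined by the TEI standard itself.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified TEI fragment (not a real corpus file).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample document</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><w lemma="example" pos="N">examples</w> <w lemma="be" pos="V">are</w></p>
    </body>
  </text>
</TEI>"""


def read_tei(xml_string):
    """Return the title and a list of annotated words from a TEI string."""
    root = ET.fromstring(xml_string)
    # Metadata lives in the header; the text and its annotations in the body.
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    words = [
        {"form": w.text, "lemma": w.get("lemma"), "pos": w.get("pos")}
        for w in root.iterfind(".//tei:w", TEI_NS)
    ]
    return title, words


title, words = read_tei(SAMPLE_TEI)
print(title)
for word in words:
    print(word["form"], word["lemma"], word["pos"])
```

The point of the sketch is that the same encoded file can feed several different visualizations: one view might print only the word forms, another the forms alongside their part of speech tags.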
If you want to cite a document or visualization in a publication, see the Citation information after each document's metadata. Also, see our Citation Guidelines page. Always make a note of your document URN and relevant metadata, especially the version number and date of the document. You might also want to save the visualization for your own records by saving the webpage or printing to PDF.
The following buttons will take you to those data files:
Perseus and Papyri.info also use TEI XML encoding for their digital editions. Coptic Scriptorium is a multi-disciplinary project; we also release our corpora in PAULA XML files, since the PAULA format is used for linguistic research.
If you're creating your own digital corpus, consider:
- What kinds of things do you want to digitize and why?
- What makes that a "corpus"?
- Do you need to annotate your text in any way? In our corpus architecture, even spelling normalization is an annotation. What annotations do you need? Why and how?
- What kind of metadata do you need to provide?
- What kind of access will you allow others to have to your corpus? (Consider a license for your corpus and data.)
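One way to think through these questions is to sketch your corpus as a data structure before you digitize anything. The Python sketch below is only an illustration under our own assumptions: the class names, the layer names (`norm`, `pos`), and the sample tokens and metadata are all hypothetical and do not describe any project's actual architecture. It shows one possible design in which the transcribed form is kept as-is and everything else, including spelling normalization, is stored as an annotation layer.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    diplomatic: str  # the form as transcribed from the source
    annotations: dict = field(default_factory=dict)  # layer name -> value


@dataclass
class Document:
    metadata: dict  # e.g. title, version, license
    tokens: list = field(default_factory=list)

    def layer(self, name):
        """Return one annotation layer for every token (None where missing)."""
        return [t.annotations.get(name) for t in self.tokens]

    def normalized_text(self):
        """Join normalized forms, falling back to the transcribed form."""
        return " ".join(t.annotations.get("norm", t.diplomatic) for t in self.tokens)


# Hypothetical sample data, invented for this illustration.
doc = Document(
    metadata={"title": "Sample sign", "version": "1.0", "license": "CC-BY 4.0"},
    tokens=[
        Token("ye", {"norm": "the", "pos": "DET"}),
        Token("olde", {"norm": "old", "pos": "ADJ"}),
        Token("shoppe", {"norm": "shop", "pos": "NOUN"}),
    ],
)

print(doc.normalized_text())
print(doc.layer("pos"))
print(doc.metadata["license"])
```

Keeping the transcription separate from its annotations means you can add new layers later (translations, lemmas, manuscript page breaks) without altering the text you originally recorded.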
If you're interested in aligning texts and translations, here are some tools:
- Alpheios Text Alignment App (allows you to export XML files of your alignment; created in collaboration with Perseus)
- Text Alignment Network
- Ugarit text alignment
If you're interested in Coptic New Testament and Old Testament editions, check out these projects in Germany:
- New Testament Virtual Manuscript Room which includes Coptic
- Coptic Old Testament Project