Workflow
First Draft / open for discussion!
- Contact persons for contributions are Christof, Lou, Borja and Carolin.
- Invitation to the language collection repository: Please let us know briefly which language(s) you would like to contribute texts for. We will then give your GitHub account write access to the corresponding repository or repositories.
For a description of the selection criteria of ELTeC core see here.
- Check whether the text candidate fulfills the sampling criteria “Eligibility” (basic requirements).
- Decide which language collection should receive the text candidate (based on the text’s language).
- Check whether the text candidate affects the “Composition” (balancing). A metadata collection list could be useful for monitoring the composition of the language collection, e.g. something like https://github.com/distantreading/WG1/wiki/ELTeC-List-of-Candidates.
- Check whether the license of the text candidate is compatible with the license ELTeC requires (https://github.com/distantreading/WG1/wiki/Versioning-Guidelines-for-ELTeC#licence).
- Upload your original data into the Orig folder of the language collection repository (https://github.com/COST-ELTeC).
Note: You may need to get access to the repository. Please contact: Lou, Borja, or Carolin.
Note: If your text doesn't fulfill the criteria, it could be integrated into the ELTeC extension. Contact the support team.
Support/help: Coordinate with the Sampling Support Team: Pieter, Diana, Lou, Borja and Carolin.
For an overview of all encoding schemas see here. For documentation of the individual ELTeC encoding schemas, see:
- Schema level0: basic TEI Encoding here
- Schema level1: richer TEI Encoding here
- Schema level2: richer TEI Encoding with tokenization and linguistic annotation (insert link)
- Choose whether to convert your text into ELTeC_0 or ELTeC_1. (See here for the differences) You will probably need to remove less of the existing markup for ELTeC_1. If the text you are starting from has very little or no markup, it may be easier to aim for ELTeC_0. Either is acceptable, provided that the result is valid according to one of the ELTeC schemas.
- Check that all the metadata required is correctly specified in the TEI Header. Again, you may need to remove or comment out some existing tagging. For a summary of what is required see here.
- Make sure your text is valid! Use a validating XML editor such as oXygen, or a command line validator such as xmllint or jing.
- Upload your text to the Incoming folder inside the appropriate language collection. The filename should be (tba)
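Before running a full validation, a quick local pre-check can catch the most basic problems. The following is only a minimal sketch using the Python standard library: it tests that a file is well-formed XML and that the root is a TEI element with a teiHeader and a text child. The function name `precheck` and the sample documents are illustrative assumptions, not part of the ELTeC tooling, and passing this check does not mean the file is valid against an ELTeC schema.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, written in ElementTree's "{uri}" prefix form.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def precheck(xml_text):
    """Return a list of problems; an empty list means the pre-check passed.

    Hypothetical helper: it checks well-formedness and basic TEI structure
    only. It does NOT validate against any ELTeC schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    if root.tag != TEI_NS + "TEI":
        problems.append(f"root element is {root.tag!r}, expected TEI in the TEI namespace")
    if root.find(TEI_NS + "teiHeader") is None:
        problems.append("no teiHeader child found")
    if root.find(TEI_NS + "text") is None:
        problems.append("no text child found")
    return problems


# Illustrative sample documents (assumptions, not real ELTeC texts).
GOOD = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt>"
    "<publicationStmt><p>Draft</p></publicationStmt>"
    "<sourceDesc><p>Sample source</p></sourceDesc></fileDesc></teiHeader>"
    "<text><body><p>Some text.</p></body></text></TEI>"
)
NO_NAMESPACE = "<TEI><teiHeader/></TEI>"
BROKEN = "<TEI><teiHeader></TEI>"

if __name__ == "__main__":
    for label, doc in [("good", GOOD), ("no namespace", NO_NAMESPACE), ("broken", BROKEN)]:
        print(label, precheck(doc))
```

A file that passes this pre-check still has to be validated against the chosen ELTeC schema with one of the tools named above.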
If your text is already in a machine-readable format, you will need to convert it.
- Convert your file to TEI XML, for example by using OxGarage or some other conversion program.
- For example, check out the sample conversions used by the ELTeC Sampler.
- Continue with Option A.
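For a text that exists only as plain text, the conversion step can be as small as wrapping paragraphs in a TEI skeleton. The sketch below uses the Python standard library and assumes paragraphs are separated by blank lines. The function name `text_to_tei` and the placeholder header strings are hypothetical. The output is a generic TEI starting point, not an ELTeC-valid file: the header and the text structure still have to be completed as described under Option A.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def text_to_tei(title, plain_text):
    """Wrap blank-line-separated paragraphs in a minimal TEI skeleton.

    Hypothetical helper for illustration; the header values are placeholders.
    """
    # Serialize the TEI namespace as the default namespace (no prefix).
    ET.register_namespace("", TEI_NS)

    def el(parent, name, text=None):
        child = ET.SubElement(parent, f"{{{TEI_NS}}}{name}")
        child.text = text
        return child

    root = ET.Element(f"{{{TEI_NS}}}TEI")
    header = el(root, "teiHeader")
    file_desc = el(header, "fileDesc")
    el(el(file_desc, "titleStmt"), "title", title)
    el(el(file_desc, "publicationStmt"), "p", "Unpublished draft (placeholder)")
    el(el(file_desc, "sourceDesc"), "p", "Converted from plain text (placeholder)")

    body = el(el(root, "text"), "body")
    # One <p> per block of text; line breaks inside a block become spaces.
    for block in re.split(r"\n\s*\n", plain_text):
        paragraph = " ".join(block.split())
        if paragraph:
            el(body, "p", paragraph)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(text_to_tei("Sample title", "First paragraph,\nsecond line.\n\nSecond paragraph."))
```

A dedicated converter such as the ones mentioned above will usually preserve more of the source structure (chapters, headings, notes) than this paragraph-only approach.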
Support/help: For a workshop introduction to TEI and ELTeC encoding, see here.
Coordinate with the Encoding Support Team: Lou (tba), Carolin (carolin.odebrecht@hu-berlin.de).
Upload your text to the appropriate folder in the language repository on GitHub: https://github.com/COST-ELTeC.
Write access to the repository is required for this! If you do not have write access to the language repository to which you would like to contribute, please contact Lou, Borja or Carolin.
If you haven't worked with GitHub yet, here is a short tutorial:
If you have further questions about our workflow, please contact Lou, Borja or Carolin.
See here for further instructions on data, schemata and folders.
For an overview of the versioning guidelines and publication strategy of ELTeC, see here.
- Your text contribution will be included in the ELTeC data life cycle.
- We use GitHub for version control of each editing/preparation step.
- We iteratively publish an ELTeC language collection on Zenodo (insert link).
