Level2 : which attributes ? #25

@lb42

The TEI provides the following attributes for linguistic analysis of <w> elements:

  • lemma : provides a lemma (base form) for the word, typically uninflected and serving both as an identifier (e.g. in dictionary contexts, as a headword), and as a basis for potential inflections.
  • pos : (part of speech) indicates the part of speech assigned to a token (i.e. information on whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP, etc.).
  • msd : (morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features).
  • join : provides information on whether the token in question is adjacent to another, and if so, on which side. The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012.

(Copied from TEI P5 section 17.4.2, which gives some further discussion of "lightweight linguistic annotation"; see also the attribute class att.linguistic for some examples.)
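
To make the choice more concrete, here is a rough sketch of how the four attributes might look together on a short tokenized sentence. It is not copied from the Guidelines: the pos values assume a CLAWS-style tagset, the msd values assume Universal Features notation, and the sentence itself is made up for illustration.

```xml
<!-- Illustrative only: tag values below are assumptions, not a recommendation -->
<s>
  <w lemma="the" pos="AT0">The</w>
  <w lemma="cat" pos="NN2" msd="Number=Plur">cats</w>
  <!-- join="right": no whitespace between this token and the next one -->
  <w lemma="sleep" pos="VVB" msd="Tense=Pres|VerbForm=Fin" join="right">sleep</w>
  <!-- join="left": the full stop attaches to the preceding token -->
  <pc join="left">.</pc>
</s>
```

If we only wanted the minimum, lemma and pos alone would already cover most lightweight annotation; msd and join could then be optional in the schema.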

There are other possibilities (e.g. those provided by ISO 12620:2009; see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datcat.html), but these seem closest to what I think WG2 wants to produce.

Which of them would we like to see in the schema? Which others would we like to add?
