Level2 : content model changes #26

@lb42

Adding <s> and <w> to the current ELTeC schema in such a way as to make import of linguistic analyses easier has some implications for current content models. Here's my summary of the relevant issues to consider.

<w> elements can only appear within an <s>. At level 2, the elements p, head, and l should therefore change to permit as content a sequence of <s> elements, intertwingled with empty elements (gap, milestone, pb, ref).
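
To make the proposed constraint concrete, here is a minimal sketch of a checker, not part of the ELTeC schema or tooling. It assumes un-namespaced element names for brevity (real ELTeC files use the TEI namespace), and it reads "a sequence of <s> elements" as also ruling out bare text directly inside p, head, or l, which is an interpretation. The function name and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: un-namespaced tags; real files would need the TEI namespace.
EMPTY_OK = {"gap", "milestone", "pb", "ref"}   # empty elements listed above
SENTENCE_HOSTS = {"p", "head", "l"}            # elements whose model changes


def level2_violations(root):
    """Return human-readable violations of the proposed level-2 model."""
    problems = []
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        if el.tag in SENTENCE_HOSTS:
            # Assumption: only whitespace may sit between the child elements.
            texts = [el.text] + [child.tail for child in el]
            if any(t and t.strip() for t in texts):
                problems.append(f"<{el.tag}> contains bare text")
            for child in el:
                if child.tag != "s" and child.tag not in EMPTY_OK:
                    problems.append(
                        f"<{child.tag}> not allowed directly in <{el.tag}>"
                    )
        if el.tag == "w":
            # Walk upwards until an <s> ancestor is found (or the root is passed).
            anc = parent_of.get(el)
            while anc is not None and anc.tag != "s":
                anc = parent_of.get(anc)
            if anc is None:
                problems.append("<w> outside <s>")
    return problems


# Hypothetical samples: GOOD follows the proposal, BAD does not.
GOOD = "<p><s><w>Call</w><w>me</w></s><pb/><s><w>Ishmael</w></s></p>"
BAD = "<p><w>Call</w><hi>me</hi></p>"

print(level2_violations(ET.fromstring(GOOD)))
print(level2_violations(ET.fromstring(BAD)))
```

In practice the constraint would live in the schema itself; a script like this is only useful for spotting files that would break once the content model changes.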

This leaves unclear what to do with the other sub-paragraph elements (bibl, corr, date, emph, foreign, hi, label, measure, name, note, term, title).

Some of these (bibl, date, measure, term) are really only used or needed in the header. The schema should add this constraint.

That leaves corr, emph, foreign, hi, label, name, title. The most natural (TEI-like) thing to do would be to change their content models to permit w elements. This would mean that <w> elements can now be found at two levels in the hierarchy, which may upset some software. It also implies that <w> elements must be properly contained within one of these elements; this should not be an issue except possibly for <corr>. An alternative might be to use a trojan horse style notation, but that risks making downstream processing considerably more complicated.
(I note in passing that <name> might well be used to mark the result of named entity recognition).
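
The "two levels" concern can be illustrated with a small sketch. It assumes un-namespaced tags and a made-up sentence in which one token is wrapped in <hi>. Software that only looks at the direct children of <s> would silently lose that token, whereas a descendant traversal recovers all tokens in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence: the second token is wrapped in <hi>.
s = ET.fromstring("<s><w>a</w><hi><w>very</w></hi><w>odd</w><w>day</w></s>")

# Child-only lookup: misses any <w> nested inside <hi>, <emph>, etc.
direct_only = [w.text for w in s.findall("w")]

# Descendant lookup: finds <w> at any depth, in document order.
all_tokens = [w.text for w in s.iter("w")]

print(direct_only)  # ['a', 'odd', 'day']
print(all_tokens)   # ['a', 'very', 'odd', 'day']
```

Any downstream tool that addresses tokens with a child step rather than a descendant step would need the equivalent adjustment.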

Currently <quote> is allowed all over the place and may contain just words, unwrapped in <p> or <l>. I think that should probably be disallowed.
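
As a rough way to gauge how many existing texts would be affected, the sketch below lists <quote> elements that contain words directly rather than inside <p> or <l>. It assumes un-namespaced tags; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def quotes_with_bare_text(root):
    """Return <quote> elements that hold non-whitespace text directly."""
    offenders = []
    for quote in root.iter("quote"):
        # Text directly inside <quote> is its .text plus each child's .tail.
        texts = [quote.text] + [child.tail for child in quote]
        if any(t and t.strip() for t in texts):
            offenders.append(quote)
    return offenders


# Hypothetical sample: the first quote is unwrapped, the second is not.
doc = ET.fromstring(
    "<body><quote>To be</quote><quote><p>or not to be</p></quote></body>"
)
print(len(quotes_with_bare_text(doc)))  # 1
```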
