Level2 : content model changes #26

Adding `<s>` and `<w>` to the current ELTeC schema in such a way as to make import of linguistic analyses easier has some implications for current content models. Here's my summary of the relevant issues to consider.

`<w>` elements can only appear within an `<s>`. At level 2, the elements **p, head,** and **l** should therefore change to permit as content a sequence of `<s>` elements, interspersed with empty elements (**gap, milestone, pb, ref**).
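As a rough illustration of that constraint, here is a minimal Python sketch that checks a fragment against the proposed model. It assumes un-namespaced element names for brevity (real ELTeC files use the TEI namespace), and the function name and sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

BLOCKS = {"p", "head", "l"}                   # elements whose model would change
EMPTY = {"gap", "milestone", "pb", "ref"}     # empty elements allowed between <s>

def block_violations(xml_text):
    """Return messages for <p>, <head>, <l> that break the proposed model."""
    root = ET.fromstring(xml_text)
    problems = []
    for block in root.iter():
        if block.tag not in BLOCKS:
            continue
        # Text sitting directly in the block, before the first child.
        if (block.text or "").strip():
            problems.append(f"<{block.tag}> has bare text")
        for child in block:
            # Only <s> and the listed empty elements may be direct children.
            if child.tag != "s" and child.tag not in EMPTY:
                problems.append(f"<{block.tag}> contains <{child.tag}>")
            # Text following a child is also outside any <s>.
            if (child.tail or "").strip():
                problems.append(f"<{block.tag}> has bare text")
    return problems

# Hypothetical samples: GOOD follows the proposal, BAD does not.
GOOD = "<div><p><s><w>Call</w><w>me</w></s><pb/><s><w>Ishmael</w></s></p></div>"
BAD = "<div><p>Call <w>me</w><s><w>Ishmael</w></s></p></div>"

print(block_violations(GOOD))
print(block_violations(BAD))
```

A check like this is no substitute for the schema itself, but it may help to spot existing files that would become invalid.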

This leaves unclear what to do with the other sub-paragraph elements (**bibl, corr, date, emph, foreign, hi, label, measure, name, note, term, title**).

Some of these (**bibl, date, measure, term**) are really only used or needed in the header. The schema should add this constraint. 
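To gauge how much existing data such a constraint would affect, something like the following sketch could be run over a corpus file. It assumes an un-namespaced document with a `text` child under the root; the function name and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

HEADER_ONLY = {"bibl", "date", "measure", "term"}

def header_only_in_text(xml_text):
    """List header-only element names that nevertheless occur inside <text>."""
    root = ET.fromstring(xml_text)
    text = root.find("text")          # assumes <text> is a direct child of the root
    if text is None:
        return []
    return sorted({el.tag for el in text.iter() if el.tag in HEADER_ONLY})

# Hypothetical sample: <date> is fine in the header but also occurs in the body.
SAMPLE = (
    "<TEI><teiHeader><bibl>x</bibl><date>1850</date></teiHeader>"
    "<text><body><p><s><w>In</w></s><date>1850</date></p></body></text></TEI>"
)

print(header_only_in_text(SAMPLE))
```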

That leaves **corr, emph, foreign, hi, label, name, title**. The most natural (TEI-like) thing to do would be to change their content models to permit `<w>` elements. This would mean that `<w>` elements could then be found at two levels in the hierarchy, which may upset some software. It also implies that `<w>` elements must be properly contained within one of these elements; this should not be an issue except possibly for `<corr>`. An alternative might be to use a Trojan Horse-style notation, but that risks making downstream processing considerably more complicated.
(I note in passing that `<name>` might well be used to mark the result of named entity recognition.)
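The trade-off between the two options can be sketched in Python. The nested sample lets `<emph>` contain `<w>`; the flat sample uses empty start and end markers in the Trojan Horse manner. The `sID`/`eID` attribute names, the samples, and the helper functions are assumptions made for illustration, not part of the ELTeC schema.

```python
import xml.etree.ElementTree as ET

# Option 1: phrase-level elements may contain <w>.
NESTED = "<s><w>a</w><emph><w>very</w><w>fine</w></emph><w>day</w></s>"
# Option 2 (hypothetical encoding): empty markers delimit the emphasised span.
FLAT = ('<s><w>a</w><emph sID="e1"/><w>very</w><w>fine</w>'
        '<emph eID="e1"/><w>day</w></s>')

def w_depths(elem, depth=0):
    """Set of depths at which <w> occurs below elem (1 = direct child)."""
    depths = set()
    for child in elem:
        if child.tag == "w":
            depths.add(depth + 1)
        else:
            depths |= w_depths(child, depth + 1)
    return depths

def emph_words_nested(s):
    """Emphasised words are simply the <w> descendants of <emph>."""
    return [w.text for e in s.iter("emph") for w in e.iter("w")]

def emph_words_flat(s):
    """Emphasised words must be recovered by tracking open markers."""
    words, open_ids = [], set()
    for child in s:
        if child.tag == "emph":
            if "sID" in child.attrib:
                open_ids.add(child.get("sID"))
            elif "eID" in child.attrib:
                open_ids.discard(child.get("eID"))
        elif child.tag == "w" and open_ids:
            words.append(child.text)
    return words

nested, flat = ET.fromstring(NESTED), ET.fromstring(FLAT)
print(w_depths(nested), w_depths(flat))
print(emph_words_nested(nested), emph_words_flat(flat))
```

With nesting, `<w>` turns up at two depths but the emphasised words are a simple descendant query; with markers, every `<w>` stays a direct child of `<s>` but the consumer has to keep state while scanning siblings.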

Currently `<quote>` is allowed almost anywhere and may contain bare text, not wrapped in `<p>` or `<l>`. I think that should probably be disallowed.
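A quick way to find the affected cases might look like the sketch below, which flags `<quote>` elements holding text directly rather than inside child elements. Element names are again un-namespaced, and the function name and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

def bare_quotes(xml_text):
    """Return <quote> elements that contain text outside any child element."""
    root = ET.fromstring(xml_text)
    hits = []
    for q in root.iter("quote"):
        # Direct text is q.text plus the tail of each child.
        direct = [q.text] + [child.tail for child in q]
        if any((t or "").strip() for t in direct):
            hits.append(q)
    return hits

# Hypothetical sample: the first quote has bare words, the second wraps them.
SAMPLE = ("<div><quote>Some bare words</quote>"
          "<quote><p>Wrapped words</p></quote></div>")

print(len(bare_quotes(SAMPLE)))
```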


