Scripts and utilities for transforming and preparing US* schemas for use in Platform.Bible and Paratext
To get started with this repo, first clone the repo:
git clone https://github.com/paranext/usfm-tools.git
Then install dependencies:
npm i
In this repo, there are some terms used that are rather precise and specific to USFM and others that do not necessarily match with official terminology (including some newly coined terms where there are not known terms). There is lots of documentation in src/markers-map.model.template.ts. Here is a short list of terms and definitions for clarity:
- Attributes - key/value pairs (or, in some cases in USFM, just values because the key is implied via USFM syntax) on a marker that provide some information about that marker
  - USFM attributes - key/value pairs or values whose keys are implied that add information to markers but are not essential information about how to represent a marker in USFM. For example, `caller`, `lemma`, and `code` are USFM attributes, but `style` (the marker name in USX) and `closed` (a USX attribute and USJ property that indicates whether the marker has a closing marker in USFM) are not USFM attributes but are only found in other formats and are used to represent the marker itself in USFM.
  - USX/XML attributes - XML attributes on an XML element. Some of these (like `caller` or `lemma`) are USFM attributes while others (like `style` or `closed`) are other kinds of information that are not represented as attributes in USFM
  - USJ/JSON properties/attributes - a term sometimes used for each key/value pair in JSON objects. Some of these (like `caller` or `lemma`) are USX attributes and USFM attributes. Others (like `style` in USX, which equates to `marker` in USJ, and `closed`) are USX attributes but are not USFM attributes. Others (like `content`) are other kinds of information that are not represented as attributes in USFM or USX.
- `usx.rng` `attribute` - XML elements with tag name `attribute` in `usx.rng`. These are not like the attributes described above because these are XML elements in RelaxNG specification that describe USX attributes.
- USFM Attribute types - different ways attributes are represented in USFM. Each USFM attribute has its own attribute type, and these types do not apply to USX or USJ (they are all normal USX attributes and USJ properties).
  - Closing marker attributes - attributes that are listed attached to the closing marker e.g. `lemma` on `w` marker. These look like `\marker content|attributeKey="attributeValue" otherAttributeKey="otherAttributeValue"\marker*`
    - Default attribute - an attribute that, if it is the only closing marker attribute for a marker, can be listed without the attribute key e.g. `gloss` on `rb`. These look like `\marker content|defaultAttributeValue\marker*`
  - Special attribute types - attributes in USX/USJ that are not just listed on the closing marker but are represented in some other way in USFM. None of these have the attribute key listed in USFM.
    - Attribute marker - separate markers that appear after the marker they describe in USFM e.g. `altnumber` on `c` is `ca`. These look like `\marker content \attributeMarker attributeValue`
    - Text content attribute - the actual text content of the marker in USFM. e.g. `alt` on `periph`. These look like `\marker content`.
    - Leading attribute - text that is added right after the opening marker and before the text content of the marker e.g. `caller` on `f`. These look like `\marker leadingAttribute content`
- `usx.rng` `element` - XML elements with tag name `element` in `usx.rng`. These XML elements are from RelaxNG specification and are used to describe markers and/or marker types.
- `usx.rng` `define` - XML elements with tag name `define` in `usx.rng`. These XML elements are from RelaxNG specification and are used to describe some set of information that is referred to somewhere else in `usx.rng`. Usually, these `define`s contain one or more `element`s, so they describe one or more markers and/or marker types.
- Closing marker - the USFM representation of the end of a marker. USX and USJ markers just use their equivalent XML/JSON syntax for the closing of an element/object.
  - Normal closing marker - a closing marker that uses the same marker name as the opening marker and just has an asterisk at the end e.g. `\nd*` for `nd`. These look like `\marker*`.
  - Independent closing marker - a closing marker that uses a different marker name than the opening marker and does not have an asterisk added at the end e.g. `\esbe` for `esb`. These look like `\closingMarker`.
- Specification/spec - the official ruling about how USFM/USX/USJ should look. This is found at https://docs.usfm.bible/usfm/3.1/index.html
- Whitespace
  - Structural whitespace - whitespace in USFM that is a required part of the USFM syntax and delimits different things e.g. normal space after opening markers. This looks like `\marker content`
  - Content whitespace - whitespace in USFM that is part of the actual Scripture text or the "content" of the marker. This looks like `\marker here is some content with content whitespace in it`
  - Normalization - the process of transforming USFM with any whitespace into USFM with specific whitespace based on a set of rules. Many different USFM representations of the same Scripture content should be able to be normalized into the same USFM string. Paratext has its own rules for normalizing whitespace, and the specification has its own rules that result in the canonical form.
  - Canonical form - the official representation of how USFM should look based on the rules described by the specification. The whitespace should be normalized or "reduced" according to the rules in the specification.
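The closing marker attribute and default attribute forms from the glossary above can be illustrated with a minimal sketch. The function name `formatWithClosingMarkerAttributes` and its parameters are hypothetical and are not part of this repo; the sketch also assumes attribute values need no escaping.

```typescript
/**
 * Hypothetical helper (not part of usfm-tools) that renders a marker with
 * closing marker attributes. Assumes attribute values contain no characters
 * that would need escaping.
 */
function formatWithClosingMarkerAttributes(
  marker: string,
  content: string,
  attributes: Record<string, string>,
  defaultAttribute?: string,
): string {
  const entries = Object.entries(attributes);
  const [first] = entries;
  let attributeText = '';
  if (entries.length === 1 && first !== undefined && first[0] === defaultAttribute) {
    // The default attribute is the only closing marker attribute, so its key is left out
    attributeText = `|${first[1]}`;
  } else if (entries.length > 0) {
    // Otherwise every attribute is listed as key="value", separated by spaces
    attributeText = `|${entries.map(([key, value]) => `${key}="${value}"`).join(' ')}`;
  }
  return `\\${marker} ${content}${attributeText}\\${marker}*`;
}

// `gloss` is the default attribute on `rb`, so only its value is written
const defaultForm = formatWithClosingMarkerAttributes('rb', 'content', { gloss: 'value' }, 'gloss');
// With more than one attribute, every key is written out
const fullForm = formatWithClosingMarkerAttributes(
  'marker',
  'content',
  { attributeKey: 'attributeValue', otherAttributeKey: 'otherAttributeValue' },
  'attributeKey',
);
```

Here `defaultForm` uses the short `|value` form, while `fullForm` falls back to listing each key because the default attribute is not the only attribute present.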
The markers map is a JSON file that contains information for each USFM marker and marker type. It aims to include all necessary marker-specific information for translating from USJ to USFM that is not about the generic syntax of USFM.
See UsjReaderWriter for an example of using this markers map to transform USJ to USFM as well as to convert locations between USFM and USJ space.
The markers map does not contain the following information necessary for perfectly transforming USJ to USFM:
- There are a few properties on markers that should not be output to USFM as USFM attributes but rather should be incorporated in other ways. The use of these properties is partially or wholly not represented in the markers map
  - `style`/`marker` (the marker name in USX/USJ)
  - the XML element tag/`type` (the marker type in USX/USJ)
  - the XML element children/`content` (the contents of the marker in USX/USJ)
  - `closed` (whether the marker should be explicitly closed in USFM)
  - Note: In USFM 3.1, the `+` prefix for nested character markers is optional, but the markers map does not currently expect or have instructions on how to handle any special information to preserve whether or not this prefix is present.
- The `v` marker (`verse` type) canonically has a newline before it. However, Paratext 9.4 does not add a newline before it if it comes after `(` or `[`.
- The `optbreak` marker is transformed to two slashes in a row `//` in USFM
- Non-breaking space (`NBSP`/`U+00A0`) should be converted to `~` in USFM
- General, simple rules about how canonical USFM is structured. Some examples:
  - There is a backslash before each marker name (except in certain circumstances when indicated by `markerType.isClosingMarkerEmpty`) and a space after each marker name (except in non-standard circumstances when indicated by `markerType.noSpaceAfterOpening`) e.g. `\nd `
  - There is an asterisk at the end of normal closing markers e.g. `\nd*`
  - Newlines before markers as indicated by `markerType.hasNewlineBefore` replace space after the last content before the marker
  - Attributes that are not special attribute types or skipped are listed at the end of the marker after a bar `|` in the form `key="value"` with spaces between multiple attributes.
  - See `UsjReaderWriter.toUsfm` to find the implementation of all general USFM rules.
- The spec seems to be silent regarding what should happen to unknown markers. In Paratext 9.4, markers whose type is `para` but the marker is unknown (meaning the marker info cannot be found or the marker `type` in the marker does not match the marker `type` listed in the marker info) do not have a newline before them when output to USFM contrary to normal `para`-type markers.
- The spec seems to be silent about unexpected closing markers. In Paratext 9.4, closing markers that have no matching opening marker are given the `unmatched` marker type. They have no contents, no closing markers, and no structural space after the marker.
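Two of the general rules listed above (non-breaking space becoming `~`, and a newline replacing the space before markers whose type has `hasNewlineBefore`) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `contentToUsfm` and `appendOpeningMarker` are hypothetical, and the actual implementation lives in `UsjReaderWriter.toUsfm`.

```typescript
const NBSP = '\u00A0';

/** Hypothetical helper: converts content text for USFM output (NBSP becomes `~`). */
function contentToUsfm(text: string): string {
  return text.split(NBSP).join('~');
}

/**
 * Hypothetical helper: appends an opening marker to USFM generated so far.
 * Does not handle the very start of a file or `markerType.noSpaceAfterOpening`.
 */
function appendOpeningMarker(usfm: string, markerName: string, hasNewlineBefore: boolean): string {
  if (!hasNewlineBefore) return `${usfm}\\${markerName} `;
  // The newline replaces the space after the last content before the marker
  const withoutTrailingSpace = usfm.endsWith(' ') ? usfm.slice(0, -1) : usfm;
  return `${withoutTrailingSpace}\n\\${markerName} `;
}

const withParagraph = appendOpeningMarker('\\p some content ', 'p', true);
const withCharMarker = appendOpeningMarker('some content ', 'nd', false);
const withTilde = contentToUsfm(`some${NBSP}content`);
```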
The markers map also includes most information necessary for parsing USFM and translating from USFM to USJ, but it does not currently cover this use case or aim to cover it. Particularly, it does not contain the following information (there may be other gaps):
- When to close USFM markers
- Where to create the `table` marker that is currently derived in USX and USJ but is never in USFM
- Which whitespace is USFM structural whitespace that has no representation in USX/USJ and can be skipped
- When two slashes in a row `//` are found, this should be converted to the `optbreak` marker in USX/USJ
- When `~` is found, this should be converted to non-breaking space (`NBSP`/`U+00A0`) in USX/USJ.
- What to do about unknown markers (ones for which there is no marker info). Paratext 9.4 gives them the type `para`.
- What to do about unexpected closing markers (end with `*`). In Paratext 9.4, closing markers that have no matching opening marker are given the `unmatched` marker type, have no contents, no closing markers, and no structural space after the marker.
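As a rough illustration of the `//` and `~` items in the list above, the following sketch splits a run of USFM text into pieces. The `TextPiece` shape and the `parseUsfmText` name are hypothetical simplifications and do not reflect the exact USJ representation of `optbreak`.

```typescript
// Hypothetical, simplified representation of parsed text: plain strings plus
// a placeholder object standing in for the `optbreak` marker
type TextPiece = string | { type: 'optbreak' };

/** Hypothetical helper: converts `//` to an optbreak piece and `~` to NBSP. */
function parseUsfmText(text: string): TextPiece[] {
  const pieces: TextPiece[] = [];
  text.split('//').forEach((segment, index) => {
    // Every boundary between segments was a `//` in the original text
    if (index > 0) pieces.push({ type: 'optbreak' });
    if (segment.length > 0) pieces.push(segment.split('~').join('\u00A0'));
  });
  return pieces;
}

const parsedPieces = parseUsfmText('first~line//second line');
```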
Generate the markers map by placing the USX RelaxNG Schema file usx.rng (download the file on a release branch - usx.rng < 3.1 or usx.rng >= 3.1) in the root of this repo and running npm run generate-markers-map -- --schema usx.rng --version <schema-version> --commit <commit-hash>. Note that the commit hash is the commit hash for the repo where you got usx.rng, not the commit hash of this repo.
See the release notes and planned changes for USFM versions in the Roadmap and in the Docs.
This script reads the USX RelaxNG Schema file usx.rng and generates a JSON file dist/markers.json and a TypeScript file dist/markers-map.model.ts that contain various information for each USFM marker name. markers.json will contain an object with:
- information about the generated file (`version`, `commit`, `usfmToolsVersion`)
- the Semantic version of the markers map `markersMapVersion`. The same major version contains no breaking changes
- a `markers` property whose value is a map object
  - keys are the marker names
  - values are objects containing information about the marker such as the marker type and the marker's default attribute (where applicable)
- a `markersRegExp` property whose value is the same thing as `markers` but for markers whose names match the keys using RegExp
- a `markerTypes` property whose value is a map object
  - keys are the marker types
  - values are objects containing information about the marker type (for example `hasNewlineBefore` or `isCloseable`, where applicable)
- other properties that slightly affect how the USJ is transformed to USFM that are different depending on what style of USFM you intend to generate, spec or Paratext 9.4 (`isSpaceAfterAttributeMarkersContent`, `shouldOptionalClosingMarkersBePresent`).
This object is also exported from dist/markers-map.model.ts as USFM_MARKERS_MAP (matching spec) and USFM_MARKERS_MAP_PARATEXT (matching Paratext 9.4). dist/markers-map.model.ts also contains TypeScript types relevant to this object.
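A consumer of this object might look up marker information by trying `markers` first and then the `markersRegExp` keys. The sketch below uses a hand-written subset of the shape described above rather than the real generated types; the `getMarkerInfo` name is hypothetical, and matching the whole marker name against each RegExp key is an assumption.

```typescript
// Hand-written subset of the markers map shape, for illustration only
interface MarkerInfo {
  type: string;
  defaultAttribute?: string;
  description?: string;
}

interface MarkersMapSubset {
  markers: Record<string, MarkerInfo>;
  markersRegExp: Record<string, MarkerInfo>;
}

/** Hypothetical lookup: exact marker name first, then RegExp keys. */
function getMarkerInfo(map: MarkersMapSubset, markerName: string): MarkerInfo | undefined {
  if (Object.prototype.hasOwnProperty.call(map.markers, markerName)) return map.markers[markerName];
  for (const [pattern, info] of Object.entries(map.markersRegExp)) {
    // Assumption: the pattern must match the entire marker name
    if (new RegExp(`^(?:${pattern})$`).test(markerName)) return info;
  }
  return undefined;
}

const exampleMap: MarkersMapSubset = {
  markers: {
    p: { type: 'para', description: 'Paragraph text, with first line indent' },
    'qt3-s': { type: 'ms', defaultAttribute: 'who' },
  },
  markersRegExp: {
    't[hc][rc]?\\d+(-\\d+)?': { type: 'cell' },
  },
};

const paragraphInfo = getMarkerInfo(exampleMap, 'p');
const cellInfo = getMarkerInfo(exampleMap, 'tc1-2');
const unknownInfo = getMarkerInfo(exampleMap, 'zTJ');
```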
Following is a simplified example of what you might see in a markers.json file:
{
"version": "5.2-test.123",
"commit": "abc123",
"markers": {
"c": {
"type": "chapter",
"leadingAttributes": [
"number"
],
"attributeMarkers": [
"ca",
"cp"
]
},
"p": {
"type": "para",
"description": "Paragraph text, with first line indent"
},
"qt3-s": {
"type": "ms",
"defaultAttribute": "who"
},
...
},
"markersRegExp": {
"t[hc][rc]?\d+(-\d+)?": {
"type": "cell"
}
},
"markerTypes": {
"cell": {
"skipOutputAttributeToUsfm": [
"align"
]
},
"chapter": {
"hasNewlineBefore": true,
"skipOutputAttributeToUsfm": [
"sid"
],
"skipOutputMarkerToUsfmIfAttributeIsPresent": [
"eid"
]
},
"ms": {
"isCloseable": true,
"isClosingMarkerEmpty": true
},
"para": {
"hasNewlineBefore": true,
"skipOutputAttributeToUsfm": [
"vid"
]
},
}
}
The following describes how the data in `usx.rng` is transformed into `markers.json`.
The marker names and information about those markers are mostly derived from the usx.rng file. This schema file contains information about each valid USFM marker in the various element definitions:
- (`marker.type`; `markerTypes` keys) The element's `name` is the marker type
- Skip the definition if all `ref`s pointing to it are pointing to it via `usfm:alt` attribute instead of `name` (`FigureTwo`)
- Marker information:
  - (`markers` keys; `markersRegExp` keys) The marker name comes from one of a number of places:
    - The `style` attribute may contain the single marker name for that marker type
    - The `style` attribute may contain a `choice` of all the marker names associated with that marker type
    - The `style` attribute may contain a `ref` pointing to a `choice` of all the marker names associated with that marker type
    - If there is not a `style` attribute, the element's `name` is the marker type and the marker name
  - (`marker.isIndependentClosingMarkerFor`; `marker.independentClosingMarkers`) additional independent closing marker that goes with another marker
    - Check for marker type element direct children `usfm:ptag` or `usfm:tag` with text content and create a simple marker (no attributes or whatnot from the other markers of this marker type) whose name is the text content of the tag. Like `esbe` in `sidebar` marker type
  - (`marker.isClosingMarkerOptional`) closing marker should not usually be output to USFM if the `usfm:endtag` has `noout="true"`
  - (`marker.description`) get comments of what the marker represents from `a:documentation` right after the `style` attribute or from an XML comment right after the `style` attribute
  - Lots of attribute info comes from various sources:
    - Gather list of all attributes
      - Get `attribute` tags in the `element` tag
      - Look in `ref` tags in `element` and check if `define` has first child `attribute` or `optional` then `attribute` (`category`, `closed`, `link-href`, `link-title`, `link-id`)
      - Do not consider the `style` attribute as a normal attribute as it is the marker name rather than a USFM attribute
      - Do not consider the `closed` attribute as a normal attribute as it is a special attribute that is never output to USFM
      - Do not consider `colspan` attribute on `cell` as a normal attribute as it is incorporated into the marker name and is not a USFM attribute
    - There are many kinds of special attribute types in USFM representation. One attribute cannot be multiple types of special attribute. Check if an attribute is a special type in this listed order:
      - Attributes should not be considered for being a special attribute type in any of the following circumstances:
        - the `attribute` tag has multiple `usfm:match` tags
        - name is `style` since that attribute is always the marker name in USFM
        - the attribute is listed in `markerType.skipOutputAttributeToUsfm` because these special attribute types are related to USFM output
        - the attribute is listed in `markerType.skipOutputMarkerToUsfmIfAttributeIsPresent` because these special attribute types are related to USFM output
        - The `attribute` has any `usfm:match` with `beforeout` containing `|<attribute-name>=`. This is here to prevent `id` on `periph` from being default because it is an unusual USFM marker that doesn't have a default even though it has an attribute
      - (`marker.attributeMarkerAttributeName`) attribute markers - e.g. `altnumber`/`ca`, `pubnumber`/`cp`, `altnumber`/`va`, `pubnumber`/`vp`, `category`/`cat`
        - One `usfm:match` or `usfm:tag` or `usfm:ptag` with `beforeout` `\\__`
          - [Special case] `version` on `usx` is not an attribute marker (this special case may be unnecessary if the generation script is improved to handle markers that are not directly represented in USFM)
        - (`markers` keys; `markersRegExp` keys) get marker name from `beforeout`
        - (`marker.hasStructuralSpaceAfterCloseAttributeMarker`) `afterout` will have a space after the marker name like `\\__` if there should be a space in the canonical output USFM
        - (`marker.type`) `para` if `usfm:ptag` or `beforeout` has `\n`; `char` otherwise
        - (`marker.isAttributeMarkerFor`/`marker.attributeMarkers`) record the connection between the marker this attribute marker is listed on and this attribute marker
      - (`marker.textContentAttribute`) text content attribute - e.g. `periph`'s `alt`
        - One `usfm:match` with `match="TEXTNOTATTRIB"` or `match="TEXTNWS"`
          - [Special case] `usx` marker `version` is text content (it has `match="TEXTNWS"` in one of two occurrences; probably should be on both. Probably needs some kind of special marking indicating `usx` marker is replaced by `usfm` marker)
      - (`marker.leadingAttributes`) Leading attributes - e.g. `v`'s `number`
        - One `usfm:match` is present
        - `match` must not be `TEXTNOTATTRIB` or `TEXTNOTATTRIBOPT`
        - `beforeout` must not contain `\\__`
      - (`marker.defaultAttribute`) If the marker has a default attribute, it may come from one of two places
        - The default attribute will be the value of the `usfm:propval` attribute on the `value` tag in the `style` attribute or in the enumeration.
        - If there is no `usfm:propval` attribute on the `value` tag in the `style` attribute or there is no `style` attribute, the default attribute for a marker will be the first non-optional `attribute` `name` listed in the element other than the attributes to skip or the first optional non-skipped `attribute` `name` if there are no non-optional non-skipped `attribute`s. There is only a default attribute if there are zero or one non-optional non-skipped `attribute`s.
          - Attributes should be skipped when determining which attribute is the default attribute via normal rules of these instructions for attributes (meaning they are not in the list of attributes that should not be considered and are not other special attribute types like leading attributes)
          - [Special case] In versions earlier than 3.1, do not consider `link-href`, `link-title`, or `link-id` for default attribute because these attributes are common linking attributes that can be on many markers but are only default on `jmp` and `xt` (but they are not marked differently on those, so this must be hard-coded)
- Marker type information:
  - (`markerType.hasStyleAttribute`) note when the marker shouldn't have a `style` attribute
    - If the element has no `style` attribute, the marker shouldn't either.
      - Do not consider the marker type to have no `style` attribute if all `ref`s pointing to it have `usfm:ignore="true"`, meaning it is just listing attributes that indicate the whole marker should not be output to USFM
  - (`markerType.skipOutputAttributeToUsfm`) Do not output an attribute to USFM if:
    - `attribute` has `usfm:ignore="true"` (`attribute` - chapter and verse `sid`, `closed`)
    - `attribute` `name` has `ns="http://www.w3.org/2001/XMLSchema-instance"` on it or name starts with `xsi:` (these attributes are not related to Scripture data and should not be exported to USFM)
    - the attribute is `vid` on `para` or `table` (probably should have `usfm:ignore` set)
    - the attribute is `sid` in `chapter` (probably should have `usfm:ignore` set)
    - [Special case] the attribute is `align` or `colspan` attributes in `cell` marker type
      - `align` (probably should have `usfm:ignore` set because it is already embedded in the style)
      - `colspan` probably needs some kind of special something set because it gets embedded in the style for USFM but is not present in the style already in USX/USJ
  - (`markerType.skipOutputMarkerToUsfmIfAttributeIsPresent`) Ignore the opening and closing markers when translating to USFM (but keep the contents of the marker) if `attribute`s listed in the `markerType` are present. An attribute is listed if any of the following are true:
    - If all `ref`s pointing to the `define` have `usfm:ignore="true"` (chapter and verse `eid`)
    - If any `usfm:match` in the attribute has `noout="true"` attribute on it (`ref` `gen`)
  - (`markerType.hasNewlineBefore`) marker type should have newlines before the marker if
    - In `style` attribute element (or, if there is no `style` element, in the `element` element), one `usfm:ptag` or `usfm:tag` or `usfm:match` direct child with `beforeout` with `\n` in it (`verse` - `\n` is optional, whereas it does not seem to be optional in the others. Does this matter for us? I don't think so; I think it all normalizes out to being just whitespace).
      - [Special case] `cell` has `usfm:ptag` but should not have a newline before it. TJ thinks this is a bug in `usx.rng`.
      - [Special case] `periph` doesn't have `\n` in its `usfm:match` `beforeout`, but it should have a newline before it. TJ thinks this is a bug in `usx.rng`.
      - [Special case] `usx` doesn't have `\n` in its `usfm:match` `beforeout`, but it should have a newline before it. TJ thinks this is a bug in `usx.rng`.
  - (`markerType.isCloseable`) the marker type has a normal closing marker if
    - One `usfm:endtag` is present somewhere in the element
      - If there are two that share the same attributes other than `matchref` and `before` being the same other than a `+` in one, can consider just the first one. This is for some `char` markers that have both `\nd` and `\+nd` listed
      - `usfm:endtag` is outside the `element` for `milestone` because its `element` has `<empty/>` in it
      - `ref` should have closing marker. `usfm:endtag` is outside the element for some reason.
  - (`markerType.isClosingMarkerEmpty`) Closing marker is empty if `matchref="''"` (which basically means empty - there is very intentionally nothing to match)
    - Note: `ref`'s `usfm:endtag` has `matchref=""`, and it should have a closing marker
    - Note: `category` has `matchref=""` and `matchout` is not empty/not provided (`category`). If we end up handling `category` more precisely, this might need to be considered.
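The default attribute rules above (`marker.defaultAttribute`) can be summarized in a small sketch. The `AttributeInfo` shape and the `determineDefaultAttribute` name are hypothetical; the real logic in `src/markers-map.util.ts` works on the RelaxNG XML and handles more special cases than this.

```typescript
// Hypothetical, simplified description of one attribute gathered from usx.rng
interface AttributeInfo {
  name: string;
  /** Whether the attribute was wrapped in `optional` */
  isOptional: boolean;
  /** Whether the attribute is skipped (not considered or another special attribute type) */
  isSkipped: boolean;
}

/** Hypothetical sketch of choosing a marker's default attribute. */
function determineDefaultAttribute(
  attributes: AttributeInfo[],
  propval?: string,
): string | undefined {
  // A `usfm:propval` on the style `value` takes precedence
  if (propval !== undefined) return propval;
  const candidates = attributes.filter((attribute) => !attribute.isSkipped);
  const required = candidates.filter((attribute) => !attribute.isOptional);
  // More than one non-optional non-skipped attribute means no default attribute
  if (required.length > 1) return undefined;
  // Exactly one required attribute: it is the default
  if (required.length === 1) return required[0]?.name;
  // No required attributes: first optional non-skipped attribute, if any
  return candidates[0]?.name;
}

const fromPropval = determineDefaultAttribute([], 'who');
const firstOptional = determineDefaultAttribute([
  { name: 'loc', isOptional: true, isSkipped: false },
  { name: 'gen', isOptional: true, isSkipped: false },
]);
const tooManyRequired = determineDefaultAttribute([
  { name: 'first', isOptional: false, isSkipped: false },
  { name: 'second', isOptional: false, isSkipped: false },
]);
```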
Following are some improvements that could potentially be made to further strengthen this markers map generation:
- Do some work to encode that the `usx`, `usfm`, and `USJ` markers are different in each standard
- Should all the special attribute stuff be on `markerType` instead? Some risk in that `cat` is a marker attribute on all `note` marker types, but maybe that's coincidence and it may not forever and always be on all `note` marker types
- Explain how the terms I am using from XML sorta map to the USFM concepts but aren't exact one-to-one equals
- [markerType] note when the marker shouldn't have a `style` attribute
  - Improve accuracy: if the `element` has no `style` attribute and has direct child `usfm:tag` (`ref`), `usfm:ptag` (none - `sidebar` is closest), or `usfm:match` (`periph` and `optbreak`), no `style` attribute. If doesn't have one of these direct children (`table`, `usx`), the marker shouldn't be output to USFM at all. Or at least it indicates a very special case. Maybe not handling this yet is why `usx` considers `usfm` to be a marker attribute in the `usx.rng` but we don't. And `table` `usx` doesn't have `usfm:tag` or `usfm:ptag` and its attribute has `beforeout` with `\\__`. Could use those two indicators to determine it should be replaced with `usfm` in output. But then this still doesn't cover moving `usfm` under `id`
- [markerType] Figure out how to determine when to close these long-running markers with their own content hierarchies - `usx`, `table`, `periph`, `esb`, others? Actually probably need a general way to represent how any marker closes, not just how these specific ones close
- Do we need to keep track of whether a nested marker that closes has `+` on its markers?
- `cl` and `esbe` both specify `afterout="'\n'"` meaning a newline after them. But it seems to get reduced with newlines that come before the stuff after, so I dunno if we really need this. Maybe test P9 putting stuff after these markers and see what happens
- `book` marker type also has a `usfm:match` in it with `matchout="'\n'"`. Thinking this indicates it is a block-level marker, but it's weird because this may be the only one like this. All other block-level markers have `usfm:ptag`. But `id` is always the first line of the file. How should we track this?
  - Actually, it seems `hasNewlineBefore` doesn't line up with block-level marker types for `periph` or `verse` (optional newline) either. Maybe block-level should be its own property on marker types.
  - `periph` is not quite a block-level marker type, actually; more like a multi-block type. Need to define some rules around when these can end. `periph`, `table`, `usx`, `esb` (has its own closing marker). Can provide attributes in USFM with inline syntax, not block-level syntax.
- If needed, can tell if marker type doesn't have text content via `<empty/>`
  - Probably doesn't matter for our needs because, if a marker is empty, it won't have `content`s. You can tell if there should be a closing marker (like milestones) from other things.
The usx.rng file does not contain every single piece of information necessary for performing the supported operations with the markers map (like transforming USJ to USFM). Following are some special additions and exceptions to the rules for determining the markers map from the usx.rng file that are manually encoded into the markers map to ensure its completeness. Note that not all exceptions are necessarily listed here; you can find exceptions by looking for special case: in src/markers-map.util.ts.
- All rules starting with [Special case] in the sections above
- There are some markers that need very special handling that is not represented perfectly in `usx.rng`. In `markers.json`, the special handling is explained in `parseUsfmInstructions` and `outputToUsfmInstructions`:
  - `usfm` with marker type `para` and no default attribute. This marker is present in USFM but most of the time is translated into the `usx` marker in USX and the `USJ` marker in USJ
    - Note that `usfm` is a special `para` in that its text content is considered to be `version`, which gets translated to `usx` and `USJ` as an attribute.
  - `USJ` with marker type `USJ` and no default attribute. This marker is present in USJ but is translated into the `usx` marker in USX and the `usfm` marker in USFM.
  - `cell`-type markers encode the number of columns they span differently between USFM and USX/USJ
Note: fig has an attribute that changes names: in USFM, it is src; in USX and USJ, it is file.
Following is a snippet from the schema that is an example of one marker name and marker type:
<define name="PeripheralBookIdentification">
<element>
<name ns="">book</name>
<attribute>
<usfm:tag before="/?${anyws}*\\/" beforeout="'\\'" usfm:seq="true"/>
<name ns="">style</name>
<value>id</value>
</attribute>
<attribute>
<usfm:match/>
<name ns="">code</name>
<ref name="PeripheralBookIdentification.book.code.enum"/>
</attribute>
<group usfm:seq="true">
<optional>
<group>
<usfm:match before="/${hs}*/" beforeout="' '" match="/[^\\\n\r]*/"/>
<text/>
</group>
</optional>
<usfm:match match="NL" matchout="'\n'" dump="true"/>
</group>
</element>
</define>
Generating the marker map from only this snippet would result in the following:
{
"markers": {
"id": {
"type": "book"
}
}
}
Following is a snippet from the schema that is an example of many marker names in a `choice` that share a marker type:
<define name="Footnote">
<element name="note">
<attribute name="style">
<choice>
<value>f</value>
<value>fe</value>
<value>ef</value>
</choice>
</attribute>
<attribute name="caller"/>
<optional>
<attribute name="category"/>
</optional>
<oneOrMore>
<choice>
<ref name="FootnoteChar"/>
<text/>
</choice>
</oneOrMore>
</element>
</define>
Generating the marker map from only this snippet would result in the following:
{
"markers": {
"f": {
"type": "note"
},
"fe": {
"type": "note"
},
"ef": {
"type": "note"
}
}
}
Following is a snippet from the schema that is an example of many marker names in a `choice` in a `ref` that share a marker type:
<define name="BookTitles">
<element>
<name ns="">para</name>
<attribute>
<usfm:ptag/>
<name ns="">style</name>
<ref name="Title.para.style.enum"/>
</attribute>
<zeroOrMore>
<choice>
<text>
<usfm:text/>
</text>
<ref name="Footnote"/>
<ref name="CrossReference"/>
<ref name="Char"/>
<ref name="Break"/>
</choice>
</zeroOrMore>
</element>
</define>
<define name="Title.para.style.enum">
<choice>
<value>mt1</value>
<a:documentation>The main title of the book (if multiple levels)</a:documentation>
<value>mt2</value>
<a:documentation>A secondary title usually occurring before the main title</a:documentation>
<value>mt3</value>
<a:documentation>A tertiary title occurring after the main title</a:documentation>
<value>mt4</value>
<value>mt</value>
<a:documentation>The main title of the book (if single level)</a:documentation>
<value>rem</value>
<a:documentation>Remark</a:documentation>
</choice>
</define>
Generating the marker map from only this snippet would result in the following:
{
"markers": {
"mt1": {
"type": "para"
},
"mt2": {
"type": "para"
},
"mt3": {
"type": "para"
},
"mt4": {
"type": "para"
},
"mt": {
"type": "para"
},
"rem": {
"type": "para"
}
}
}
Following is a partial snippet from the schema that is an example of many marker names, some with default attributes, that share a marker type:
<define name="Milestone">
<group>
<element>
<name ns="">ms</name>
<attribute>
<usfm:tag after="Hs" afterout=""/>
<name ns="">style</name>
<ref name="Milestone.style.enum"/>
</attribute>
<optional>
<ref name="Attributes"/>
</optional>
<empty/>
</element>
<usfm:endtag matchref="''"/>
</group>
</define>
<define name="Milestone.style.enum">
<choice>
<value usfm:propval="sid" usfm:propattribs="sid?" usfm:propended="ts-e">ts-s</value>
<value usfm:propval="eid" usfm:propattribs="eid?" usfm:propends="ts-s">ts-e</value>
<value>ts</value>
<value usfm:propval="sid" usfm:propattribs="sid?" usfm:propended="t-e">t-s</value>
<value usfm:propval="eid" usfm:propattribs="eid?" usfm:propends="t-s">t-e</value>
<value usfm:propval="who" usfm:propattribs="who? sid?" usfm:propended="qt1-e">qt1-s</value>
<value usfm:propval="eid" usfm:propattribs="eid?" usfm:propends="qt1-s">qt1-e</value>
<value usfm:propval="who" usfm:propattribs="who? sid?" usfm:propended="qt2-e">qt2-s</value>
<value usfm:propval="eid" usfm:propattribs="eid?" usfm:propends="qt2-s">qt2-e</value>
</choice>
</define>
Generating the marker map from only this snippet would result in the following:
{
"markers": {
"ts-s": {
"type": "ms",
"defaultAttribute": "sid"
},
"ts-e": {
"type": "ms",
"defaultAttribute": "eid"
},
"ts": {
"type": "ms"
},
"t-s": {
"type": "ms",
"defaultAttribute": "sid"
},
"t-e": {
"type": "ms",
"defaultAttribute": "eid"
},
"qt1-s": {
"type": "ms",
"defaultAttribute": "who"
},
"qt1-e": {
"type": "ms",
"defaultAttribute": "eid"
},
"qt2-s": {
"type": "ms",
"defaultAttribute": "who"
},
"qt2-e": {
"type": "ms",
"defaultAttribute": "eid"
}
}
}
Following is a partial snippet from the schema that is an example of a marker that has the same type and name with no `style` attribute and with a default attribute:
<define name="Reference">
<element>
<usfm:tag match="'ref'" dump="true"/>
<name ns="">ref</name>
<optional>
<text>
<usfm:text match="TEXTNOTATTRIB" after="ATTRIBTEXTEND"/>
</text>
</optional>
<optional>
<attribute>
<usfm:match match="PIPE" matchout="'|'" dump="true"/>
<usfm:match match="TEXTNOTATTRIB"/>
<name ns="">loc</name>
<data type="string">
<usfm:pattern name="VERSE"/>
<param name="pattern">[A-Z1-4]{3}(-[A-Z1-4]{3})? ?[a-z0-9\-:]*</param>
</data>
</attribute>
</optional>
<optional>
<attribute>
<usfm:match match="TEXTNOTATTRIB" noout="true"/>
<name ns="">gen</name>
<choice>
<value>true</value>
<value>false</value>
</choice>
</attribute>
</optional>
</element>
<usfm:endtag match="'ref'" matchref=""/>
</define>
Generating the marker map from only this snippet would result in the following:
{
"markers": {
"ref": {
"type": "ref",
"defaultAttribute": "loc"
}
}
}
Following is a snippet from the schema that is an example of a `markersRegExp` entry in which the marker name is matched with RegExp:
<define name="TableContent">
<element>
<name ns="">cell</name>
<attribute>
<usfm:ptag/>
<name ns="">style</name>
<data type="string">
<param name="pattern">t[hc][rc]?\d+(-\d+)?</param>
</data>
</attribute>
<attribute>
<name ns="">align</name>
<ref name="cell.align.enum"/>
</attribute>
<optional>
<attribute>
<name ns="">colspan</name>
<data type="integer"/>
</attribute>
</optional>
<zeroOrMore>
<choice>
<text>
<usfm:text/>
</text>
<ref name="CharEmbed"/>
<ref name="Figure"/>
<ref name="Milestone"/>
<ref name="Verse"/>
<ref name="Footnote"/>
<ref name="CrossReference"/>
<ref name="Break"/>
</choice>
</zeroOrMore>
</element>
</define>
Generating the marker map from only this snippet would result in the following:
{
"markersRegExp": {
"t[hc][rc]?\d+(-\d+)?": {
"type": "cell"
}
}
}
Here is an example of some USX data. The tag names are the marker types, and the `style` attributes are the marker names:
<usx version="3.0">
<book code="EXO" style="id">World English Bible (WEB)</book>
<para style="ide">UTF-8</para>
<para style="h">Exodus</para>
<para style="toc1">The Second Book of Mosis, Commonly Called Exodus</para>
<para style="toc2">Exodus</para>
<para style="toc3">Exodus</para>
<para style="mt2">The Second Book of Moses,</para>
<para style="mt3">Commonly Called</para>
<para style="mt1">Exodus</para>
<chapter number="1" style="c" sid="EXO 1" />
<para style="p">
<verse number="1" style="v" sid="EXO 1:1" />Now these are the names of the sons of Israel, who came into Egypt (every man and his household came with Jacob): <verse eid="EXO 1:1" /><verse number="2" style="v" sid="EXO 1:2" />Reuben, Simeon, Levi, and Judah, <verse eid="EXO 1:2" /><verse number="3" style="v" sid="EXO 1:3" />Issachar, Zebulun, and Benjamin, <verse eid="EXO 1:3" /><verse number="4" style="v" sid="EXO 1:4" />Dan and Naphtali, Gad and Asher. <verse eid="EXO 1:4" /><verse number="5" style="v" sid="EXO 1:5" />All the souls who came out of Jacob’s body were seventy souls, and Joseph was in Egypt already. <verse eid="EXO 1:5" /><verse number="6" style="v" sid="EXO 1:6" />Joseph died, as did all his brothers, and all that generation. <verse eid="EXO 1:6" /><verse number="7" style="v" sid="EXO 1:7" />The children of Israel were fruitful, and increased abundantly,</para>
<para style="zTJ" vid="EXO 1:7">and multiplied, and grew exceedingly mighty; and the land was filled with them.<verse eid="EXO 1:7" /></para>
<chapter eid="EXO 1" />
</usx>
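The USX sample above includes a `para` with `style="zTJ"`, which is not a standard marker name. A minimal sketch of the "unknown marker" check described earlier (marker info not found, or the marker type on the element not matching the type in the marker info) might look like the following. The `isKnownMarker` name and the small hand-written lookup table are hypothetical stand-ins for a real markers map.

```typescript
// Hand-written stand-in for the `markers` section of a markers map:
// marker name -> marker type, limited to a few markers from the sample
const exampleMarkerTypes = new Map<string, string>([
  ['id', 'book'],
  ['h', 'para'],
  ['p', 'para'],
  ['c', 'chapter'],
  ['v', 'verse'],
]);

/**
 * Hypothetical check: a marker is known when marker info exists for its name
 * and the marker type in the data matches the type listed in that info.
 */
function isKnownMarker(
  markerType: string,
  markerName: string,
  markerTypesByName: Map<string, string>,
): boolean {
  // `get` returns undefined for missing names, which never equals a marker type
  return markerTypesByName.get(markerName) === markerType;
}

const knownParagraph = isKnownMarker('para', 'p', exampleMarkerTypes);
const unknownParagraph = isKnownMarker('para', 'zTJ', exampleMarkerTypes);
const mismatchedType = isKnownMarker('para', 'v', exampleMarkerTypes);
```

In this sketch `<para style="zTJ">` is treated as unknown because no marker info exists for `zTJ`, and a `para` element with `style="v"` is treated as unknown because `v` is listed with the `verse` type.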