From 008f66eed775bd740e31aef98ef83708784bc28b Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 7 Nov 2025 11:05:30 -0800 Subject: [PATCH 01/35] update dap4.xsd to coincide with published schema --- dap4/dap4.xsd | 733 ++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 525 insertions(+), 208 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 71e95a4..e67f58c 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -1,199 +1,149 @@ - - - - - - - Semantic restriction: xml attributes are allowed - only on the root group, where both dapVersion and base are - required and ns is optional. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + +
+

About the Dataset document

+
+

In DAP2 and DAP4, a dataset is defined as a collection of variables, each + of which is completely described by a tuple that consists of a name, + type and value(s) along with a hierarchical collection of \'attributes\' + that themselves are made up of name-type-value(s) tuples. The variables + are organized in a hierarchy.

+

The Dataset document is a text/xml representation of those variables and + their organization within the dataset.

+

A note about XML element names:

+

Element names that start with capital letters correspond to parts of the + DAP4 data model while those that start with lowercase letters are used + for document structure and syntax.

+

About changes in going from DAP2 to DAP4

+

DAP 4.0 introduces SharedDimension, Group, Opaque, 64-bit integers and + UnsignedByte. In addition: The syntax for Array has been changed so that + it\'s easier for processing software to figure out the type of an array; + Grids have been generalized so that there can be any number of \'Array\' + parts (and the Maps may be multi-dimensional); and the Attribute type + OtherXML has been made its own element (it\'s no longer a type of + \'Attribute\').

+
+
+
+
+
+ + + + These elements are used in several places to hold the \'semantic\' + and/or \'use\' metadata for the dataset, its groups and variables. ( OtherXML*, + Attribute* ) + + + + + + + + + + +
+

Dataset

+

This is the XML representation of a data source in DAP 4.

+

Note that the \'blob\' element is only present when this is used as the prefix + for the Data response.

+

Grammar: ( OtherXML*, Attribute*, Group+, blob? )

+

Element attributes:

+
+
name
+
The name of the data source; often a string used to uniquely reference + the data source wrt a particular server
+
dapVersion
+
The protocol version that corresponds to this document.
+
xml:base
+
The URL that references the DAP4 service endpoint used to access this + dataset.
+
+
+
+
+ + + + + + + + + + + +
+ + + + DAP uses attributes as a way to encode information that data + providers have bundled with data sources. This element is recursive. Each Attribute + element defines a lexical scope. If there are no nodes present the type + must be "Container". name: The name of the attribute; must be unique within the + scope. type: The type of the attribute. Attributes are limited to simple types, + vectors of simple types and \'Containers\' which are essentially structure types. + namespace: Use this to indicate that the given attribute means the same thing as the + matching item in the given namespace. This optional attribute is here to help + preserve information that a data server might know to be true and that a client + application could not assume with certainty. ( value* | ( OtherXML*, Attribute* ) ) + + + + + + + + + + -
- - - - - + + + + +
+ + + + + + Changes from DAP2 to DAP4: The Byte type is now signed and unsigned + bytes are now represented by the \'UByte\' data type. The types Int64 and UInt64 + represent signed and unsigned 64-bit integers. String uses UTF-8 in DAP4. + Enumerations are now in the mix. + + - - + @@ -203,19 +153,386 @@ - - - - - - - -
- - - - - - - + + + + + + + + Use this to embed arbitrary XML in a DDX. This functions like an + Attribute and appears in the same places as an Attribute, but its contents are + ignored by DAP software. Other software might find the information useful. The XML + elements must satisfy the requirements for \'lax\' processing under schema 1.0. In + practice, that means just about anything. Using ##other versus ##any means that the + enclosed XML MUST declare its namespace(s) and each element must be in a ns other + than dap4\'s. Of course, the namespace(s) used by the XML might be declared elsewhere + in the doc. name: A name to associate with this chunk of XML *: This element can + contain any other attributes that conform to the schema 1.0 definition of \'lax\' + processing ( xs:any+ ) + + + + + + + + + + + A Group is a lexical scoping tool used to replicate HDF5 and netCDF4 + Groups. Each Group defines a lexical scope. Each dataset has at least one Group; if + only one is present, it may be anonymous. In this case, by convention, it\'s name + attribute should be \'anonymous\', the default value. name: The name of the Group ( + OtherXML*, Attribute*, Dimension*, ScalarType+ ) + + + + + + + + + + + + + + + This defines a dimension (a name and size) that may be shared between + Grids and/or Arrays. name: The name of the dimension size: The size of the dimension + + + + + + + + + This defines the values of an enumeration. + + name: The name of the dimension + type: The size of the dimension + + + + + + + + + + + + + + + + + + + + DAP cardinal data types + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + This provides a version of BaseType that does not allow + child elements since Grid and Sequence may not be + array types. + + + + + + + + + + + + + + This extended ScalarBaseType so that instances + may be either scalar or N-dimensional. 
+ + + + + + + + + + + + + + + + + + + + + + + + + This extended BaseType so that instances + must be N-dimensional (N >= 1) and reference one or more + SharedDimension objects. + + + + + + + + + + + + + + + Grids cannot have scalars (they have only Arrays and Maps) and + Arrays have both dimensions and references to maps. Regular + arrays do not have references to maps. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + This attribute is used to reference a SharedDimension object. + + + + + + + + + name: The name of the dimension size: The size of the dimension ref: + A reference to a Dimension definition Note: either name and size must be present or + only ref must be present + + + + + + + + + + + + name: Name for this Map. Maps are associated with Array dimentions by + name, so this is a required attribute. type: The type of a Map is limited to the + Cardinal types. NB: Note the limitation on the type of a Map, which excludes Maps + that are Opaques, Structures, Sequences or Grids. ( OtherXML*, Attribute*, + dimensions ) + + + + + + + + + + + + + + + A Grid is a type that relates one or more Maps (aka coordinate + variables) to the dimensions of one or more Array variables. It is often the case + that Maps correspond to independent variables like Latitude or sample number and + Arrays represent dependent variables. Note that Map elements either specify a name + and size or reference a Dimension. The scope in which the Dimension can be located + is limited to an enclosing Group (but it is not limited to the immediate parent + Group). name: The name of the variable ( OtherXML*, Attribute*, Array+, Map+) + + + + + + + + + + + + + + + + + A Structure; a simple aggregation of variables. Unlike a Group, it\'s + possible to from an Array of Structures. name: The name of the variable ( OtherXML*, + Attribute*, ( Byte | ... 
| Grid )+ ) + + + + + + + + + + + + + A Sequence is a type that holds tabular data where each row of the + table represents a relation, as in a relational database. Sequences can nest, but + Arrays of Sequences are not supported. name: The name of the variable ( OtherXML*, + Attribute*, ( Byte | ... | Grid )+ ) + + + + + + + + + + + +
+

The \'Blob\' element is used to point to an associated data document. When DAP + is used to access metadata only for a data source, no \'blob\' element will be + present. However, when a request for data is made, the Dataset element holds + a description of the data values and the blob points to a place where those + values will be found. In DAP4 the Blob element refers to a binary + (application/octet-stream) part within a multi-part MIME document or it + refers to a separate document, possibly not immediately available. The latter + case is included to support asynchronous responses (i.e., responses that + cannot be returned quickly). See http://www.w3.org/TR/xlink11/.

+

The dc:date child element

+

If present, the element indicates the time or time range when the + information will be available. This is only sensible (i.e., valid) when the + response is asynchronous (i.e., + xlink:role="http://xml.opendap.org/dap/DAP4#asynchronous"). Dublin Core + defines both a date and a date range. In the case of a date, this element + indicates when the information will likely be available; when its value is a + range, it denotes when it will likely be present and when it will go + away.

+

Examples of the element:

    +
  • <dc:date>1994-11-05T13:15:30Z</dc:date>
  • +
  • <dc:date>1994-11-05T13:15:30Z/1994-11-06T00:00:00Z</dc:date>
  • +
+

+

Element attributes:

+
+
xlink:href:
+
Refers to the multi-part MIME document part that holds the data values, + encoded using XDR, or to a separate document. In the first case, the IRI + must begin with \'cid:\' (see the owsManifest schema for an example of + this use; http://schemas.opengis.net/ows/2.0/owsManifest.xsd). In the + latter case, the IRI must refer to a remote resource, and will likely + start with \'http:\'.
+
xlink:type:
+
Always \'simple\'.
+
xlink:role:
+
If present the only values DAP4 supports are \'asynchronousResponse\' and + \'synchronousResponse\'.
+ +
+
+
+
+ + + + + + + +
+
+ + + From 6027096b9824bfc9ab8253df7be9f7e6ff064164 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 10 Nov 2025 09:10:06 -0800 Subject: [PATCH 02/35] remove BlobType complexType --- dap4/dap4.xsd | 56 --------------------------------------------------- 1 file changed, 56 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index e67f58c..9c2d258 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -95,7 +95,6 @@ - @@ -477,61 +476,6 @@ - - - -
-

The \'Blob\' element is used to point to an associated data document. When DAP - is used to access metadata only for a data source, no \'blob\' element will be - present. However, when a request for data is made, the Dataset element holds - a description of the data values and the blob points to a place where those - values will be found. In DAP4 the Blob element refers to a binary - (application/octet-stream) part within a multi-part MIME document or it - refers to a separate document, possibly not immediately available. The latter - case is included to support asynchronous responses (i.e., responses that - cannot be returned quickly). See http://www.w3.org/TR/xlink11/.

-

The dc:date child element

-

If present, the element indicates the time or time range when the - information will be available. This is only sensible (i.e., valid) when the - response is asynchronous (i.e., - xlink:role="http://xml.opendap.org/dap/DAP4#asynchronous"). Dublin Core - defines both a date and a date range. In the case of a date, this element - indicates when the information will likely be available; when its value is a - range, it denotes when it will likely be present and when it will go - away.

-

Examples of the element:

    -
  • <dc:date>1994-11-05T13:15:30Z</dc:date>
  • -
  • <dc:date>1994-11-05T13:15:30Z/1994-11-06T00:00:00Z</dc:date>
  • -
-

-

Element attributes:

-
-
xlink:href:
-
Refers to the multi-part MIME document part that holds the data values, - encoded using XDR, or to a separate document. In the first case, the IRI - must begin with \'cid:\' (see the owsManifest schema for an example of - this use; http://schemas.opengis.net/ows/2.0/owsManifest.xsd). In the - latter case, the IRI must refer to a remote resource, and will likely - start with \'http:\'.
-
xlink:type:
-
Always \'simple\'.
-
xlink:role:
-
If present the only values DAP4 supports are \'asynchronousResponse\' and - \'synchronousResponse\'.
- -
-
-
-
- - - - - - - -
- From fe5d3ebe25a94019618645b57a9a4fe583f42d98 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 10 Nov 2025 09:11:09 -0800 Subject: [PATCH 03/35] remove purl`s dc schema only used by BlobType --- dap4/dap4.xsd | 3 --- 1 file changed, 3 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 9c2d258..272cee2 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -4,7 +4,6 @@ xmlns:dap="http://xml.opendap.org/ns/DAP/4.0#" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema http://www.w3.org/2001/XMLSchema.xsd" @@ -21,8 +20,6 @@ - - From 31538d2225e2eaf708eb5993d1c23602ebd98066 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 19 Nov 2025 11:11:27 -0800 Subject: [PATCH 04/35] update dap4 schema using that from 3.3 published source as starting point --- dap4/dap4.xsd | 607 +++++++++++++++++++------------------------------- 1 file changed, 223 insertions(+), 384 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 272cee2..3fce1d5 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -1,145 +1,168 @@ - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + - - - - + + - + + + + + + - -
-

About the Dataset document

-
-

In DAP2 and DAP4, a dataset is defined as a collection of variables, each - of which is completely described by a tuple that consists of a name, - type and value(s) along with a hierarchical collection of \'attributes\' - that themselves are made up of name-type-value(s) tuples. The variables - are organized in a hierarchy.

-

The Dataset document is a text/xml representation of those variables and - their organization within the dataset.

-

A note about XML element names:

-

Element names that start with capital letters correspond to parts of the - DAP4 data model while those that start with lowercase letters are used - for document structure and syntax.

-

About changes in going from DAP2 to DAP4

-

DAP 4.0 introduces SharedDimension, Group, Opaque, 64-bit integers and - UnsignedByte. In addition: The syntax for Array has been changed so that - it\'s easier for processing software to figure out the type of an array; - Grids have been generalized so that there can be any number of \'Array\' - parts (and the Maps may be multi-dimensional); and the Attribute type - OtherXML has been made its own element (it\'s no longer a type of - \'Attribute\').

-
-
-
+ DAP Variable Types
-
+ + + + + + + + + + + + + + + + + + +
- + + - These elements are used in several places to hold the \'semantic\' - and/or \'use\' metadata for the dataset, its groups and variables. ( OtherXML*, - Attribute* ) + This is the XML representation of a DAP DDS object. - - + + + + - + + + + - + - -
-

Dataset

-

This is the XML representation of a data source in DAP 4.

-

Note that the \'blob\' element is only present when this is used as the prefix - for the Data response.

-

Grammar: ( OtherXML*, Attribute*, Group+, blob? )

-

Element attributes:

-
-
name
-
The name of the data source; often a string used to uniquely reference - the data source wrt a particular server
-
dapVersion
-
The protocol version that corresponds to this document.
-
xml:base
-
The URL that references the DAP4 service endpoint used to access this - dataset.
-
-
-
+ A Group is a lexical scoping tool used to replicate HDF5 and netCDF4 + Groups. Each Group defines a lexical scope. Each dataset has at least one Group; if + only one is present, it may be anonymous. In this case, by convention, its name + attribute should be \'anonymous\'.
- - - - + + + + + + +
+ + + This holds a dimension, a name and size, that may be shared between + Grids and/or Arrays. SharedDimensions are lexically scoped. + - - + - + + - DAP uses attributes as a way to encode information that data - providers have bundled with data sources. This element is recursive. Each Attribute - element defines a lexical scope. If there are no nodes present the type - must be "Container". name: The name of the attribute; must be unique within the - scope. type: The type of the attribute. Attributes are limited to simple types, - vectors of simple types and \'Containers\' which are essentially structure types. - namespace: Use this to indicate that the given attribute means the same thing as the - matching item in the given namespace. This optional attribute is here to help - preserve information that a data server might know to be true and that a client - application could not assume with certainty. ( value* | ( OtherXML*, Attribute* ) ) - + DAP Attribute Type - - + - + + - - + - - - - Changes from DAP2 to DAP4: The Byte type is now signed and unsigned - bytes are now represented by the \'UByte\' data type. The types Int64 and UInt64 - represent signed and unsigned 64-bit integers. String uses UTF-8 in DAP4. - Enumerations are now in the mix. - + + + + + + + - + @@ -149,331 +172,147 @@ - - + + + - - - Use this to embed arbitrary XML in a DDX. This functions like an - Attribute and appears in the same places as an Attribute, but its contents are - ignored by DAP software. Other software might find the information useful. The XML - elements must satisfy the requirements for \'lax\' processing under schema 1.0. In - practice, that means just about anything. Using ##other versus ##any means that the - enclosed XML MUST declare its namespace(s) and each element must be in a ns other - than dap4\'s. Of course, the namespace(s) used by the XML might be declared elsewhere - in the doc. 
name: A name to associate with this chunk of XML *: This element can - contain any other attributes that conform to the schema 1.0 definition of \'lax\' - processing ( xs:any+ ) - - - - - - - - + - A Group is a lexical scoping tool used to replicate HDF5 and netCDF4 - Groups. Each Group defines a lexical scope. Each dataset has at least one Group; if - only one is present, it may be anonymous. In this case, by convention, it\'s name - attribute should be \'anonymous\', the default value. name: The name of the Group ( - OtherXML*, Attribute*, Dimension*, ScalarType+ ) + When we want to embed arbitrary XML in a DDX use this node. This + functions like an attribute and appear in the same general place as an attribute, + but its contents are ignored by DAP software. Other software might find the + information useful. - - - - + - - - - - - - - This defines a dimension (a name and size) that may be shared between - Grids and/or Arrays. name: The name of the dimension size: The size of the dimension - - - - + + - - - This defines the values of an enumeration. - - name: The name of the dimension - type: The size of the dimension - - - - - - - - - - - - - - - - - - - DAP cardinal data types - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + - - This provides a version of BaseType that does not allow - child elements since Grid and Sequence may not be - array types. - + DAP Base Type - - + + + - - - + - - - This extended ScalarBaseType so that instances - may be either scalar or N-dimensional. - - - - - - - - - - - - - - + - - - - - - - - This extended BaseType so that instances - must be N-dimensional (N >= 1) and reference one or more - SharedDimension objects. - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - Grids cannot have scalars (they have only Arrays and Maps) and - Arrays have both dimensions and references to maps. Regular - arrays do not have references to maps. 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - This attribute is used to reference a SharedDimension object. - - - - - - - - - name: The name of the dimension size: The size of the dimension ref: - A reference to a Dimension definition Note: either name and size must be present or - only ref must be present - - - - - - - - - - - name: Name for this Map. Maps are associated with Array dimentions by - name, so this is a required attribute. type: The type of a Map is limited to the - Cardinal types. NB: Note the limitation on the type of a Map, which excludes Maps - that are Opaques, Structures, Sequences or Grids. ( OtherXML*, Attribute*, - dimensions ) - - - - - - - - - + + + + - - - - - A Grid is a type that relates one or more Maps (aka coordinate - variables) to the dimensions of one or more Array variables. It is often the case - that Maps correspond to independent variables like Latitude or sample number and - Arrays represent dependent variables. Note that Map elements either specify a name - and size or reference a Dimension. The scope in which the Dimension can be located - is limited to an enclosing Group (but it is not limited to the immediate parent - Group). name: The name of the variable ( OtherXML*, Attribute*, Array+, Map+) - - - - - - - - - - - + + + - + + + + + + + + + + + + + + + + + + + + - A Structure; a simple aggregation of variables. Unlike a Group, it\'s - possible to from an Array of Structures. name: The name of the variable ( OtherXML*, - Attribute*, ( Byte | ... | Grid )+ ) + What this does not capture is that a Map appears both at the start of + a DAP4 Grid and must bind a name for the Map to be used within the Grid to either a + SharedDimension or a size. However, the Map element also appears within the Array + and there only with the name attribute. 
+ + + + + + + + + + + + + - + - - - A Sequence is a type that holds tabular data where each row of the - table represents a relation, as in a relational database. Sequences can nest, but - Arrays of Sequences are not supported. name: The name of the variable ( OtherXML*, - Attribute*, ( Byte | ... | Grid )+ ) - + - + - - - From c87802949b7903fc8fce1aaae3ebd5b2071e4e26 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Thu, 20 Nov 2025 10:16:23 -0800 Subject: [PATCH 05/35] change to DatasetType --- dap4/dap4.xsd | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 3fce1d5..222c4a2 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -16,7 +16,7 @@ xmlns="http://xml.opendap.org/ns/DAP/4.0#" xmlns:dap="http://xml.opendap.org/ns/DAP/4.0#" xmlns:xml="http://www.w3.org/XML/1998/namespace" - xmlns:xlink="http://www.w3.org/1999/xlink" + xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema http://www.w3.org/2001/XMLSchema.xsd" @@ -28,7 +28,7 @@ xlink a 'non-normative' one is provided in the appendix of http://www.w3.org/TR/xlink11/. jhrg 2/7/12 --> - + @@ -92,9 +92,9 @@ - + - This is the XML representation of a DAP DDS object. + This is the XML representation of a DAP DMR object. 
From 4010fe74eaf90428e77fa1179b943ac5b61f1932 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Thu, 20 Nov 2025 17:37:05 -0800 Subject: [PATCH 06/35] update schema that validates --- dap4/dap4.xsd | 38 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 222c4a2..7ebd815 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -22,7 +22,12 @@ xsi:schemaLocation="http://www.w3.org/2001/XMLSchema http://www.w3.org/2001/XMLSchema.xsd" elementFormDefault="qualified" attributeFormDefault="unqualified" + targetNamespace="http://xml.opendap.org/ns/DAP/4.0#" xml:lang="en"> + + + + + @@ -115,6 +129,7 @@ attribute should be \'anonymous\'. + @@ -133,6 +148,19 @@ + + + This holds a dimension, a name and size, that may be shared between + and Array. SharedDimensions are lexically scoped. + + + + + + + + + @@ -140,7 +168,7 @@ - + @@ -153,6 +181,12 @@ + + + + + + @@ -193,12 +227,12 @@ - DAP Base Type + From e919262ba945b945a8bfd9c6ec187a2109d88cff Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Thu, 20 Nov 2025 17:46:25 -0800 Subject: [PATCH 07/35] add simple test dmr file to setup initial validation test --- tests/data/SimpleGroup.dmr | 50 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 tests/data/SimpleGroup.dmr diff --git a/tests/data/SimpleGroup.dmr b/tests/data/SimpleGroup.dmr new file mode 100644 index 0000000..8333a39 --- /dev/null +++ b/tests/data/SimpleGroup.dmr @@ -0,0 +1,50 @@ + + + + + + + + + + + + + + + + + + + + + + A simple group for testing. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file From 52354b71819a98498d2a55cc89aa44c070a0c627 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Thu, 20 Nov 2025 18:48:22 -0800 Subject: [PATCH 08/35] set up test workflows --- .github/workflows/validate.yml | 29 +++++++++++++++++++ .gitignore | 51 ++++++++++++++++++++++++++++++++++ dap4/dap4.xsd | 1 + pyproject.toml | 33 ++++++++++++++++++++++ tests/test_validate_dmrs.py | 34 +++++++++++++++++++++++ 5 files changed, 148 insertions(+) create mode 100644 .github/workflows/validate.yml create mode 100644 .gitignore create mode 100644 pyproject.toml create mode 100644 tests/test_validate_dmrs.py diff --git a/.github/workflows/validate.yml b/.github/workflows/validate.yml new file mode 100644 index 0000000..8ba1e58 --- /dev/null +++ b/.github/workflows/validate.yml @@ -0,0 +1,29 @@ +name: Validate DAP4 XML + +on: + push: + branches: [ main ] + pull_request: + branches: [ main ] + +jobs: + validate-dmr: + runs-on: ubuntu-latest + + steps: + - name: Checkout repository + uses: actions/checkout@v3 + + - name: Set up Python + uses: actions/setup-python@v4 + with: + python-version: '3.12' + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install lxml pytest + + - name: Run DMR XML validation + run: | + pytest -v diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..476442c --- /dev/null +++ b/.gitignore @@ -0,0 +1,51 @@ +*.py[cod] + +# C extensions +*.so + +# Packages +*.egg +*.egg-info +dist +build +eggs +parts +bin +var +sdist +develop-eggs +.installed.cfg +lib +lib64 + +# Installer logs +pip-log.txt + +# ignore shell script at base level +*.sh + +# Unit test / coverage reports +.coverage +.tox +nosetests.xml + +# Translations +*.mo + +# Mr Developer +.mr.developer.cfg +.project +.pydevproject + +# Vim +*.swp + +.cache +__pycache__ +tests/__pycache__ + +# OS-X Finder +*.DS_Store + +# IDEA Projects +.idea diff --git a/dap4/dap4.xsd 
b/dap4/dap4.xsd index 7ebd815..692fbe7 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -236,6 +236,7 @@ + diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..897aa94 --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,33 @@ +[tool.setuptools] +packages = [] + +[project] +name = "dap4-validator" +version = "0.1.0" +description = "Test suite for validating DAP4 XML (DMR) documents against a schema." +authors = [ + { name = "Miguel Angel Jimenez-Urias", email = "mjimenez@opendap.org" } +] +license = { text = "MIT" } +requires-python = ">=3.11" + +# Core dependencies only +dependencies = [ + "lxml>=4.9.3", + "pytest>=7.0", +] + +[project.optional-dependencies] +dev = [ + "ruff>=0.1.0", + "black>=23.0", +] + +[tool.pytest.ini_options] +testpaths = ["tests"] +addopts = "-v" + +[build-system] +requires = ["setuptools>=61"] +build-backend = "setuptools.build_meta" + diff --git a/tests/test_validate_dmrs.py b/tests/test_validate_dmrs.py new file mode 100644 index 0000000..fcb436a --- /dev/null +++ b/tests/test_validate_dmrs.py @@ -0,0 +1,34 @@ +from lxml import etree +import pytest +from pathlib import Path +import glob + +# Path to this test file +TEST_DIR = Path(__file__).resolve().parent + +# Path to project root +ROOT_DIR = TEST_DIR.parent +# Path to the schema +SCHEMA_PATH = ROOT_DIR / "dap4" / "dap4.xsd" +# Path to test dmrs +DATA_DIR = TEST_DIR / "data" +DMR_PATHS = list(DATA_DIR.glob("*.dmr")) + + +@pytest.fixture(scope="session") +def dap4_schema(): + """Load and compile the DAP4 XML schema.""" + with open(SCHEMA_PATH, "rb") as f: + schema_doc = etree.parse(f) + return etree.XMLSchema(schema_doc) + +@pytest.mark.parametrize("dmr_file", DMR_PATHS) +def test_validate_dmr_files(dap4_schema, dmr_file): + """Validate all DMR/XML files in the dmr/ directory.""" + with open(dmr_file, "rb") as f: + doc = etree.parse(f) + + try: + dap4_schema.assertValid(doc) + except etree.DocumentInvalid as e: + pytest.fail(f"DMR validation failed for 
{dmr_file}:\n{str(e)}") From de6bf9afb3b291d1d2d8298c054a16ba01da62c9 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Thu, 20 Nov 2025 18:58:27 -0800 Subject: [PATCH 09/35] add pre-commit for syntax on dap4 schema, and tests files --- .pre-commit-config.yaml | 29 +++++++++++++++++++++++++++++ tests/test_validate_dmrs.py | 2 +- 2 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 .pre-commit-config.yaml diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml new file mode 100644 index 0000000..55aa715 --- /dev/null +++ b/.pre-commit-config.yaml @@ -0,0 +1,29 @@ +repos: + - repo: https://github.com/psf/black + rev: 23.12.1 + hooks: + - id: black + files: ^tests/.*\.py$ + + - repo: https://github.com/astral-sh/ruff-pre-commit + rev: v0.3.2 + hooks: + - id: ruff + args: ["--fix"] + files: ^tests/.*\.py$ + + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v4.5.0 + hooks: + - id: check-xml + files: | + ^dap4/dap4\.xsd$ + ^tests/data/.*\.dmr$ + - id: trailing-whitespace + files: | + ^dap4/.*$ + ^tests/.*$ + - id: end-of-file-fixer + files: | + ^dap4/.*$ + ^tests/.*$ diff --git a/tests/test_validate_dmrs.py b/tests/test_validate_dmrs.py index fcb436a..ad8cdb2 100644 --- a/tests/test_validate_dmrs.py +++ b/tests/test_validate_dmrs.py @@ -1,7 +1,6 @@ from lxml import etree import pytest from pathlib import Path -import glob # Path to this test file TEST_DIR = Path(__file__).resolve().parent @@ -22,6 +21,7 @@ def dap4_schema(): schema_doc = etree.parse(f) return etree.XMLSchema(schema_doc) + @pytest.mark.parametrize("dmr_file", DMR_PATHS) def test_validate_dmr_files(dap4_schema, dmr_file): """Validate all DMR/XML files in the dmr/ directory.""" From 136d827a2fbce92e56f272dcdff3d711bdf3bc55 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Thu, 20 Nov 2025 19:00:13 -0800 Subject: [PATCH 10/35] run on master branch --- .github/workflows/validate.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git 
a/.github/workflows/validate.yml b/.github/workflows/validate.yml index 8ba1e58..f701e41 100644 --- a/.github/workflows/validate.yml +++ b/.github/workflows/validate.yml @@ -2,9 +2,9 @@ name: Validate DAP4 XML on: push: - branches: [ main ] + branches: [ master ] pull_request: - branches: [ main ] + branches: [ master ] jobs: validate-dmr: From 6052e6db7dd5104ef3bfc7abcf36d09868885b4b Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 08:33:25 -0800 Subject: [PATCH 11/35] enable min occurange=0 for Groups --- dap4/dap4.xsd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 692fbe7..bd324f4 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -110,7 +110,7 @@ - + From 7a175d1f13139d6ad42904b48c206a981878fa25 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 08:36:03 -0800 Subject: [PATCH 12/35] rename test dmr, describe test in name --- tests/data/MapsArraysOnly.dmr | 28 ++++++++++++++++++++ tests/data/SimpleGroup.dmr | 50 ----------------------------------- 2 files changed, 28 insertions(+), 50 deletions(-) create mode 100644 tests/data/MapsArraysOnly.dmr delete mode 100644 tests/data/SimpleGroup.dmr diff --git a/tests/data/MapsArraysOnly.dmr b/tests/data/MapsArraysOnly.dmr new file mode 100644 index 0000000..da0c162 --- /dev/null +++ b/tests/data/MapsArraysOnly.dmr @@ -0,0 +1,28 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + DMR for testing Maps, Dims at root level (no Groups, Sequences or Structures). + + \ No newline at end of file diff --git a/tests/data/SimpleGroup.dmr b/tests/data/SimpleGroup.dmr deleted file mode 100644 index 8333a39..0000000 --- a/tests/data/SimpleGroup.dmr +++ /dev/null @@ -1,50 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - A simple group for testing. 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file From 57b85725a8cc465c164d7823ef41d21ad6da6c1d Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 16:14:02 -0800 Subject: [PATCH 13/35] attribute test dmr --- tests/data/Attributes_test1.dmr | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 tests/data/Attributes_test1.dmr diff --git a/tests/data/Attributes_test1.dmr b/tests/data/Attributes_test1.dmr new file mode 100644 index 0000000..6f1d568 --- /dev/null +++ b/tests/data/Attributes_test1.dmr @@ -0,0 +1,18 @@ + + + + DMR for testing Maps, Dims at root level (no Groups, Sequences or Structures). + + + 1 + + + 1 + 2 + 3 + + + + + + \ No newline at end of file From 9b6e41a2743e675b2cc390d0b8a31d66a5d81b4d Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 16:26:36 -0800 Subject: [PATCH 14/35] add more tests - fix schema --- dap4/dap4.xsd | 6 ++---- tests/data/GroupTest1.dmr | 33 +++++++++++++++++++++++++++++++++ tests/data/MapsArraysOnly.dmr | 2 ++ tests/data/NestedGroup.dmr | 7 +++++++ 4 files changed, 44 insertions(+), 4 deletions(-) create mode 100644 tests/data/GroupTest1.dmr create mode 100644 tests/data/NestedGroup.dmr diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index bd324f4..846ed35 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -108,7 +108,6 @@ - @@ -130,11 +129,10 @@ + - - - + diff --git a/tests/data/GroupTest1.dmr b/tests/data/GroupTest1.dmr new file mode 100644 index 0000000..ea7b43a --- /dev/null +++ b/tests/data/GroupTest1.dmr @@ -0,0 +1,33 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + DMR for testing Maps, Dims at root level (no Groups, Sequences or Structures). 
+ + + \ No newline at end of file diff --git a/tests/data/MapsArraysOnly.dmr b/tests/data/MapsArraysOnly.dmr index da0c162..b36a9aa 100644 --- a/tests/data/MapsArraysOnly.dmr +++ b/tests/data/MapsArraysOnly.dmr @@ -18,6 +18,8 @@ + + diff --git a/tests/data/NestedGroup.dmr b/tests/data/NestedGroup.dmr new file mode 100644 index 0000000..94f296e --- /dev/null +++ b/tests/data/NestedGroup.dmr @@ -0,0 +1,7 @@ + + + + + + + \ No newline at end of file From 684d4d64eb9787107a5e14df0f38cb396c3f8fa4 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 16:41:23 -0800 Subject: [PATCH 15/35] more tests --- dap4/dap4.xsd | 5 +++++ tests/data/Attributes_test2.dmr | 8 ++++++++ tests/data/testSequence1.dmr | 12 ++++++++++++ 3 files changed, 25 insertions(+) create mode 100644 tests/data/Attributes_test2.dmr create mode 100644 tests/data/testSequence1.dmr diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 846ed35..af11fd8 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -96,6 +96,11 @@ + diff --git a/tests/data/Attributes_test2.dmr b/tests/data/Attributes_test2.dmr new file mode 100644 index 0000000..59a583d --- /dev/null +++ b/tests/data/Attributes_test2.dmr @@ -0,0 +1,8 @@ + + + + + DODS FreeFrom based on FFND release 4.2.3 + + + diff --git a/tests/data/testSequence1.dmr b/tests/data/testSequence1.dmr new file mode 100644 index 0000000..1353067 --- /dev/null +++ b/tests/data/testSequence1.dmr @@ -0,0 +1,12 @@ + + + + + + + + + + Test sequence. 
+ + \ No newline at end of file From 148312e93c30be55be8072172b807dd545207b07 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 16:49:57 -0800 Subject: [PATCH 16/35] more tests --- tests/data/GroupTest1.dmr | 3 +++ tests/data/testSequence2.dmr | 16 ++++++++++++++++ 2 files changed, 19 insertions(+) create mode 100644 tests/data/testSequence2.dmr diff --git a/tests/data/GroupTest1.dmr b/tests/data/GroupTest1.dmr index ea7b43a..67c2223 100644 --- a/tests/data/GroupTest1.dmr +++ b/tests/data/GroupTest1.dmr @@ -8,6 +8,9 @@ + + DMR for testing Maps, Dims at root level (no Groups, Sequences or Structures). + diff --git a/tests/data/testSequence2.dmr b/tests/data/testSequence2.dmr new file mode 100644 index 0000000..4a0d2df --- /dev/null +++ b/tests/data/testSequence2.dmr @@ -0,0 +1,16 @@ + + + + Test sequence. + + + + + + + + + Test sequence. + + + \ No newline at end of file From 3ce4bd90b941a98aabc56b0283f1b3ca9c02902b Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 17:19:46 -0800 Subject: [PATCH 17/35] add test dmr with an xml element as a value to an attribute --- tests/data/Attributes_test3.dmr | 6 ++++++ tests/data/Attributes_test4.dmr | 28 ++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+) create mode 100644 tests/data/Attributes_test3.dmr create mode 100644 tests/data/Attributes_test4.dmr diff --git a/tests/data/Attributes_test3.dmr b/tests/data/Attributes_test3.dmr new file mode 100644 index 0000000..ee90bdd --- /dev/null +++ b/tests/data/Attributes_test3.dmr @@ -0,0 +1,6 @@ + + + + + + \ No newline at end of file diff --git a/tests/data/Attributes_test4.dmr b/tests/data/Attributes_test4.dmr new file mode 100644 index 0000000..971834e --- /dev/null +++ b/tests/data/Attributes_test4.dmr @@ -0,0 +1,28 @@ + + + + + Passive soil moisture estimates onto a 36-km global Earth-fixed grid, based on radiometer measurements acquired when the SMAP spacecraft is travelling from North to South at approximately 6:00 AM local time. 
+ + + File_001.h5 + File_002.h5 + File_003.h5 + File_004.h5 + File_005.h5 + + + L2Data + + + 2017-01-04 + 2017-01-04 + 2017-01-04 + 2017-01-04 + 2017-01-05 + + + 36. + + + \ No newline at end of file From dfd7f332f639fa039de3c8c3f55a92ee0875d152 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Fri, 21 Nov 2025 17:40:53 -0800 Subject: [PATCH 18/35] set up more tests, including failing ones with Structures --- tests/data/Structure_test.dmr | 15 +++++++++++++++ tests/data/Structure_test2.dmr | 11 +++++++++++ tests/test_validate_dmrs.py | 22 ++++++++++++++++------ 3 files changed, 42 insertions(+), 6 deletions(-) create mode 100644 tests/data/Structure_test.dmr create mode 100644 tests/data/Structure_test2.dmr diff --git a/tests/data/Structure_test.dmr b/tests/data/Structure_test.dmr new file mode 100644 index 0000000..9614ef5 --- /dev/null +++ b/tests/data/Structure_test.dmr @@ -0,0 +1,15 @@ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tests/data/Structure_test2.dmr b/tests/data/Structure_test2.dmr new file mode 100644 index 0000000..d452bad --- /dev/null +++ b/tests/data/Structure_test2.dmr @@ -0,0 +1,11 @@ + + + + + + + + + + + \ No newline at end of file diff --git a/tests/test_validate_dmrs.py b/tests/test_validate_dmrs.py index ad8cdb2..2cd51a6 100644 --- a/tests/test_validate_dmrs.py +++ b/tests/test_validate_dmrs.py @@ -25,10 +25,20 @@ def dap4_schema(): @pytest.mark.parametrize("dmr_file", DMR_PATHS) def test_validate_dmr_files(dap4_schema, dmr_file): """Validate all DMR/XML files in the dmr/ directory.""" - with open(dmr_file, "rb") as f: - doc = etree.parse(f) + if not dmr_file.name.startswith("Structure"): + with open(dmr_file, "rb") as f: + doc = etree.parse(f) + try: + dap4_schema.assertValid(doc) + except etree.DocumentInvalid as e: + pytest.fail(f"DMR validation failed for {dmr_file}:\n{str(e)}") - try: - dap4_schema.assertValid(doc) - except etree.DocumentInvalid as e: - pytest.fail(f"DMR validation failed for 
{dmr_file}:\n{str(e)}") + +@pytest.mark.parametrize("dmr_file", DMR_PATHS) +def test_Structure_fails_validate_dmr_files(dap4_schema, dmr_file): + """Validate all DMR/XML files in the dmr/ directory.""" + if dmr_file.name.startswith("Structure"): + with open(dmr_file, "rb") as f: + doc = etree.parse(f) + with pytest.raises(etree.DocumentInvalid): + dap4_schema.assertValid(doc) From bd8a0e9a258ad2e374d3869143c3a269ff952ad7 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 08:45:41 -0800 Subject: [PATCH 19/35] enable Dim elements with optional name, size attributes --- dap4/dap4.xsd | 4 +++- tests/data/GroupStructureSequence.dmr | 22 ++++++++++++++++++++++ tests/data/Structure_test2.dmr | 3 ++- tests/test_validate_dmrs.py | 23 ++++++----------------- 4 files changed, 33 insertions(+), 19 deletions(-) create mode 100644 tests/data/GroupStructureSequence.dmr diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index af11fd8..a72606e 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -162,7 +162,9 @@ - + + + diff --git a/tests/data/GroupStructureSequence.dmr b/tests/data/GroupStructureSequence.dmr new file mode 100644 index 0000000..b557988 --- /dev/null +++ b/tests/data/GroupStructureSequence.dmr @@ -0,0 +1,22 @@ + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tests/data/Structure_test2.dmr b/tests/data/Structure_test2.dmr index d452bad..cdbde25 100644 --- a/tests/data/Structure_test2.dmr +++ b/tests/data/Structure_test2.dmr @@ -5,7 +5,8 @@ - + + \ No newline at end of file diff --git a/tests/test_validate_dmrs.py b/tests/test_validate_dmrs.py index 2cd51a6..230d368 100644 --- a/tests/test_validate_dmrs.py +++ b/tests/test_validate_dmrs.py @@ -25,20 +25,9 @@ def dap4_schema(): @pytest.mark.parametrize("dmr_file", DMR_PATHS) def test_validate_dmr_files(dap4_schema, dmr_file): """Validate all DMR/XML files in the dmr/ directory.""" - if not dmr_file.name.startswith("Structure"): - with open(dmr_file, "rb") as f: - doc = 
etree.parse(f) - try: - dap4_schema.assertValid(doc) - except etree.DocumentInvalid as e: - pytest.fail(f"DMR validation failed for {dmr_file}:\n{str(e)}") - - -@pytest.mark.parametrize("dmr_file", DMR_PATHS) -def test_Structure_fails_validate_dmr_files(dap4_schema, dmr_file): - """Validate all DMR/XML files in the dmr/ directory.""" - if dmr_file.name.startswith("Structure"): - with open(dmr_file, "rb") as f: - doc = etree.parse(f) - with pytest.raises(etree.DocumentInvalid): - dap4_schema.assertValid(doc) + with open(dmr_file, "rb") as f: + doc = etree.parse(f) + try: + dap4_schema.assertValid(doc) + except etree.DocumentInvalid as e: + pytest.fail(f"DMR validation failed for {dmr_file}:\n{str(e)}") From 69c9af51027cd4ea194684e26dc94d048641f621 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 09:12:47 -0800 Subject: [PATCH 20/35] nested struct no dims --- tests/data/NestedStructure.dmr | 11 +++++++++++ 1 file changed, 11 insertions(+) create mode 100644 tests/data/NestedStructure.dmr diff --git a/tests/data/NestedStructure.dmr b/tests/data/NestedStructure.dmr new file mode 100644 index 0000000..f11f0a6 --- /dev/null +++ b/tests/data/NestedStructure.dmr @@ -0,0 +1,11 @@ + + + + + + + + + + + From 3b9b06c0bc58736cb7f087bb196538efc695d05e Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 14:06:13 -0800 Subject: [PATCH 21/35] enable testing specific dap4 schema declarations in python --- tests/data/Invalid_BaseType_Dim.dmr | 7 ++++ tests/test_validate_dmrs.py | 26 ++++++++---- tests/validate_dmr_semantics.py | 65 +++++++++++++++++++++++++++++ 3 files changed, 91 insertions(+), 7 deletions(-) create mode 100644 tests/data/Invalid_BaseType_Dim.dmr create mode 100644 tests/validate_dmr_semantics.py diff --git a/tests/data/Invalid_BaseType_Dim.dmr b/tests/data/Invalid_BaseType_Dim.dmr new file mode 100644 index 0000000..01acdc6 --- /dev/null +++ b/tests/data/Invalid_BaseType_Dim.dmr @@ -0,0 +1,7 @@ + + + + + + + \ No newline at end of file 
diff --git a/tests/test_validate_dmrs.py b/tests/test_validate_dmrs.py index 230d368..c96d9f9 100644 --- a/tests/test_validate_dmrs.py +++ b/tests/test_validate_dmrs.py @@ -1,6 +1,7 @@ from lxml import etree import pytest from pathlib import Path +from validate_dmr_semantics import validate_dim_semantics # Path to this test file TEST_DIR = Path(__file__).resolve().parent @@ -23,11 +24,22 @@ def dap4_schema(): @pytest.mark.parametrize("dmr_file", DMR_PATHS) -def test_validate_dmr_files(dap4_schema, dmr_file): - """Validate all DMR/XML files in the dmr/ directory.""" - with open(dmr_file, "rb") as f: - doc = etree.parse(f) - try: +def test_valid_dmrs(dap4_schema, dmr_file): + if not dmr_file.name.startswith("Invalid"): + doc = etree.parse(str(dmr_file)) + # XSD validation dap4_schema.assertValid(doc) - except etree.DocumentInvalid as e: - pytest.fail(f"DMR validation failed for {dmr_file}:\n{str(e)}") + # Semantic validation + validate_dim_semantics(doc) + + +@pytest.mark.parametrize("dmr_file", DMR_PATHS) +def test_fail_validate_dim_BaseType(dap4_schema, dmr_file): + dmr_file = DATA_DIR / "Invalid_BaseType_Dim.dmr" + + if dmr_file.name.startswith("Invalid"): + doc = etree.parse(str(dmr_file)) + # XSD validation + dap4_schema.assertValid(doc) + with pytest.raises(ValueError): + validate_dim_semantics(doc) diff --git a/tests/validate_dmr_semantics.py b/tests/validate_dmr_semantics.py new file mode 100644 index 0000000..343e647 --- /dev/null +++ b/tests/validate_dmr_semantics.py @@ -0,0 +1,65 @@ +from lxml import etree + +DAP4_NS = "http://xml.opendap.org/ns/DAP/4.0#" +NS = {"d": DAP4_NS} + +BASE_TYPES = { + "Byte", + "SignedByte", + "Int16", + "UInt16", + "Int32", + "UInt32", + "Int64", + "UInt64", + "Float32", + "Float64", + "String", + "Url", + "Opaque", + "Structure", +} + + +DAP4_NS = "http://xml.opendap.org/ns/DAP/4.0#" +NS = {"d": DAP4_NS} + +BASE_TYPES = { + "Byte", + "SignedByte", + "Int16", + "UInt16", + "Int32", + "UInt32", + "Int64", + "UInt64", + 
"Float32", + "Float64", + "String", + "Url", + "Opaque", + "Structure", +} + + +def validate_dim_semantics(doc): + """ + Enforce: Every <Dim> inside BaseTypes must have (name or size), + and cannot omit both. + """ + root = doc.getroot() + + for tag in BASE_TYPES: + for base in root.xpath(f"//d:{tag}", namespaces=NS): + dims = base.xpath(".//d:Dim", namespaces=NS) + + for dim in dims: + name = dim.get("name") + size = dim.get("size") + + # must have one or the other + if name is None and size is None: + raise ValueError( + f"<Dim> in base type <{tag}> must have either @name or @size: " + f"{etree.tostring(dim, encoding='UTF-8')}" + ) From fbda1a6598aec3feedc5313e63fa14d017f689e5 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 14:09:06 -0800 Subject: [PATCH 22/35] remove fixture - set tests per invalid case --- tests/test_validate_dmrs.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/test_validate_dmrs.py b/tests/test_validate_dmrs.py index c96d9f9..e622fc7 100644 --- a/tests/test_validate_dmrs.py +++ b/tests/test_validate_dmrs.py @@ -33,8 +33,7 @@ def test_valid_dmrs(dap4_schema, dmr_file): validate_dim_semantics(doc) -@pytest.mark.parametrize("dmr_file", DMR_PATHS) -def test_fail_validate_dim_BaseType(dap4_schema, dmr_file): +def test_fail_validate_dim_BaseType(dap4_schema): dmr_file = DATA_DIR / "Invalid_BaseType_Dim.dmr" if dmr_file.name.startswith("Invalid"): From d0a150ff41b02288c08b50ae58b996db6577a3b1 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 14:15:32 -0800 Subject: [PATCH 23/35] dataset declaration dmr --- tests/data/Dataset_Declaration.dmr | 3 +++ tests/data/Invalid_BaseType_Dim.dmr | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) create mode 100644 tests/data/Dataset_Declaration.dmr diff --git a/tests/data/Dataset_Declaration.dmr b/tests/data/Dataset_Declaration.dmr new file mode 100644 index 0000000..48be688 --- /dev/null +++ b/tests/data/Dataset_Declaration.dmr @@ -0,0 +1,3 @@ + + + \ No 
newline at end of file diff --git a/tests/data/Invalid_BaseType_Dim.dmr b/tests/data/Invalid_BaseType_Dim.dmr index 01acdc6..5ad12a8 100644 --- a/tests/data/Invalid_BaseType_Dim.dmr +++ b/tests/data/Invalid_BaseType_Dim.dmr @@ -1,5 +1,5 @@ - + From c30faf20e4d88ea84848111a7c8bc9991c22d7ad Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 16:09:43 -0800 Subject: [PATCH 24/35] enable enumeration types validation --- dap4/dap4.xsd | 54 +++++++++++++++++++++++++++++++++++++++ tests/data/Enum_test1.dmr | 18 +++++++++++++ tests/data/Enum_test2.dmr | 20 +++++++++++++++ tests/data/Enum_test3.dmr | 21 +++++++++++++++ 4 files changed, 113 insertions(+) create mode 100644 tests/data/Enum_test1.dmr create mode 100644 tests/data/Enum_test2.dmr create mode 100644 tests/data/Enum_test3.dmr diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index a72606e..9543d4e 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -67,6 +67,10 @@ String or URL. --> + + + + @@ -101,6 +105,8 @@ --> + + @@ -111,6 +117,7 @@ + @@ -134,6 +141,7 @@ + @@ -355,4 +363,50 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/tests/data/Enum_test1.dmr b/tests/data/Enum_test1.dmr new file mode 100644 index 0000000..0deef8f --- /dev/null +++ b/tests/data/Enum_test1.dmr @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tests/data/Enum_test2.dmr b/tests/data/Enum_test2.dmr new file mode 100644 index 0000000..8fbf934 --- /dev/null +++ b/tests/data/Enum_test2.dmr @@ -0,0 +1,20 @@ + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/tests/data/Enum_test3.dmr b/tests/data/Enum_test3.dmr new file mode 100644 index 0000000..770c65e --- /dev/null +++ b/tests/data/Enum_test3.dmr @@ -0,0 +1,21 @@ + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file From 0ac50b74834fee6dd6a2e6f084a6fab0fd98e25f Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 16:38:43 -0800 
Subject: [PATCH 25/35] test all possible value a BaseType may take --- dap4/dap4.xsd | 45 ++++++++++++++++++++++++----------- tests/data/ValidBaseTypes.dmr | 41 +++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 14 deletions(-) create mode 100644 tests/data/ValidBaseTypes.dmr diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 9543d4e..d3cde02 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -1,6 +1,11 @@ - + jhrg 10/13/09 + + Schema expanded from template dap3.3.xsd + maju 11/24/2025 + --> + - + + + + @@ -53,7 +66,7 @@ - + @@ -85,7 +98,9 @@ - + + + @@ -95,7 +110,7 @@ - + @@ -208,8 +223,10 @@ + + - + @@ -219,8 +236,9 @@ - + + @@ -263,7 +281,6 @@ - @@ -273,7 +290,7 @@ - + @@ -311,7 +328,8 @@ - + + @@ -321,7 +339,7 @@ - + @@ -366,7 +384,6 @@ - diff --git a/tests/data/ValidBaseTypes.dmr b/tests/data/ValidBaseTypes.dmr new file mode 100644 index 0000000..13081ad --- /dev/null +++ b/tests/data/ValidBaseTypes.dmr @@ -0,0 +1,41 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + From 9fb7d2bc7306ec87313b9882789cb2e488bd3772 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Mon, 24 Nov 2025 16:51:17 -0800 Subject: [PATCH 26/35] test for all value types in Attributes definitions --- tests/data/Attributes_BaseTypes.dmr | 45 +++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 tests/data/Attributes_BaseTypes.dmr diff --git a/tests/data/Attributes_BaseTypes.dmr b/tests/data/Attributes_BaseTypes.dmr new file mode 100644 index 0000000..31a87a8 --- /dev/null +++ b/tests/data/Attributes_BaseTypes.dmr @@ -0,0 +1,45 @@ + + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + 1 + + + Data + + + URL here + + From a659d9c1595cffb38bcdffbe2d92b906f081d4b1 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 11:39:49 -0800 Subject: [PATCH 27/35] update spec to reflect used elements --- dap4/dap4.xsd | 108 ++++++++------------------------------------------ 1 file 
changed, 17 insertions(+), 91 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index d3cde02..dde253e 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -9,7 +9,7 @@ it\'s easier for processing software to figure out the type of an array; Grids have been generalized so that there can be any number of \'Array\' parts (and the Maps may be multi-dimensional; and OtherXML has been renamed - AnyXML and made its own element (it\'s no longer a type of \'Attribute\'). + OtherXML and made its own element (it\'s no longer a type of \'Attribute\'). The change to otherXML was made to simplify writing the schema since it appears schema 1.0 cannot reprsent a syntax where the type of an element @@ -45,7 +45,6 @@ - @@ -68,12 +67,10 @@ - - - - + + @@ -135,7 +128,7 @@ - + @@ -147,7 +140,7 @@ - + A Group is a lexical scoping tool used to replicate HDF5 and netCDF4 Groups. Each Group defines a lexical scope. Each dataset has at least one Group; if @@ -159,21 +152,12 @@ - + - - - This holds a dimension, a name and size, that may be shared between - Grids and/or Arrays. SharedDimensions are lexically scoped. - - - - - This holds a dimension, a name and size, that may be shared between @@ -200,7 +184,7 @@ - + @@ -208,19 +192,11 @@ - - - - - - - - @@ -239,12 +215,12 @@ - + - + When we want to embed arbitrary XML in a DDX use this node. 
This functions like an attribute and appear in the same general place as an attribute, @@ -265,66 +241,18 @@ - - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @@ -353,9 +281,7 @@ - - - + @@ -364,7 +290,7 @@ - + @@ -373,7 +299,7 @@ - + @@ -417,8 +343,8 @@ - - + + From 3c27b6e7794a8f7ab49568799fc23c68aaf25a7c Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 11:55:26 -0800 Subject: [PATCH 28/35] rename DimRefType to DimType --- dap4/dap4.xsd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index dde253e..cd22e4e 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -82,7 +82,7 @@ - + @@ -167,7 +167,7 @@ - + From a12383eb05abc845690a7044386d6bcaa1c54b90 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 12:20:03 -0800 Subject: [PATCH 29/35] slim down MapType to only define name as an attribute. Nothing else --- dap4/dap4.xsd | 32 -------------------------------- 1 file changed, 32 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index cd22e4e..c4b2b6b 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -253,41 +253,9 @@ - - - - - - - - - - - - - - - - - - - - What this does not capture is that a Map appears both at the start of - a DAP4 Grid and must bind a name for the Map to be used within the Grid to either a - SharedDimension or a size. However, the Map element also appears within the Array - and there only with the name attribute. 
- - - - - - - - - From 0fa762cb620fa1c921e47e2ee1c60f5a70192619 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 12:21:34 -0800 Subject: [PATCH 30/35] add comment about MapType --- dap4/dap4.xsd | 1 + 1 file changed, 1 insertion(+) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index c4b2b6b..1aa3dd1 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -255,6 +255,7 @@ + From 154ecb9d3aa94b267a94c96a0b05052de88ca30a Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 12:22:27 -0800 Subject: [PATCH 31/35] remove unused ArrayDimension --- dap4/dap4.xsd | 7 ------- 1 file changed, 7 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 1aa3dd1..2a4a8c9 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -247,13 +247,6 @@ - - - - - - - From a43fa93ade7c85410d099e3341d8f873a31e8b85 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 12:24:16 -0800 Subject: [PATCH 32/35] add OtherXML to be possible to be defined within an Attribute --- dap4/dap4.xsd | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 2a4a8c9..403cfd8 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -184,7 +184,6 @@ - @@ -215,14 +214,14 @@ - + - When we want to embed arbitrary XML in a DDX use this node. This + When we want to embed arbitrary XML in a DMR use this node. This functions like an attribute and appear in the same general place as an attribute, but its contents are ignored by DAP software. Other software might find the information useful. 
From 171cf8d802d1d7ac38776552fe968a8cc29422d9 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 12:27:45 -0800 Subject: [PATCH 33/35] add comment atop element declaration --- dap4/dap4.xsd | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index 403cfd8..bbb8a35 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -107,14 +107,13 @@ - - + - - + + @@ -304,7 +303,6 @@ - From 74c9501d28675eeffbcfc5166e8c7629013d5855 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 12:47:33 -0800 Subject: [PATCH 34/35] last comments --- dap4/dap4.xsd | 1 - 1 file changed, 1 deletion(-) diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index bbb8a35..db17d61 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -289,7 +289,6 @@ - From 0c453113011d6b8624cce04a54db710f29450ab6 Mon Sep 17 00:00:00 2001 From: Mikejmnez Date: Wed, 26 Nov 2025 13:15:55 -0800 Subject: [PATCH 35/35] define a self-contained OpaqueType and test --- dap4/dap4.xsd | 6 +++++- tests/data/OpaqueTest.dmr | 9 +++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 tests/data/OpaqueTest.dmr diff --git a/dap4/dap4.xsd b/dap4/dap4.xsd index db17d61..b79155d 100644 --- a/dap4/dap4.xsd +++ b/dap4/dap4.xsd @@ -75,7 +75,7 @@ - + @@ -173,6 +173,10 @@ + + + + DAP Attribute Type diff --git a/tests/data/OpaqueTest.dmr b/tests/data/OpaqueTest.dmr new file mode 100644 index 0000000..26fe96a --- /dev/null +++ b/tests/data/OpaqueTest.dmr @@ -0,0 +1,9 @@ + + + + + + + \ No newline at end of file
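Reviewer note on the semantic check added in PATCH 21/35: the rule (every Dim inside a base type must carry @name or @size) can be exercised without any files on disk. The following is a minimal sketch, not the code in this series. It assumes the standard-library `xml.etree.ElementTree` in place of `lxml`, looks only at direct `Dim` children rather than all descendants as the patch does, and uses two hypothetical inline DMR snippets (`VALID_DMR`, `INVALID_DMR`), since the real test files are not reproduced here. The helper name `find_invalid_dims` is likewise an illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace and type names are taken from tests/validate_dmr_semantics.py.
DAP4_NS = "http://xml.opendap.org/ns/DAP/4.0#"
NS = {"d": DAP4_NS}

BASE_TYPES = (
    "Byte", "SignedByte", "Int16", "UInt16", "Int32", "UInt32",
    "Int64", "UInt64", "Float32", "Float64", "String", "Url",
    "Opaque", "Structure",
)


def find_invalid_dims(root):
    """Return (tag, dim) pairs for Dim children lacking both @name and @size."""
    bad = []
    for tag in BASE_TYPES:
        # Search every descendant element of this base type.
        for base in root.iterfind(f".//d:{tag}", NS):
            # Assumption: only direct Dim children belong to this variable.
            for dim in base.iterfind("d:Dim", NS):
                if dim.get("name") is None and dim.get("size") is None:
                    bad.append((tag, dim))
    return bad


def validate_dim_semantics(root):
    """Raise ValueError on the first Dim that omits both @name and @size."""
    bad = find_invalid_dims(root)
    if bad:
        tag, _dim = bad[0]
        raise ValueError(
            f"<Dim> in base type <{tag}> must have either @name or @size"
        )


# Hypothetical documents, for illustration only.
VALID_DMR = (
    f'<Dataset xmlns="{DAP4_NS}" name="t">'
    '<Int32 name="x"><Dim size="3"/></Int32></Dataset>'
)
INVALID_DMR = (
    f'<Dataset xmlns="{DAP4_NS}" name="t">'
    '<Int32 name="x"><Dim/></Int32></Dataset>'
)

if __name__ == "__main__":
    validate_dim_semantics(ET.fromstring(VALID_DMR))  # passes silently
    try:
        validate_dim_semantics(ET.fromstring(INVALID_DMR))
    except ValueError as err:
        print(err)
```

Separating the collection step from the raising step lets a test assert on every offending element at once. Restricting the search to direct children avoids reporting the same Dim twice when a Structure contains another base type.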