Export Requirements by hartig · Pull Request #9 · awslabs/SPARQL-CDTs

hartig · 2024-06-04T07:41:23Z

This PR adds the export requirements that we started working on in PR #2.

The initial version of this PR is simply a copy of the text that we already had in PR #2. Two points still need to be resolved:

1. While I generally agree with the new third bullet point in the given list of requirements, it feels somewhat detached from the context in which these bullet points are listed. The context is that there is some substring "_:id" in the lexical form of a cdt:List or cdt:Map literal to be exported, but this bullet point does not mention this substring or id at all.
2. Related to the previous point, the third bullet point seems to assume systems that represent CDT literals internally based on their value form (how would such a system otherwise know that it is the same blank node B that is both inside and outside of CDT literals?). So it is not clear how the requirement of this bullet point can apply to systems that represent CDT literals internally based on their lexical form.

kasei · 2024-06-06T15:04:04Z

Regarding your point 1, I understand your concern. Personally, I'm not that bothered by the current wording, but I'll try to update the wording to explicitly mention the bnode token substrings.

For point 2:

> how would such a system otherwise know that it is the same blank node B that is both inside and outside of CDT literals?

Wouldn't any system be able to know this by inspection (either directly to a value, or by using the lexical-to-value mapping)? This part of the process seems trivial to me (unless I'm missing something?) due to the lexical-to-value requirements, which is what caused us to have to define these import/export requirements in the first place. Before the export rewriting, a lexical form "[ _:b ]" should be enough to know that the bnode in the list is the same as some bnode, b, outside of a CDT literal because bnl2bn("_:b") = b.
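
To illustrate the inspection argument, here is a minimal sketch in Python. It assumes that bnl2bn is backed by a single label-to-node table shared by the whole system. The names `BlankNode`, `_bnodes`, and `bnodes_in_list_literal` are hypothetical, and the regular expression is a deliberately naive stand-in for BLANK_NODE_LABEL that ignores string literals and nested CDT literals.

```python
import re


class BlankNode:
    """Opaque blank node; only object identity matters (hypothetical type)."""


# Assumption: one shared table plays the role of the spec's bnl2bn function.
_bnodes = {}


def bnl2bn(label):
    # The same label always yields the same BlankNode object,
    # regardless of where the label was encountered.
    return _bnodes.setdefault(label, BlankNode())


# Naive approximation of BLANK_NODE_LABEL, good enough for "[ _:b ]".
LABEL = re.compile(r"_:[A-Za-z0-9]+")


def bnodes_in_list_literal(lexical_form):
    """Map every blank node label in the lexical form to its blank node."""
    return [bnl2bn(m.group(0)) for m in LABEL.finditer(lexical_form)]


b = bnl2bn("_:b")                           # blank node used outside any CDT literal
inside = bnodes_in_list_literal("[ _:b ]")  # blank nodes found by inspecting the literal
print(inside[0] is b)                       # True
```

A system that stores only lexical forms can run this kind of lookup at export time, which is all the third bullet point needs.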

hartig · 2024-06-23T20:58:38Z

> Regarding your point 1, I understand your concern. Personally, I'm not that bothered by the current wording, but I'll try to update the wording to explicitly mention the bnode token substrings.

Okay, thanks.

> For point 2:
>
> > how would such a system otherwise know that it is the same blank node B that is both inside and outside of CDT literals?
>
> Wouldn't any system be able to know this by inspection (either directly to a value, or by using the lexical-to-value mapping)? This part of the process seems trivial to me (unless I'm missing something?) due to the lexical-to-value requirements, which is what caused us to have to define these import/export requirements in the first place. Before the export rewriting, a lexical form "[ _:b ]" should be enough to know that the bnode in the list is the same as some bnode, b, outside of a CDT literal because bnl2bn("_:b") = b.

You are right. I was not taking into account that the CDT literals that have been loaded into a system have been subject to the bnode-identifier rewriting as per the import requirements in the previous section.

kasei · 2024-07-30T18:33:43Z

What do you think about this as new text for the third bullet point in the exporting requirements?

assuming that the format of the document being serialized has a mechanism for explicitly identifying blank nodes outside of cdt:List and cdt:Map literals (e.g., _:b1 syntax in N-Triples and in Turtle, rdf:nodeID in RDF/XML),
if the blank node B = bnl2bn("_:" + id) is contained in the data to be serialized outside of a composite value literal,
then the serializer for this document format MUST serialize B using the blank node identifier b both inside and outside of composite values
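
The proposed bullet can be read operationally as follows. This is only a minimal sketch in Python, not the spec's algorithm. `export_literal`, `doc_ids`, and the `g0`, `g1`, ... naming scheme are hypothetical; the regular expression is deliberately naive (it ignores string literals and nested CDT literals); and giving a fresh, consistently reused label to blank nodes that do not occur outside of CDT literals is an assumption.

```python
import re

# Naive approximation of BLANK_NODE_LABEL; captures the id after "_:".
LABEL = re.compile(r"_:([A-Za-z0-9]+)")


def export_literal(lexical_form, doc_ids):
    """Rewrite blank node labels in a cdt:List / cdt:Map lexical form.

    doc_ids maps the internal id of each blank node that also occurs
    outside of CDT literals to the identifier the serializer writes for
    it elsewhere in the document (hypothetical data structure).
    """
    used = set(doc_ids.values())
    fresh = {}

    def repl(match):
        ident = match.group(1)
        if ident in doc_ids:
            # Same identifier inside the literal as outside of it.
            return "_:" + doc_ids[ident]
        if ident not in fresh:
            # Pick an unused label; reuse it for every later occurrence.
            n = 0
            while f"g{n}" in used:
                n += 1
            fresh[ident] = f"g{n}"
            used.add(f"g{n}")
        return "_:" + fresh[ident]

    return LABEL.sub(repl, lexical_form)


# The blank node with internal id "b" is written as _:b1 outside the literal.
print(export_literal("[ _:b, _:c, _:b ]", {"b": "b1"}))
# [ _:b1, _:g0, _:b1 ]
```

For formats without explicit blank node identifiers the precondition of the bullet does not hold, so this sketch says nothing about them.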

hartig · 2024-08-13T15:53:16Z

> What do you think about this as new text for the third bullet point in the exporting requirements?
> [...]

Sounds good. I have copied this text directly into this PR (see commit 8f266ae). Thereafter, I have changed the whole paragraph of the export requirements a bit to be better aligned with the writing style used for the import requirements; see commit b2e1420 for these extra changes. Please check whether the whole part of the bullet points for the export requirements is still consistent with your intuition.

kasei · 2024-08-19T00:37:19Z

Text as of b2e1420 looks good.

kasei · 2024-08-19T00:48:33Z

I don't want this to turn into a permanently ongoing issue, but I have one more concern about the current text. The opening text now says:

> Conforming implementations MUST process cdt:List literals and cdt:Map literals during export, replacing, in their lexical form, any substring bnl matching the BLANK_NODE_LABEL production of the grammar with a string bnl' such that …

Is that wording too vague about how a substring might match that production? I'm thinking in particular about substrings that might appear in a literal member of a list or map like "['a blank node _:a']"^^cdt:List. In this case, _:a is a substring of the CDT literal that matches BLANK_NODE_LABEL, even though you'd never parse it that way because it is internal textual data that would be parsed as part of RDFLiteral.

On the other hand, if you ignore the internal textual content of RDFLiteral values of CDTs, what should happen with something like "[ '[_:a]'^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List> ]"^^cdt:List? That's a blank node in a list in a list (that could alternatively be written "[ [_:a] ]"^^cdt:List), and I think it should be rewritten as part of the export requirements. Maybe we need some extra text that ensures the current requirements apply to some sort of canonical form of a CDT that always uses the CDT syntax for sub-CDTs, instead of having them as datatyped literals…?

hartig · 2024-08-19T10:34:49Z

> > Conforming implementations MUST process cdt:List literals and cdt:Map literals during export, replacing, in their lexical form, any substring bnl matching the BLANK_NODE_LABEL production of the grammar with a string bnl' such that …
>
> Is that wording too vague about how a substring might match that production? I'm thinking in particular about substrings that might appear in a literal member of a list or map like "['a blank node _:a']"^^cdt:List. In this case, _:a is a substring of the CDT literal that matches BLANK_NODE_LABEL, even though you'd never parse it that way because it is internal textual data that would be parsed as part of RDFLiteral.

Good point! How about the following?

"Conforming implementations MUST process cdt:List literals and cdt:Map literals during export as follows. In the lexical form of these literals, every substring bnl that would match the BLANK_NODE_LABEL production when parsing the whole lexical form MUST be replaced by a string bnl' such that ..."
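
The difference between "any matching substring" and "matches when parsing the whole lexical form" can be sketched with a small tokenizer. This is a rough Python illustration under simplifying assumptions, not the grammar of the spec: `TOKEN` and `rewrite_labels` are hypothetical names, the label pattern only approximates BLANK_NODE_LABEL, and nested CDT literals are deliberately not handled here.

```python
import re

# Alternatives are tried left to right at each position, so quoted strings
# and IRIs are consumed as whole tokens before a label inside them is seen.
TOKEN = re.compile(r"""
    (?P<string>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")   # quoted string, with escapes
  | (?P<iri><[^>]*>)                                  # IRI in angle brackets
  | (?P<bnl>_:[A-Za-z0-9]+)                           # blank node label token
""", re.VERBOSE)


def rewrite_labels(lexical_form, rename):
    """Apply rename() only to labels that are tokens of the lexical form."""
    def repl(match):
        if match.group("bnl") is not None:
            return rename(match.group("bnl"))
        return match.group(0)  # strings and IRIs are copied through unchanged
    return TOKEN.sub(repl, lexical_form)


# The label inside the string literal is left alone; the list element is renamed.
print(rewrite_labels("['a blank node _:a', _:a]", lambda label: label + "1"))
# ['a blank node _:a', _:a1]
```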

> On the other hand, if you ignore the internal textual content of RDFLiteral values of CDTs, what should happen with something like "[ '[_:a]'^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List> ]"^^cdt:List? That's a blank node in a list in a list (that could alternatively be written "[ [_:a] ]"^^cdt:List), and I think it should be rewritten as part of the export requirements.

Oh boy! Yes, nested CDT literals are another complication :-( Good observation!

> Maybe we need some extra text that ensures the current requirements apply to some sort of canonical form of a CDT that always uses the CDT syntax for sub-CDTs, instead of having them as datatyped literals…?

That's an idea. I am just worried that it becomes a little complicated to write this in an understandable way. Another option may be to simply add the following parenthetical to the second of the bullet points below the opening text (the bullet point about all occurrences of bnl being replaced with the same bnl' value):

"(this applies recursively also to cdt:List and cdt:Map literals contained as elements within the term list or term map of other such literals)"

What do you think of this option?

By the way, both of these issues that you point out here apply to the import requirements as well.
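
The recursive reading of the parenthetical proposed above can be sketched as follows. Again this is only an illustration in Python under stated assumptions: `TOKEN` and `rewrite` are hypothetical names, the label pattern approximates BLANK_NODE_LABEL, nested literals are recognized only when written with the full datatype IRI, and the nested lexical form is assumed to contain no quotes or escape sequences.

```python
import re

TOKEN = re.compile(r"""
    # Nested cdt:List / cdt:Map literal written as a datatyped literal.
    (?P<cdt>(?P<q>['"])(?P<inner>[^'"\\]*)(?P=q)
        (?P<dt>\^\^<http://w3id\.org/awslabs/neptune/SPARQL-CDTs/(?:List|Map)>))
    # Any other quoted string: plain text, never rewritten.
  | (?P<string>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")
    # Blank node label token.
  | (?P<bnl>_:[A-Za-z0-9]+)
""", re.VERBOSE)


def rewrite(lexical_form, rename):
    """Rename label tokens, descending into nested CDT literals."""
    def repl(match):
        if match.group("bnl") is not None:
            return rename(match.group("bnl"))
        if match.group("cdt") is not None:
            # Recurse into the lexical form of the nested literal.
            inner = rewrite(match.group("inner"), rename)
            return match.group("q") + inner + match.group("q") + match.group("dt")
        return match.group(0)
    return TOKEN.sub(repl, lexical_form)


mapping = {"_:a": "_:n1"}
lex = "[ '[_:a]'^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List>, 'text _:a', _:a ]"
print(rewrite(lex, lambda label: mapping.get(label, label)))
# [ '[_:n1]'^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List>, 'text _:a', _:n1 ]
```

Because the same `rename` function is passed down, a label is replaced by the same new label at every nesting depth, which is what the second bullet point requires.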

kasei · 2024-08-25T18:46:20Z

Your proposed text regarding the parsing of bnl looks good.

I agree that your suggested parenthetical is a much better way to address the second issue.

hartig · 2024-08-26T13:22:50Z

Thanks for confirming! I have implemented both of the proposed changes (for both import and export); see commit 2946b91.

With this, I think this PR is ready to be merged. Let me know whether you agree or want to add something else. If you agree, I will merge and then publish the current state as a new version.

uncomments the export requirements section again

67ba2df

hartig mentioned this pull request Jun 4, 2024

Add spec section on requirements for importing/exporting CDT values #2

Merged

hartig added 2 commits August 13, 2024 17:15

adapts the wording for the export requirements as proposed in #9 (comment)

8f266ae

edits of the export requirements

b2e1420

improves the text about the import and the export requirements as proposed in #9 (comment)

2946b91

hartig wants to merge 4 commits into main from spec-export
