Description
The regular expressions used in rdf/iri contain mistakes:
- `iqueryRE` uses `ipath` instead of `iquery`, which is thus unused. Replacing it brings further mistakes within `iquery` to light:
  - `iprivate` contains an invalid regexp sequence: `\x{F0000]-\x{FFFFD}` should be `\x{F0000}-\x{FFFFD}`.
  - `iquery` wrongly uses "/?" as a sequence. This should be a choice, as in `[\/\?]`.
- `iuserinfo` is missing the colon character required by the RFC. As a result, the IRI "https://user:pwd@example.com" cannot be parsed.
- The `h16` regular expression should allow 1-4 hex digits as per the RFC, not require exactly 4 hex digits.
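The following is a minimal sketch of how each of these mistakes shows up with Go's `regexp` package. The patterns are deliberately simplified stand-ins (a reduced `ipchar` and `iuserinfo` character set, hypothetical variable names), not the package's actual definitions, so it only illustrates the class of error rather than the real expressions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical, simplified patterns for illustration only.
const (
	// Mismatched brace inside the \x{...} escape, as reported for iprivate.
	brokenIprivate = `[\x{E000}-\x{F8FF}\x{F0000]-\x{FFFFD}\x{100000}-\x{10FFFD}]`
	// The three iprivate ranges with balanced braces.
	fixedIprivate = `[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}]`
)

var (
	// h16: exactly four hex digits versus one to four.
	h16Strict = regexp.MustCompile(`^[0-9A-Fa-f]{4}$`)
	h16Fixed  = regexp.MustCompile(`^[0-9A-Fa-f]{1,4}$`)

	// iuserinfo (reduced character set): without and with the colon.
	userinfoNoColon = regexp.MustCompile(`^[A-Za-z0-9._~-]*$`)
	userinfoColon   = regexp.MustCompile(`^[A-Za-z0-9._~:-]*$`)

	// iquery (reduced ipchar): "/?" as a literal sequence versus a choice.
	querySequence = regexp.MustCompile(`^(?:[A-Za-z0-9=&]|/\?)*$`)
	queryChoice   = regexp.MustCompile(`^(?:[A-Za-z0-9=&]|[/?])*$`)
)

// compiles reports whether the pattern is accepted by regexp.Compile.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println("broken iprivate compiles:", compiles(brokenIprivate))
	fmt.Println("fixed iprivate compiles: ", compiles(fixedIprivate))

	fmt.Println("h16 strict 'db8':", h16Strict.MatchString("db8"))
	fmt.Println("h16 fixed  'db8':", h16Fixed.MatchString("db8"))

	fmt.Println("userinfo without colon:", userinfoNoColon.MatchString("user:pwd"))
	fmt.Println("userinfo with colon:   ", userinfoColon.MatchString("user:pwd"))

	// A lone '/' or '?' is only accepted by the choice form.
	fmt.Println("query sequence 'a=b/c?d':", querySequence.MatchString("a=b/c?d"))
	fmt.Println("query choice   'a=b/c?d':", queryChoice.MatchString("a=b/c?d"))
}
```

In this sketch the sequence form still accepts a query such as `a/?b`, which is why the mistake can go unnoticed until a lone `/` or `?` appears.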
As a side note, the example "http://r&#xE9;sum&#xE9;.example.org", used for testing normalization, is not a proper IRI string. According to RFC chapter 1.4, the &#xE9; sequence is how non-US-ASCII characters are represented within a US-ASCII-only RFC text.
Taken literally, the first # makes the remainder a fragment, which is invalid because of the second #.
I found these issues while extracting the package into a separate library, handling all the TODOs (which ended up as a large rework), and feeding in many samples from the RFC, especially those about resolving relative IRIs. See https://github.com/contomap/iri
My rework makes it incompatible with your use here (different type and behaviour), which is why I am only collecting the mistakes I found as an issue.