This repository was archived by the owner on Sep 28, 2022. It is now read-only.

NodeFormatterTTL escapes UTF8 chars in URLs #16

@keski

When formatting IRIs that use the base prefix, NodeFormatterTTL percent-encodes their non-ASCII characters, so the output contains ASCII-only strings. Internally, however, the TurtleStar parser represents these IRIs correctly as UTF-8. For example,

<<<Bernard_Frénicle_de_Bessy> <hasGivenName> "Bernard">> <extractionSource> <yagoTheme_yagoTransitiveType> .

will be converted into:

_:B14ae0638X2Dfb06X2D43a2X2D9302X2Dac581855334f rdf:type rdf:Statement ;
                                        rdf:subject <Bernard_Fr%C3%A9nicle_de_Bessy> ;
                                        rdf:predicate <hasGivenName> ;
                                        rdf:object "Bernard" ;
                                        <extractionSource> <yagoTheme_yagoTransitiveType> .

and converting this back to Turtle* will then yield:

<<<Bernard_Fr%C3%A9nicle_de_Bessy> <hasGivenName> "Bernard">> <extractionSource> <yagoTheme_yagoTransitiveType> . 

and the resources will thus not be equivalent.
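
To make the mismatch concrete, here is a small standalone sketch in Python (not Jena code). It assumes the formatter's escaping behaves like standard UTF-8 percent-encoding, which matches the output shown above; `urllib.parse.quote` is only a stand-in for whatever NodeFormatterTTL does internally. Since RDF compares IRIs as plain character strings, the two forms name different resources even though they decode to the same text.

```python
from urllib.parse import quote, unquote

# IRI as the parser holds it internally (UTF-8, not escaped).
original = "Bernard_Frénicle_de_Bessy"

# Stand-in for the formatter's behaviour (assumption): percent-encode
# every non-ASCII character using its UTF-8 bytes.
escaped = quote(original)

# RDF treats IRIs as opaque strings, so equality is a plain string
# comparison and no percent-decoding is applied.
same_resource = (original == escaped)

# The escaping itself is lossless: decoding recovers the original text.
recovered = unquote(escaped)

print(escaped)        # Bernard_Fr%C3%A9nicle_de_Bessy
print(same_resource)  # False
print(recovered == original)  # True
```

So no information is lost, but nothing in the round trip undoes the escaping, and the written IRI stays different from the parsed one.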

I'm not sure how this issue is most easily resolved. We could of course provide a separate node formatter implementation, but it seems like there should be some flag or other to fix this...
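
One possible stopgap, sketched in Python purely to illustrate the idea, is to post-process each written IRI and undo only the escapes that encode non-ASCII characters. The function name `unescape_non_ascii` is hypothetical and this is not an existing Jena option; a real fix would more likely live in a custom node formatter. ASCII escapes such as `%20` are deliberately left untouched, because decoding them could produce an invalid or different IRI.

```python
import re

# One or more consecutive %XX escapes, e.g. "%C3%A9".
_ESCAPED_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def unescape_non_ascii(iri: str) -> str:
    """Decode percent-escapes that form non-ASCII UTF-8 characters.

    Hypothetical helper: escapes for ASCII characters (such as %20)
    and runs that are not valid UTF-8 are returned unchanged.
    """

    def decode_run(match: re.Match) -> str:
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            return run  # not valid UTF-8, leave the run as written

        pieces = []
        offset = 0  # index of the current byte within the run
        for ch in text:
            width = len(ch.encode("utf-8"))
            if width == 1:
                # ASCII character: keep its original %XX triple.
                pieces.append(run[3 * offset:3 * offset + 3])
            else:
                pieces.append(ch)
            offset += width
        return "".join(pieces)

    return _ESCAPED_RUN.sub(decode_run, iri)


print(unescape_non_ascii("Bernard_Fr%C3%A9nicle_de_Bessy"))
```

Applied to the subject IRI from the example, this restores the original `Bernard_Frénicle_de_Bessy`. It cannot tell whether an escape was already present in the source data, so IRIs that were intentionally percent-encoded would also be decoded.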
