id generation for data/metadata in CSV files #9
Description
We may want to indicate how to generate IDs for the triples corresponding to the rows in CSV files.
This would give us a well-defined mapping from DSPL 2 datasets to triples, and may make it feasible to reuse dimension values and footnotes defined in CSV files across datasets.
Tentative proposal
Attempt to generate easy-to-keep-unique IDs, and make no provisions for ID collisions.
codeList
For each CSV row,
- Start with the containing dimension's ID.
- If there is no fragment, set the fragment to the dimension's name, URL encoded.
- Append an = and the URL-encoded codeValue to the fragment.
For example, if a row's codeValue is us and its containing Dimension has an @id of #country, the row's triples should be generated as if from equivalent JSON-LD with "@id": "#country=us".
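The codeList steps above can be sketched in Python as follows. This is only an illustration of the tentative proposal, not a reference implementation: the function name `code_list_row_id` and its parameters are hypothetical, and "URL encoded" is assumed here to mean percent-encoding of every reserved character, which the proposal does not spell out.

```python
from urllib.parse import quote


def code_list_row_id(dimension_id: str, dimension_name: str, code_value: str) -> str:
    """Build an ID for one codeList CSV row (hypothetical helper)."""
    # Split the dimension's ID into the part before '#' and the fragment.
    base, _, fragment = dimension_id.partition("#")
    if not fragment:
        # No fragment: fall back to the URL-encoded dimension name.
        fragment = quote(dimension_name, safe="")
    # Assumption: percent-encode everything except unreserved characters.
    return f"{base}#{fragment}={quote(code_value, safe='')}"


print(code_list_row_id("#country", "country", "us"))
```

With the values from the example, this yields `#country=us`. A dimension ID without a fragment picks up the dimension's name as its fragment instead.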
footnote
For each CSV row,
- Start with the containing StatisticalDataset's @id.
- If there is a fragment, append a / to it.
- Append footnote= and the URL-encoded codeValue to the fragment.
For example, if the dataset's @id is the empty string, a footnote with a codeValue of p would yield an ID of #footnote=p. Similarly, if the dataset's @id is #my_dataset, the footnote would have an @id of #my_dataset/footnote=p.
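A minimal Python sketch of the footnote steps above, under the same caveats: `footnote_id` is a hypothetical name, and percent-encoding via `urllib.parse.quote` is an assumed reading of "URL-encoded".

```python
from urllib.parse import quote


def footnote_id(dataset_id: str, code_value: str) -> str:
    """Build an ID for one footnote CSV row (hypothetical helper)."""
    # Split the dataset's @id into the part before '#' and the fragment.
    base, _, fragment = dataset_id.partition("#")
    if fragment:
        # An existing fragment is extended with a '/' separator.
        fragment += "/"
    # Assumption: percent-encode everything except unreserved characters.
    return f"{base}#{fragment}footnote={quote(code_value, safe='')}"


print(footnote_id("", "p"))
print(footnote_id("#my_dataset", "p"))
```

The two calls reproduce the IDs from the example: `#footnote=p` and `#my_dataset/footnote=p`.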
observation
For each CSV row,
- Start with the slice's @id.
- If there is a fragment, append a / to it.
- Sort the dimension values by dimension name.
- For each dimension value, append the URL-encoded name, =, and the URL-encoded codeValue to the fragment, separating the entries with /.
- Sort the measure values by measure name.
- For each measure value, append the URL-encoded name to the fragment, separating entries with /.
For example, an observation in a slice with an @id of #europe_unemployment_slice, with dimensions gender of m, country of uk, and month of 2010-10, and measures unemployment_rate and unemployment, would have an @id of #europe_unemployment_slice/country=uk/gender=m/month=2010-10/unemployment/unemployment_rate
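The observation steps can be sketched the same way. The helper name `observation_id` and its argument shapes (a dict of dimension name to codeValue, and a list of measure names) are assumptions made for illustration. Sorting uses Python's default string ordering on the unencoded names, and "URL-encoded" is again taken to mean percent-encoding; the proposal specifies neither.

```python
from urllib.parse import quote


def observation_id(slice_id: str, dimension_values: dict, measure_names: list) -> str:
    """Build an ID for one observation CSV row (hypothetical helper)."""
    # Split the slice's @id into the part before '#' and the fragment.
    base, _, fragment = slice_id.partition("#")
    # Dimension entries, sorted by dimension name: name=codeValue.
    parts = [
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in sorted(dimension_values.items())
    ]
    # Measure entries, sorted by measure name: name only.
    parts += [quote(name, safe="") for name in sorted(measure_names)]
    # An existing fragment is extended with a '/' separator.
    prefix = f"{fragment}/" if fragment else ""
    return f"{base}#{prefix}{'/'.join(parts)}"


print(observation_id(
    "#europe_unemployment_slice",
    {"gender": "m", "country": "uk", "month": "2010-10"},
    ["unemployment_rate", "unemployment"],
))
```

For the values in the example this gives `#europe_unemployment_slice/country=uk/gender=m/month=2010-10/unemployment/unemployment_rate`. Because only measure names, not measure values, go into the ID, two rows with the same dimension values in the same slice would collide, which is consistent with the proposal making no provisions for ID collisions.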