TDIG "review" (Fernique - vice-chair) of the DataOrigin 1.2 document,
available on GitHub at https://github.com/ivoa-std/DataOrigin and submitted as an "Endorsed Note" by its authors (G. Landais, A. Muench, M. Demleitner, R. Savalle).
I am a bit uncomfortable doing a review via GitHub, which is not really designed for this purpose. Should I open one issue per remark? That does not seem very useful.
→ How does this work articulate with other efforts on Provenance, in particular the Provenance Data Model? Or with the work done around DataCite?
Abstract: “_Data Origin in the VO specifies a set of metadata items that define basic provenance information_”
Purpose of this Endorsed Note: it is "presented as a list of “_metadata items_” intended to provide a simple description of provenance by annotating the VOTable that packages the data via INFO XML tags. The stated use cases are descriptions of: data origin, reproducibility, citations, and bibliography." The objective is very interesting, will be useful, and will fill a "need" in the IVOA system.
In Section 3, page 5, I do not understand the argument that the VO Registry cannot be used for this purpose because of a lack of “_central curators_” (page 6). Whether curators are centralized or not has no real impact: each publisher remains responsible for its own records, whether they are well maintained or not. I have the same difficulty with the statement “_this IVOID is not suitable as a means of citations_”. Why is that? Perhaps I am missing something.
Page 7, end of the first paragraph of Section 4: “_a serialisation directly into the VO output is desirable_”. Typically, HiPS tiles are VO outputs, and I do not really see the relation here. What exactly is meant by a “VO output”? Is it broader than a VOTable resulting from a query to a VO protocol?
Is this a new recommended vocabulary, built by aggregating two sources (DALI and Dublin Core) with extensions specific to this document? Is that correct? And if so, is it normative?
Page 9: why was the VOTable serialization example relegated to an appendix? If it is normative, it should be in the main body of the text (see point 10 below).
Still on page 9, the paragraph describing the difficulties related to ADQL joins merely raises the problem. Since no solution is proposed, this feels odd. It might be better not to mention this difficulty at all.
In the serialization example in Appendix A, we see repeated use of the INFO element, with no structure or hierarchy. The examples show that keywords are given in the "name" attribute, that values are given in the "value" attribute of the INFO tag, and that the element content is used only for a textual description. However, this convention is described nowhere in the document.
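To make the convention inferred above concrete, here is a minimal sketch of how a client might read such INFO elements. It is not taken from the document: the item names `publisher` and `request_date`, their values, and the helper `read_info_items` are illustrative assumptions, and the sample only mimics the name/value/content pattern seen in Appendix A.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by VOTable 1.3/1.4 documents.
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"

# Hypothetical sample: item names and values are placeholders, not quoted
# from the DataOrigin document.
SAMPLE = """<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <INFO name="publisher" value="Example Data Center">Data centre that produced the VOTable</INFO>
    <INFO name="request_date" value="2024-01-01T00:00:00">Query execution date</INFO>
  </RESOURCE>
</VOTABLE>"""


def read_info_items(votable_xml):
    """Collect INFO elements as {name: {"value": ..., "description": ...}}.

    Assumes the flat convention described above: keyword in the "name"
    attribute, value in the "value" attribute, free text as element content.
    """
    root = ET.fromstring(votable_xml)
    items = {}
    for info in root.iter(f"{{{VOTABLE_NS}}}INFO"):
        name = info.get("name")
        if name is None:
            continue  # skip INFO elements that carry no keyword
        items[name] = {
            "value": info.get("value"),
            "description": (info.text or "").strip(),
        }
    return items


items = read_info_items(SAMPLE)
for name, item in items.items():
    print(f"{name} = {item['value']}  ({item['description']})")
```

Note that a flat dictionary like this silently keeps only the last occurrence when a name is repeated, which is one reason the document should state the convention explicitly.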
I did not understand Appendix C on the “Citation Template”. It lacks explanations.
The same comment applies to Appendix D. It looks like an unfinished section. For example, VOPROV is mentioned without any reference. What is this Python package?
Overall, I really question the choice of the “Endorsed Note” standardization process for this document. In its current state, the document looks much more like a regular IVOA Note. If there is a real intention to standardize both the vocabulary and the serialization, it would be far more reasonable to use the classical process WD → PR → REC. This would provide the appropriate technical means for a proper review (RFC page or similar), and sufficient guarantees (reference implementations, validators, etc.) to ensure long-term interoperability.