Commit 34afd29

update docs
1 parent 688831f commit 34afd29

2 files changed

+165
-2
lines changed

docs/commands/build.md

Lines changed: 102 additions & 1 deletion
@@ -4,7 +4,7 @@ This document provides detailed information about the build commands available i
44

55
## Overview
66

7-
The `build` command group provides operations for generating derived artifacts from RO-Crates. These artifacts include datasheets, visualizations, and evidence graphs that make the RO-Crate content more accessible and understandable.
7+
The `build` command group provides operations for generating derived artifacts from RO-Crates and creating release packages. These artifacts include datasheets, visualizations, evidence graphs, and release RO-Crates that make the content more accessible and understandable.
88

99
```bash
1010
fairscape-cli build [COMMAND] [OPTIONS]
@@ -14,6 +14,7 @@ fairscape-cli build [COMMAND] [OPTIONS]
1414

1515
- [`datasheet`](#datasheet) - Generate an HTML datasheet for an RO-Crate
1616
- [`evidence-graph`](#evidence-graph) - Generate a provenance graph for a specific ARK identifier
17+
- [`release`](#release) - Build a release RO-Crate from a directory containing multiple RO-Crates
1718

1819
## Command Details
1920

@@ -100,3 +101,103 @@ The evidence graph shows:
100101
- All relevant metadata for each node in the graph
101102

102103
The HTML visualization provides an interactive graph that can be viewed in a web browser, making it easy to explore the provenance of datasets, software, and computations in the RO-Crate.
104+
105+
### `release`
106+
107+
Build a release RO-Crate in a directory by scanning it for existing RO-Crates and linking them as subcrates. The result is a parent RO-Crate that references and contextualizes the subcrates.
108+
109+
```bash
110+
fairscape-cli build release [OPTIONS] RELEASE_DIRECTORY
111+
```
112+
113+
**Arguments:**
114+
115+
- `RELEASE_DIRECTORY` - Directory where the release RO-Crate will be built [required]
116+
117+
**Options:**
118+
119+
- `--guid TEXT` - GUID for the parent release RO-Crate (generated if not provided)
120+
- `--name TEXT` - Name for the parent release RO-Crate [required]
121+
- `--organization-name TEXT` - Organization name associated with the release [required]
122+
- `--project-name TEXT` - Project name associated with the release [required]
123+
- `--description TEXT` - Description of the release RO-Crate [required]
124+
- `--keywords TEXT` - Keywords for the release RO-Crate (can be used multiple times) [required]
125+
- `--license TEXT` - License URL for the release (default: "https://creativecommons.org/licenses/by/4.0/")
126+
- `--date-published TEXT` - Publication date (ISO format, defaults to current date)
127+
- `--author TEXT` - Author(s) of the release (defaults to combined authors from subcrates)
128+
- `--version TEXT` - Version of the release (default: "1.0")
129+
- `--associated-publication TEXT` - Associated publications for the release (can be used multiple times)
130+
- `--conditions-of-access TEXT` - Conditions of access for the release
131+
- `--copyright-notice TEXT` - Copyright notice for the release
132+
- `--doi TEXT` - DOI identifier for the release
133+
- `--publisher TEXT` - Publisher of the release
134+
- `--principal-investigator TEXT` - Principal investigator for the release
135+
- `--contact-email TEXT` - Contact email for the release
136+
- `--confidentiality-level TEXT` - Confidentiality level for the release
137+
- `--citation TEXT` - Citation for the release
138+
- `--funder TEXT` - Funder of the release
139+
- `--usage-info TEXT` - Usage information for the release
140+
- `--content-size TEXT` - Content size of the release
141+
- `--completeness TEXT` - Completeness information for the release
142+
- `--maintenance-plan TEXT` - Maintenance plan for the release
143+
- `--intended-use TEXT` - Intended use of the release
144+
- `--limitations TEXT` - Limitations of the release
145+
- `--prohibited-uses TEXT` - Prohibited uses of the release
146+
- `--potential-sources-of-bias TEXT` - Potential sources of bias in the release
147+
- `--human-subject TEXT` - Human subject involvement information
148+
- `--ethical-review TEXT` - Ethical review information
149+
- `--additional-properties TEXT` - JSON string with additional property values
150+
- `--custom-properties TEXT` - JSON string with additional properties for the parent crate
151+
152+
**Example:**
153+
154+
```bash
fairscape-cli build release ./my_release \
--guid "ark:59852/example-release-2023" \
--name "SRA Genomic Data Example Release - 2023" \
--organization-name "Example Research Institute" \
--project-name "Genomic Data Analysis Project" \
--description "This dataset contains genomic data from multiple sources prepared as AI-ready datasets in RO-Crate format." \
--keywords "Genomics" \
--keywords "SRA" \
--keywords "RNA-seq" \
--license "https://creativecommons.org/licenses/by/4.0/" \
--publisher "University Example Dataverse" \
--principal-investigator "Dr. Example PI" \
--contact-email "example@example.org" \
--confidentiality-level "HL7 Unrestricted" \
--funder "Example Agency" \
--citation "Example Research Institute (2023). Genomic Data Example Release."
```
172+
173+
This command:
174+
175+
1. Creates a new parent RO-Crate in the specified directory
176+
2. Scans the directory for existing RO-Crates to include as subcrates
177+
3. Links the subcrates to the parent crate
178+
4. Combines metadata from subcrates and the provided options
179+
5. Outputs the ARK identifier of the created release RO-Crate
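
The scan-and-link steps above can be sketched in a few lines of Python. This is only an illustration, not the fairscape-cli implementation: it assumes a subcrate is any directory below the release directory that contains the standard RO-Crate metadata file `ro-crate-metadata.json`, and the function names `find_subcrates` and `link_subcrates` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Standard RO-Crate metadata file name; treating it as the subcrate marker is an assumption.
METADATA_FILENAME = "ro-crate-metadata.json"


def find_subcrates(release_dir: Path) -> list[Path]:
    """Return directories below release_dir that contain an RO-Crate metadata file."""
    subcrates = []
    for metadata_path in sorted(release_dir.rglob(METADATA_FILENAME)):
        crate_dir = metadata_path.parent
        if crate_dir != release_dir:  # skip the parent crate's own metadata file
            subcrates.append(crate_dir)
    return subcrates


def link_subcrates(release_dir: Path, subcrates: list[Path]) -> dict:
    """Build a minimal parent entry that references each subcrate (hypothetical layout)."""
    return {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [
            {"@id": p.relative_to(release_dir).as_posix() + "/"} for p in subcrates
        ],
    }


if __name__ == "__main__":
    # Build a throwaway release directory with two empty subcrates and link them.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("crate_a", "crate_b"):
            (root / name).mkdir()
            (root / name / METADATA_FILENAME).write_text(json.dumps({"@graph": []}))
        print(json.dumps(link_subcrates(root, find_subcrates(root)), indent=2))
```

Skipping the release directory's own metadata file keeps a rebuilt release from listing itself as a subcrate.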
180+
181+
## Release Workflow
182+
183+
A typical release workflow involves:
184+
185+
1. **Create individual RO-Crates** for specific datasets, software, and computations
186+
2. **Place these RO-Crates** in a common directory structure
187+
3. **Build a release** using the `build release` command to create a parent RO-Crate
188+
4. **Generate a datasheet** using the `build datasheet` command
189+
5. **Publish the release** using the `publish` commands
190+
191+
The parent release RO-Crate provides context and relationships between the individual RO-Crates, making it easier to understand and work with complex datasets that span multiple files, processes, and research objects.
192+
193+
## Metadata Inheritance
194+
195+
When building a release, metadata is handled in the following ways:
196+
197+
- **Author information** is combined from all subcrates unless explicitly provided
198+
- **Keywords** include both the specified keywords and those from subcrates
199+
- **Version** defaults to "1.0" unless specified
200+
- **License** defaults to CC-BY 4.0 unless specified
201+
- **Publication date** defaults to the current date unless specified
202+
203+
All other metadata must be explicitly provided through the command options.
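
The inheritance rules listed above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the tool's actual code: the function name `merge_release_metadata`, the dictionary keys, and the choice to join authors with a comma and drop duplicates are all hypothetical. Only the defaults (version "1.0", the CC-BY 4.0 license URL, the current date) come from the documentation.

```python
from datetime import date

DEFAULT_LICENSE = "https://creativecommons.org/licenses/by/4.0/"
DEFAULT_VERSION = "1.0"


def merge_release_metadata(options: dict, subcrates: list[dict]) -> dict:
    """Combine explicit options with subcrate metadata (hypothetical structure)."""
    # Author: an explicit value wins; otherwise combine unique subcrate authors.
    author = options.get("author")
    if not author:
        seen = []
        for crate in subcrates:
            crate_author = crate.get("author")
            if crate_author and crate_author not in seen:
                seen.append(crate_author)
        author = ", ".join(seen)

    # Keywords: specified keywords first, then subcrate keywords, without duplicates.
    keywords = list(options.get("keywords", []))
    for crate in subcrates:
        for keyword in crate.get("keywords", []):
            if keyword not in keywords:
                keywords.append(keyword)

    return {
        "author": author,
        "keywords": keywords,
        "version": options.get("version") or DEFAULT_VERSION,
        "license": options.get("license") or DEFAULT_LICENSE,
        "datePublished": options.get("date_published") or date.today().isoformat(),
    }
```

Fields outside these five are not inherited in this sketch, mirroring the rule that all other metadata must be passed explicitly.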

docs/commands/schema.md

Lines changed: 63 additions & 1 deletion
@@ -4,7 +4,7 @@ This document provides detailed information about the schema commands available
44

55
## Overview
66

7-
The `schema` command group provides operations for creating, modifying, and working with data schemas. Schemas describe the structure and constraints of datasets, enabling validation and improved interoperability.
7+
The `schema` command group provides operations for creating, modifying, and working with data schemas, and for validating data against them. Schemas describe the structure and constraints of datasets, enabling validation and improved interoperability.
88

99
```bash
1010
fairscape-cli schema [COMMAND] [OPTIONS]
@@ -21,6 +21,7 @@ fairscape-cli schema [COMMAND] [OPTIONS]
2121
- [`array`](#add-property-array) - Add an array property
2222
- [`infer`](#infer) - Infer a schema from a data file
2323
- [`add-to-crate`](#add-to-crate) - Add a schema to an RO-Crate
24+
- [`validate`](#validate) - Validate a dataset against a schema definition
2425

2526
## Command Details
2627

@@ -244,3 +245,64 @@ fairscape-cli schema add-to-crate \
244245
./my_rocrate \
245246
./schema_apms_music_embedding.json
246247
```
248+
249+
### `validate`
250+
251+
Validate a dataset against a schema definition.
252+
253+
```bash
254+
fairscape-cli schema validate [OPTIONS]
255+
```
256+
257+
**Options:**
258+
259+
- `--schema TEXT` - Path to the schema file or ARK identifier [required]
260+
- `--data TEXT` - Path to the data file to validate [required]
261+
262+
**Example:**
263+
264+
```bash
fairscape-cli schema validate \
--schema ./music_apms_embedding_schema.json \
--data ./APMS_embedding_MUSIC.csv
```
269+
270+
When validation succeeds, you'll see:
271+
272+
```
273+
Validation Success
274+
```
275+
276+
If validation fails, you'll see a table of errors:
277+
278+
```
+-----+-----------------+----------------+-------------------------------------------------------+
| row | error_type      | failed_keyword | message                                               |
+-----+-----------------+----------------+-------------------------------------------------------+
| 3   | ParsingError    | None           | ValueError: Failed to Parse Attribute embed for Row 3 |
| 4   | ParsingError    | None           | ValueError: Failed to Parse Attribute embed for Row 4 |
| 0   | ValidationError | pattern        | 'APMS_A' does not match '^APMS_[0-9]*$'               |
+-----+-----------------+----------------+-------------------------------------------------------+
```
287+
288+
## Error Types
289+
290+
Errors are categorized into two main types:
291+
292+
1. **ParsingError**: Occurs when the data cannot be parsed according to the schema structure. This often happens when:
293+
294+
- The number of columns doesn't match the schema
295+
- A value cannot be converted to the expected datatype
296+
297+
2. **ValidationError**: Occurs when the data can be parsed but fails validation constraints like:
298+
- String values not matching the specified pattern
299+
- Numeric values outside the min/max range
300+
- Array length not within specified bounds
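
The distinction between the two error types can be sketched as a two-stage check: first try to convert the raw value, then test the converted value against its constraints. The Python below is a minimal illustration, not fairscape-cli's validator. The function name `check_value` and its parameters are hypothetical, and only the `pattern` constraint is covered; the message formats mirror the example table above.

```python
import re


def check_value(row, name, raw, cast, pattern=None):
    """Return an error record for one cell, or None if the value is valid."""
    # Stage 1: parsing. A failed conversion is reported as a ParsingError.
    try:
        value = cast(raw)
    except ValueError:
        return {
            "row": row,
            "error_type": "ParsingError",
            "failed_keyword": None,
            "message": f"ValueError: Failed to Parse Attribute {name} for Row {row}",
        }

    # Stage 2: validation. The value parsed, but may still break a constraint.
    if pattern is not None and isinstance(value, str):
        if re.search(pattern, value) is None:
            return {
                "row": row,
                "error_type": "ValidationError",
                "failed_keyword": "pattern",
                "message": f"{value!r} does not match {pattern!r}",
            }
    return None


if __name__ == "__main__":
    print(check_value(3, "embed", "not-a-number", float))
    print(check_value(0, "id", "APMS_A", str, r"^APMS_[0-9]*$"))
```

Because parsing runs first, a cell that cannot be converted never reaches the constraint checks, which is why `failed_keyword` is `None` for parsing errors.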
301+
302+
## Working with Different File Types
303+
304+
The validation command automatically detects the file type based on its extension:
305+
306+
- **CSV/TSV files**: Tabular validation with field separators
307+
- **Parquet files**: Tabular validation with columnar storage
308+
- **HDF5 files**: Hierarchical validation with nested structures
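
Extension-based detection of this kind can be sketched as a lookup on the file suffix. The Python below is only an illustration under stated assumptions: the function name `detect_file_kind`, the returned labels, and the exact set of recognized extensions (in particular `.h5` and `.hdf5` for HDF5) are hypothetical and may differ from what fairscape-cli accepts.

```python
from pathlib import Path

# Assumed field separators for delimited text files.
DELIMITERS = {".csv": ",", ".tsv": "\t"}
# Assumed HDF5 extensions; the real tool may recognize a different set.
HDF5_SUFFIXES = {".h5", ".hdf5"}


def detect_file_kind(path: str) -> str:
    """Classify a data file by its extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix in DELIMITERS:
        return "delimited"
    if suffix == ".parquet":
        return "parquet"
    if suffix in HDF5_SUFFIXES:
        return "hdf5"
    raise ValueError(f"Unsupported file extension: {suffix!r}")


if __name__ == "__main__":
    for example in ("APMS_embedding_MUSIC.csv", "table.parquet", "images.h5"):
        print(example, "->", detect_file_kind(example))
```

Raising an error for unknown extensions, rather than guessing a format, makes an unsupported file fail before any validation is attempted.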
