6 changes: 4 additions & 2 deletions CHANGES.md
@@ -55,6 +55,8 @@
- Prevented duplicate item and self links when updating base catalogs of workflows and
experiments.

## Changes in 0.1.7 (in development)
## Changes in 0.1.7

- Fixed a bug in build_child_link_to_related_experiment for the publish mode `"all"`.

## Changes in 0.1.8 (in development)
187 changes: 13 additions & 174 deletions README.md
@@ -5,177 +5,16 @@
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License](https://img.shields.io/github/license/dcs4cop/xcube-smos)](https://github.com/deepesdl/deep-code/blob/main/LICENSE)

`deep-code` is a lightweight Python tool comprising a command-line interface (CLI)
and a Python API with utilities that help integrate DeepESDL datasets and
experiments with EarthCODE.

The first release focuses on publishing DeepESDL experiments/workflows as OGC API
records and datasets as OSC STAC collections.

## Setup

## Install
`deep-code` is available on PyPI for now and will come to conda-forge
in the near future. Until the stable release,
developers/contributors can follow the steps below to install deep-code.

## Installing from the repository for Developers/Contributors

To install deep-code directly from the git repository, clone the repository and execute the steps below:

```commandline
conda env create
conda activate deep-code
pip install -e .
```

This installs all the dependencies of `deep-code` into a fresh conda environment,
and installs deep-code from the repository into the same environment.

## Testing

To run the unit test suite:

```commandline
pytest
```

To analyze test coverage:
```shell
pytest --cov=deep-code
```

To produce an HTML coverage report

```commandline
pytest --cov-report html --cov=deep-code
```

## deep-code usage

`deep-code` provides a command-line tool called `deep-code`, which has several
subcommands providing different utility functions.
Use the `--help` option with these subcommands to get more details on usage.

The CLI retrieves the Git username and personal access token from a hidden file named
`.gitaccess`. Ensure this file is located in the same directory where you execute the CLI
command.

#### .gitaccess example

```
github-username: your-git-user
github-token: personal access token
```
### deep-code generate-config

Generates starter configuration templates for publishing to the EarthCODE Open Science
Catalog.

#### Usage
```
deep-code generate-config [OPTIONS]
```

#### Options
--output-dir, -o : Output directory (default: current)

#### Examples:
```
deep-code generate-config
deep-code generate-config -o ./configs
```

### deep-code publish

Publishes metadata of experiments, workflows, and datasets to the EarthCODE Open Science
Catalog.

#### Usage
```
deep-code publish DATASET_CONFIG WORKFLOW_CONFIG [--environment ENVIRONMENT] [--mode
all|dataset|workflow]
```

#### Arguments
DATASET_CONFIG - Path to the dataset configuration YAML file
(e.g., dataset-config.yaml)

WORKFLOW_CONFIG - Path to the workflow configuration YAML file
(e.g., workflow-config.yaml)

#### Options
--dataset-config - Explicit path to dataset config
--workflow-config - Explicit path to workflow config
--environment, -e - Target catalog environment:
                    production (default) | staging | testing
--mode, -m - Publishing mode:
             all (default) | dataset | workflow

#### Examples:
1. Publish to staging catalog
```
deep-code publish dataset-config.yaml workflow-config.yaml --environment=staging
```
2. Publish to testing catalog
```
deep-code publish dataset-config.yaml workflow-config.yaml -e testing
```
3. Publish to production catalog
```
deep-code publish dataset-config.yaml workflow-config.yaml
```
4. Publish Dataset only
```
deep-code publish dataset-config.yaml -m dataset

deep-code publish --dataset-config dataset.yaml -m dataset
```
5. Publish Workflow only
```
deep-code publish workflow-config.yaml -m workflow

deep-code publish --workflow-config workflow.yaml -m workflow
```
#### dataset-config.yaml example

```
dataset_id: esa-cci-permafrost-1x1151x1641-1.0.0.zarr
collection_id: esa-cci-permafrost
osc_themes:
- cryosphere
osc_region: global
# non-mandatory
documentation_link: https://deepesdl.readthedocs.io/en/latest/datasets/esa-cci-permafrost-1x1151x1641-0-0-2-zarr
access_link: s3://deep-esdl-public/esa-cci-permafrost-1x1151x1641-1.0.0.zarr
dataset_status: completed
```

`dataset_id` has to be a valid dataset ID from the `deep-esdl-public` S3 bucket or your
team's bucket.

#### workflow-config.yaml example

```
workflow_id: "esa-cci-permafrost"
properties:
title: "ESA CCI permafrost"
description: "cube generation workflow for esa-cci-permafrost"
keywords:
- Earth Science
themes:
- cryosphere
license: proprietary
jupyter_kernel_info:
name: deepesdl-xcube-1.8.3
python_version: 3.11
env_file: "https://github.com/deepesdl/cube-gen/blob/main/Permafrost/environment.yml"
jupyter_notebook_url: "https://github.com/deepesdl/cube-gen/blob/main/Permafrost/Create-CCI-Permafrost-cube-EarthCODE.ipynb"
contact:
- name: Tejas Morbagal Harish
organization: Brockmann Consult GmbH
links:
- rel: "about"
type: "text/html"
href: "https://www.brockmann-consult.de/"
```
`deep-code` is a lightweight Python CLI and API that turns DeepESDL datasets and
workflows into EarthCODE Open Science Catalog metadata. It can generate starter configs,
build STAC collections and OGC API records, and open pull requests to the target
EarthCODE metadata repository (production, staging, or testing).

## Features
- Generate starter dataset and workflow YAML templates.
- Publish dataset collections, workflows, and experiments via a single command.
- Build STAC collections and catalogs for Datasets and their corresponding variables
automatically from the dataset metadata.
- Build OGC API records for Workflows and Experiments from your configs.
- Flexible publishing targets, i.e., production/staging/testing EarthCODE metadata
  repositories, with GitHub automation (see the sketch below).
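
For scripted use, the documented CLI can also be driven from Python. A minimal sketch, assuming `deep-code` is installed on PATH and that `dataset-config.yaml`, `workflow-config.yaml`, and a `.gitaccess` file are in the working directory:

```python
import subprocess

# Invoke the documented CLI; the flags match the publish examples above.
result = subprocess.run(
    [
        "deep-code", "publish",
        "dataset-config.yaml", "workflow-config.yaml",
        "--environment=staging",  # production (default) | staging | testing
        "--mode=all",             # all (default) | dataset | workflow
    ],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)
```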
30 changes: 30 additions & 0 deletions docs/about.md
@@ -0,0 +1,30 @@
# About

## Changelog
See `CHANGES.md`.

## Reporting issues
Open an issue at https://github.com/deepesdl/deep-code/issues.

## Contributions
PRs are welcome. Please follow the code style (black/ruff) and add tests where relevant.

## Development install
```bash
pip install -e .[dev]
pytest
pytest --cov=deep-code
black .
ruff check .
```

## Documentation commands (MkDocs)
```bash
pip install -e .[docs] # install mkdocs + theme
mkdocs serve # live preview at http://127.0.0.1:8000
mkdocs build # build site into site/
mkdocs gh-deploy --clean # publish to GitHub Pages
```

## License
MIT License. See `LICENSE`.
30 changes: 30 additions & 0 deletions docs/cli.md
@@ -0,0 +1,30 @@
# CLI

## Generate configs
Create starter templates for both workflow and dataset:

```bash
deep-code generate-config # writes to current directory
deep-code generate-config -o ./configs # custom output folder
```

## Publish metadata
Publish dataset, workflow, or both (default is both) to the target environment:

```bash
deep-code publish dataset.yaml workflow.yaml # production (default)
deep-code publish dataset.yaml workflow.yaml -e staging # staging
deep-code publish dataset.yaml -m dataset # dataset only
deep-code publish workflow.yaml -m workflow # workflow only
deep-code publish --dataset-config ./ds.yaml --workflow-config ./wf.yaml -m all
```

Options:
- `--environment/-e`: `production` (default) | `staging` | `testing`
- `--mode/-m`: `all` (default) | `dataset` | `workflow`
- `--dataset-config` / `--workflow-config`: explicitly set paths and bypass auto-detection

## How publishing works
1. Reads your configs and builds dataset STAC collections plus variable catalogs.
2. Builds workflow and experiment OGC API Records.
3. Forks/clones the target metadata repo (production, staging, or testing), commits generated JSON, and opens a pull request on your behalf.
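
A hypothetical Python sketch of the same flow. The `Publisher` name comes from the architecture diagram on the docs index page; the import path, parameter names, and method below are illustrative assumptions, not the confirmed deep-code API:

```python
from pathlib import Path

# Hypothetical: `Publisher` appears in the architecture diagram, but this
# import path and these argument names are assumptions for illustration only.
from deep_code.publisher import Publisher

publisher = Publisher(
    dataset_config=Path("dataset.yaml"),    # step 1: STAC collections + variable catalogs
    workflow_config=Path("workflow.yaml"),  # step 2: workflow/experiment OGC API Records
    environment="staging",                  # production | staging | testing
    mode="all",                             # all | dataset | workflow
)
publisher.publish()  # step 3: fork/clone, commit generated JSON, open a PR
```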
37 changes: 37 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,37 @@
# Configuration

## Dataset config (YAML)
```yaml
dataset_id: your-dataset.zarr
collection_id: your-collection
osc_themes: [cryosphere]
osc_region: global
dataset_status: completed # or ongoing/planned
documentation_link: https://example.com/docs
access_link: s3://bucket/your-dataset.zarr
```

## Workflow config (YAML)
```yaml
workflow_id: your-workflow
properties:
title: "My workflow"
description: "What this workflow does"
keywords: ["Earth Science"]
themes: ["cryosphere"]
license: proprietary
jupyter_kernel_info:
name: deepesdl-xcube-1.8.3
python_version: 3.11
env_file: https://example.com/environment.yml
jupyter_notebook_url: https://github.com/org/repo/path/to/notebook.ipynb
contact:
- name: Jane Doe
organization: Example Org
links:
- rel: about
type: text/html
href: https://example.org
```

More templates and examples live in `dataset_config.yaml`, `workflow_config.yaml`, and `example-config/`.
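
To catch missing fields before publishing, the configs can be sanity-checked with a small script. A minimal sketch using PyYAML; the required-field sets below are assumptions inferred from the examples above (optional fields such as `documentation_link` and `access_link` are not checked):

```python
import yaml  # PyYAML

# Required-field sets are inferred from the examples above -- adjust as needed.
REQUIRED = {
    "dataset.yaml": {
        "dataset_id", "collection_id", "osc_themes", "osc_region", "dataset_status",
    },
    "workflow.yaml": {
        "workflow_id", "properties", "jupyter_kernel_info",
        "jupyter_notebook_url", "contact",
    },
}

for path, required in REQUIRED.items():
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    missing = required - set(config)
    if missing:
        raise SystemExit(f"{path}: missing fields {sorted(missing)}")
print("Configs look complete.")
```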
5 changes: 5 additions & 0 deletions docs/examples.md
@@ -0,0 +1,5 @@
# Examples

- Templates: `dataset_config.yaml`, `workflow_config.yaml`
- Example configs: `examples/example-config/`
- Notebooks on publishing: `examples/notebooks`
43 changes: 43 additions & 0 deletions docs/getting-started.md
@@ -0,0 +1,43 @@
# Getting Started

When working with cloud platforms like DeepESDL, workflow outputs typically live in S3 object storage. Before the project ends or once datasets and workflows are finalized, move the datasets into the ESA Project Results Repository (PRR) and publish the workflows (Jupyter notebooks) to a publicly accessible GitHub repository. The notebook path becomes an input in the workflow config file.

Use the PRR to publish and preserve outputs from ESA-funded Earth observation projects. It is professionally maintained, FAIR-aligned, and keeps your results findable, reusable, and citable for the long term, with no storage, operations, or access headaches.

To transfer datasets into the ESA PRR, contact the DeepESDL platform team at [esdl-support@brockmann-consult.de](mailto:esdl-support@brockmann-consult.de).

In the near future, `deep-code` will include built-in support for uploading your results to the ESA PRR as part of the publishing workflow, making it seamless to share your scientific contributions with the community.

## Requirements
- Python 3.10+
- GitHub token with access to the target EarthCODE metadata repo.
- Input configuration files.
- Datasets to be published must be uploaded to S3-like object storage and made publicly accessible (see the check below).
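
A quick way to verify the last point is to try opening the dataset anonymously, as in this minimal sketch. It assumes a public Zarr store on S3 and that `xarray`, `zarr`, and `s3fs` are installed; the path is the permafrost example from the configuration docs:

```python
import xarray as xr

# Anonymous read: this succeeds only if the store is publicly accessible.
ds = xr.open_zarr(
    "s3://deep-esdl-public/esa-cci-permafrost-1x1151x1641-1.0.0.zarr",
    storage_options={"anon": True},
)
print(list(ds.data_vars))
```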

## Install
```bash
pip install deep-code
```

## Authentication
The CLI or the Python API reads GitHub credentials from a `.gitaccess` file in the directory where you run the command:

1. **Generate a GitHub Personal Access Token (PAT)**

1. Navigate to GitHub → Settings → Developer settings → Personal access tokens.
2. Click “Generate new token”.
3. Choose the following scopes to ensure full access:
- repo (Full control of repositories, which includes fork, pull, push, and read)
4. Generate the token and copy it immediately; GitHub won't show it again.

2. **Create the .gitaccess File**

Create a plain text file named `.gitaccess` in your project directory or home folder:

```
github-username: your-git-user
github-token: your-personal-access-token
```
Replace `your-git-user` and `your-personal-access-token` with your actual GitHub username and token.
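
If you want to verify what deep-code will read (for example while debugging authentication), the file can be parsed with a few lines of Python. A minimal sketch, assuming the simple `key: value` format shown above; deep-code itself reads this file automatically, so this is illustration only:

```python
from pathlib import Path

def read_gitaccess(path: str = ".gitaccess") -> dict[str, str]:
    """Parse the simple 'key: value' lines shown above."""
    creds = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            creds[key.strip()] = value.strip()
    return creds

creds = read_gitaccess()
print(creds["github-username"])  # avoid printing the token itself
```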


39 changes: 39 additions & 0 deletions docs/index.md
@@ -0,0 +1,39 @@
# Overview

`deep-code` is a lightweight Python CLI and API that publishes DeepESDL datasets and workflows as EarthCODE Open Science Catalog metadata. It can generate starter configs, build STAC collections and OGC API records, and open pull requests to the target EarthCODE metadata repository (production, staging, or testing).

## Features
- Generate starter dataset and workflow YAML templates.
- Publish dataset collections, workflows, and experiments via a single command.
- Build STAC collections and catalogs for Datasets and their corresponding variables automatically from the dataset metadata.
- Build OGC API records for Workflows and Experiments from your configs.
- Flexible publishing targets, i.e., production/staging/testing EarthCODE metadata repositories with GitHub automation.

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 110, 'rankSpacing': 160}, 'themeVariables': {'fontSize': '28px', 'lineHeight': '1.6em'}}}%%
flowchart LR
subgraph User
A["Config files<br/>(dataset.yaml, workflow.yaml)"]
B["deep-code CLI<br/>(generate-config, publish)"]
end

subgraph App["deep-code internals"]
C["Publisher<br/>(mode: dataset/workflow/all)"]
D["STAC builder<br/>OscDatasetStacGenerator"]
E["OGC record builder<br/>OSCWorkflowOGCApiRecordGenerator"]
F["GitHubAutomation<br/>(fork, clone, branch, PR)"]
end

subgraph Output
G["Generated JSON<br/>collections, variables,<br/>workflows, experiments"]
H["GitHub PR<br/>(prod/staging/testing repo)"]
I["EarthCODE Open Science Catalog"]
end

A --> B --> C
C --> D
C --> E
D --> G
E --> G
G --> F --> H --> I
```