From efcf7d04a43d9722eaddc3a45e12f4edd1728e6f Mon Sep 17 00:00:00 2001 From: tejas Date: Fri, 12 Dec 2025 16:59:20 +0100 Subject: [PATCH 1/5] added mkdocs --- docs/about.md | 30 ++++++++++++++++++++++++++++ docs/cli.md | 30 ++++++++++++++++++++++++++++ docs/configuration.md | 37 +++++++++++++++++++++++++++++++++++ docs/examples.md | 5 +++++ docs/getting-started.md | 43 +++++++++++++++++++++++++++++++++++++++++ docs/index.md | 39 +++++++++++++++++++++++++++++++++++++ docs/python-api.md | 23 ++++++++++++++++++++++ 7 files changed, 207 insertions(+) create mode 100644 docs/about.md create mode 100644 docs/cli.md create mode 100644 docs/configuration.md create mode 100644 docs/examples.md create mode 100644 docs/getting-started.md create mode 100644 docs/index.md create mode 100644 docs/python-api.md diff --git a/docs/about.md b/docs/about.md new file mode 100644 index 0000000..0a61ced --- /dev/null +++ b/docs/about.md @@ -0,0 +1,30 @@ +# About + +## Changelog +See `CHANGES.md`. + +## Reporting issues +Open an issue at https://github.com/deepesdl/deep-code/issues. + +## Contributions +PRs are welcome. Please follow the code style (black/ruff) and add tests where relevant. + +## Development install +```bash +pip install -e .[dev] +pytest +pytest --cov=deep-code +black . +ruff check . +``` + +## Documentation commands (MkDocs) +```bash +pip install -e .[docs] # install mkdocs + theme +mkdocs serve # live preview at http://127.0.0.1:8000 +mkdocs build # build site into site/ +mkdocs gh-deploy --clean # publish to GitHub Pages +``` + +## License +MIT License. See `LICENSE`. 
diff --git a/docs/cli.md b/docs/cli.md new file mode 100644 index 0000000..14947ed --- /dev/null +++ b/docs/cli.md @@ -0,0 +1,30 @@ +# CLI + +## Generate configs +Create starter templates for both workflow and dataset: + +```bash +deep-code generate-config # writes to current directory +deep-code generate-config -o ./configs # custom output folder +``` + +## Publish metadata +Publish dataset, workflow, or both (default is both) to the target environment: + +```bash +deep-code publish dataset.yaml workflow.yaml # production (default) +deep-code publish dataset.yaml workflow.yaml -e staging # staging +deep-code publish dataset.yaml -m dataset # dataset only +deep-code publish workflow.yaml -m workflow # workflow only +deep-code publish --dataset-config ./ds.yaml --workflow-config ./wf.yaml -m all +``` + +Options: +- `--environment/-e`: `production` (default) | `staging` | `testing` +- `--mode/-m`: `all` (default) | `dataset` | `workflow` +- `--dataset-config` / `--workflow-config`: explicitly set paths and bypass auto-detection + +## How publishing works +1. Reads your configs and builds dataset STAC collections plus variable catalogs. +2. Builds workflow and experiment OGC API Records. +3. Forks/clones the target metadata repo (production, staging, or testing), commits generated JSON, and opens a pull request on your behalf. 
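The options above can be assembled programmatically. The following is a minimal sketch, not part of `deep-code`: `build_publish_command` is a hypothetical helper, and the rule that each mode requires its matching config file is an assumption. It uses only the flags documented above (`--dataset-config`, `--workflow-config`, `--environment`, `--mode`).

```python
from typing import Optional

# Values taken from the "Options" list above.
ENVIRONMENTS = ("production", "staging", "testing")
MODES = ("all", "dataset", "workflow")


def build_publish_command(
    dataset_config: Optional[str] = None,
    workflow_config: Optional[str] = None,
    environment: str = "production",
    mode: str = "all",
) -> list[str]:
    """Hypothetical helper: build the argv list for `deep-code publish`."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: each mode needs the config(s) it publishes.
    if mode in ("all", "dataset") and dataset_config is None:
        raise ValueError(f"mode {mode!r} needs a dataset config")
    if mode in ("all", "workflow") and workflow_config is None:
        raise ValueError(f"mode {mode!r} needs a workflow config")

    cmd = ["deep-code", "publish"]
    if dataset_config is not None:
        cmd += ["--dataset-config", dataset_config]
    if workflow_config is not None:
        cmd += ["--workflow-config", workflow_config]
    cmd += ["--environment", environment, "--mode", mode]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing the paths through the explicit flags avoids relying on auto-detection of positional arguments.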
diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..12d6ffa --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,37 @@ +# Configuration + +## Dataset config (YAML) +```yaml +dataset_id: your-dataset.zarr +collection_id: your-collection +osc_themes: [cryosphere] +osc_region: global +dataset_status: completed # or ongoing/planned +documentation_link: https://example.com/docs +access_link: s3://bucket/your-dataset.zarr +``` + +## Workflow config (YAML) +```yaml +workflow_id: your-workflow +properties: + title: "My workflow" + description: "What this workflow does" + keywords: ["Earth Science"] + themes: ["cryosphere"] + license: proprietary + jupyter_kernel_info: + name: deepesdl-xcube-1.8.3 + python_version: 3.11 + env_file: https://example.com/environment.yml +jupyter_notebook_url: https://github.com/org/repo/path/to/notebook.ipynb +contact: + - name: Jane Doe + organization: Example Org + links: + - rel: about + type: text/html + href: https://example.org +``` + +More templates and examples live in `dataset_config.yaml`, `workflow_config.yaml`, and `example-config/`. diff --git a/docs/examples.md b/docs/examples.md new file mode 100644 index 0000000..bf3a596 --- /dev/null +++ b/docs/examples.md @@ -0,0 +1,5 @@ +# Examples + +- Templates: `dataset_config.yaml`, `workflow_config.yaml` +- Example configs: `examples/example-config/` +- Notebooks on publishing: `examples/notebooks` diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..c21e12b --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,43 @@ +# Getting Started + +When working with cloud platforms like DeepESDL, workflow outputs typically live in S3 object storage. Before the project ends or once datasets and workflows are finalized, move the datasets into the ESA Project Results Repository (PRR) and publish the workflows (Jupyter notebooks) to a publicly accessible GitHub repository. 
The notebook URL becomes an input in the workflow config file (`jupyter_notebook_url`). + +Use the EarthCODE Project Results Repository to publish and preserve outputs from ESA-funded Earth observation projects. It is professionally maintained, FAIR-aligned, and keeps your results findable, reusable, and citable for the long term, with no storage, operations, or access headaches. + +To transfer datasets into the ESA PRR, contact the DeepESDL platform team at [esdl-support@brockmann-consult.de](mailto:esdl-support@brockmann-consult.de). + +In the near future, `deep-code` will include built-in support for uploading your results to the ESA PRR as part of the publishing workflow, making it seamless to share your scientific contributions with the community. + +## Requirements +- Python 3.10+ +- GitHub token with access to the target EarthCODE metadata repo. +- Input configuration files. +- Datasets to be published must be uploaded to S3-like object storage and made publicly accessible. + +## Install +```bash +pip install deep-code +``` + +## Authentication +The CLI or the Python API reads GitHub credentials from a `.gitaccess` file in the directory where you run the command: + +1. **Generate a GitHub Personal Access Token (PAT)** + + 1. Navigate to GitHub → Settings → Developer settings → Personal access tokens. + 2. Click “Generate new token”. + 3. Choose the following scopes to ensure full access: + - repo (full control of repositories; includes fork, pull, push, and read) + 4. Generate the token and copy it immediately; GitHub won’t show it again. + +2. **Create the .gitaccess File** + + Create a plain text file named .gitaccess in your project directory or home folder: + + ``` + github-username: your-git-user + github-token: your-personal-access-token + ``` + Replace your-git-user and your-personal-access-token with your actual GitHub username and token. 
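To check a `.gitaccess` file before publishing, a small reader can help. This is a minimal sketch under stated assumptions: `read_gitaccess` is a hypothetical helper, not a `deep-code` function, and it assumes the file contains only simple `key: value` lines as in the example above. How `deep-code` itself parses the file is not shown here.

```python
from pathlib import Path

# Keys shown in the .gitaccess example above.
REQUIRED_KEYS = {"github-username", "github-token"}


def read_gitaccess(path: str = ".gitaccess") -> dict[str, str]:
    """Hypothetical helper: parse simple `key: value` lines from a .gitaccess file."""
    creds: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line in {path}: {raw!r}")
        creds[key.strip()] = value.strip()
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(sorted(missing))}")
    return creds
```

Calling `read_gitaccess()` from the directory where you run `deep-code` raises a `ValueError` early if either key is absent, rather than failing later during the GitHub step.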
+ + diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..8101b37 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,39 @@ +# Overview + +`deep-code` is a lightweight Python CLI and API that publishes DeepESDL datasets and workflows as EarthCODE Open Science Catalog metadata. It can generate starter configs, build STAC collections and OGC API records, and open pull requests to the target EarthCODE metadata repository (production, staging, or testing). + +## Features +- Generate starter dataset and workflow YAML templates. +- Publish dataset collections, workflows, and experiments via a single command. +- Build STAC collections and catalogs for Datasets and their corresponding variables automatically from the dataset metadata. +- Build OGC API records for Workflows and Experiments from your configs. +- Flexible publishing targets, i.e. production/staging/testing EarthCODE metadata repositories, with GitHub automation. + +```mermaid +%%{init: {'flowchart': {'nodeSpacing': 110, 'rankSpacing': 160}, 'themeVariables': {'fontSize': '28px', 'lineHeight': '1.6em'}}}%% +flowchart LR + subgraph User + A["Config files
(dataset.yaml, workflow.yaml)"] + B["deep-code CLI
(generate-config, publish)"] + end + + subgraph App["deep-code internals"] + C["Publisher
(mode: dataset/workflow/all)"] + D["STAC builder
OscDatasetStacGenerator"] + E["OGC record builder
OSCWorkflowOGCApiRecordGenerator"] + F["GitHubAutomation
(fork, clone, branch, PR)"] + end + + subgraph Output + G["Generated JSON
collections, variables,
workflows, experiments"] + H["GitHub PR
(prod/staging/testing repo)"] + I["EarthCODE Open Science Catalog"] + end + + A --> B --> C + C --> D + C --> E + D --> G + E --> G + G --> F --> H --> I +``` diff --git a/docs/python-api.md b/docs/python-api.md new file mode 100644 index 0000000..9065ca0 --- /dev/null +++ b/docs/python-api.md @@ -0,0 +1,23 @@ +# Python API + +`deep_code.tools.publish.Publisher` is the main entry point. + +```python +from deep_code.tools.publish import Publisher + +publisher = Publisher( + dataset_config_path="dataset.yaml", + workflow_config_path="workflow.yaml", + environment="staging", +) + +# Generate files locally (no PR) +publisher.publish(write_to_file=True, mode="all") + +# Or open a PR directly +publisher.publish(write_to_file=False, mode="dataset") +``` + +Other utilities: +- `deep_code.tools.new.TemplateGenerator` for programmatic template generation. +- `deep_code.utils.dataset_stac_generator.OscDatasetStacGenerator` and `deep_code.utils.ogc_record_generator.OSCWorkflowOGCApiRecordGenerator` for lower-level metadata building. 
From f1421a9ae2699f30f8f23a53c1a6e2d543883f50 Mon Sep 17 00:00:00 2001 From: tejas Date: Fri, 12 Dec 2025 17:00:02 +0100 Subject: [PATCH 2/5] updated pyproject with docs section --- pyproject.toml | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/pyproject.toml b/pyproject.toml index 042038a..1fd50b4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -50,6 +50,13 @@ dev = [ "pytest-cov", "pytest-recording" ] +docs = [ + "mkdocs>=1.5", + "mkdocs-autorefs", + "mkdocs-material>=9.5", + "mkdocstrings", + "mkdocstrings-python" +] # entry point CLI [project.scripts] From 6aea65207dc977c1a21a3daba1fd96830b5a0901 Mon Sep 17 00:00:00 2001 From: tejas Date: Fri, 12 Dec 2025 17:07:23 +0100 Subject: [PATCH 3/5] updated README.md --- README.md | 187 ++++-------------------------------------------------- 1 file changed, 13 insertions(+), 174 deletions(-) diff --git a/README.md b/README.md index 68803f2..32792b8 100644 --- a/README.md +++ b/README.md @@ -5,177 +5,16 @@ [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![License](https://img.shields.io/github/license/dcs4cop/xcube-smos)](https://github.com/deepesdl/deep-code/blob/main/LICENSE) -`deep-code` is a lightweight python tool that comprises a command line interface(CLI) -and Python API providing utilities that aid integration of DeepESDL datasets, -experiments with EarthCODE. - -The first release will focus on implementing the publish feature of DeepESDL -experiments/workflow as OGC API record and Datasets as an OSC stac collection. - -## Setup - -## Install -`deep-code` will be available in PyPI for now and will be available in conda-forge -in the near future. Till the stable release, -developers/contributors can follow the below steps to install deep-code. 
- -## Installing from the repository for Developers/Contributors - -To install deep-code directly from the git repository, clone the repository, and execute the steps below: - -```commandline -conda env create -conda activate deep-code -pip install -e . -``` - -This installs all the dependencies of `deep-code` into a fresh conda environment, -and installs deep-code from the repository into the same environment. - -## Testing - -To run the unit test suite: - -```commandline -pytest -``` - -To analyze test coverage -```shell -pytest --cov=deep-code -``` - -To produce an HTML coverage report - -```commandline -pytest --cov-report html --cov=deep-code -``` - -## deep_code usage - -`deep_code` provides a command-line tool called deep-code, which has several subcommands -providing different utility functions. -Use the --help option with these subcommands to get more details on usage. - -The CLI retrieves the Git username and personal access token from a hidden file named -.gitaccess. Ensure this file is located in the same directory where you execute the CLI -command. - -#### .gitaccess example - -``` -github-username: your-git-user -github-token: personal access token -``` -### deep-code generate-config - -Generates starter configuration templates for publishing to EarthCODE openscience -catalog. 
- -#### Usage -``` -deep-code generate-config [OPTIONS] -``` - -#### Options - --output-dir, -o : Output directory (default: current) - -#### Examples: -``` -deep-code generate-config -deep-code generate-config -o ./configs -``` - -### deep-code publish - -Publishes metadata of experiment, workflow and dataset to the EarthCODE open-science -catalog - -### Usage -``` -deep-code publish DATASET_CONFIG WORKFLOW_CONFIG [--environment ENVIRONMENT] [--mode -all|dataset|workflow] - ``` - -#### Arguments - DATASET_CONFIG - Path to the dataset configuration YAML file - (e.g., dataset-config.yaml) - - WORKFLOW_CONFIG - Path to the workflow configuration YAML file - (e.g., workflow-config.yaml) - -#### Options - --dataset-config, - Explict path to dataset config - --workflow-config, - Explicit path to workflow config - --environment, -e - Target catalog environment: - production (default) | staging | testing - --mode, -m Publishing mode: - all (default) | dataset | workflow - -#### Examples: -1. Publish to staging catalog -``` -deep-code publish dataset-config.yaml workflow-config.yaml --environment=staging -``` -2. Publish to testing catalog -``` -deep-code publish dataset-config.yaml workflow-config.yaml -e testing -``` -3. Publish to production catalog -``` -deep-code publish dataset-config.yaml workflow-config.yaml -``` -4. Publish Dataset only -``` -deep-code publish dataset-config.yaml -m dataset - -deep-code publish --dataset-config dataset.yaml -m dataset -``` -5. 
Publish Workflow only -``` -deep-code publish workflow-config.yaml -m workflow - -deep-code publish --workflow-config workflow.yaml -m workflow -``` -#### dataset-config.yaml example - -``` -dataset_id: esa-cci-permafrost-1x1151x1641-1.0.0.zarr -collection_id: esa-cci-permafrost -osc_themes: - - cryosphere -osc_region: global -# non-mandatory -documentation_link: https://deepesdl.readthedocs.io/en/latest/datasets/esa-cci-permafrost-1x1151x1641-0-0-2-zarr -access_link: s3://deep-esdl-public/esa-cci-permafrost-1x1151x1641-1.0.0.zarr -dataset_status: completed -``` - -dataset-id has to be a valid dataset-id from `deep-esdl-public` s3 bucket or your team -bucket. - -#### workflow-config.yaml example - -``` -workflow_id: "esa-cci-permafrost" -properties: - title: "ESA CCI permafrost" - description: "cube generation workflow for esa-cci-permafrost" - keywords: - - Earth Science - themes: - - cryosphere - license: proprietary - jupyter_kernel_info: - name: deepesdl-xcube-1.8.3 - python_version: 3.11 - env_file: "https://github.com/deepesdl/cube-gen/blob/main/Permafrost/environment.yml" -jupyter_notebook_url: "https://github.com/deepesdl/cube-gen/blob/main/Permafrost/Create-CCI-Permafrost-cube-EarthCODE.ipynb" -contact: - - name: Tejas Morbagal Harish - organization: Brockmann Consult GmbH - links: - - rel: "about" - type: "text/html" - href: "https://www.brockmann-consult.de/" -``` +`deep-code` is a lightweight Python CLI and API that turns DeepESDL datasets and +workflows into EarthCODE Open Science Catalog metadata. It can generate starter configs, +build STAC collections and OGC API records, and open pull requests to the target +EarthCODE metadata repository (production, staging, or testing). + +## Features +- Generate starter dataset and workflow YAML templates. +- Publish dataset collections, workflows, and experiments via a single command. +- Build STAC collections and catalogs for Datasets and their corresponding variables + automatically from the dataset metadata. 
+- Build OGC API records for Workflows and Experiments from your configs. +- Flexible publishing targets, i.e. production/staging/testing EarthCODE metadata + repositories, with GitHub automation. From c376f9405da605aa2d83e279cbf4df549673da13 Mon Sep 17 00:00:00 2001 From: tejas Date: Fri, 12 Dec 2025 17:10:23 +0100 Subject: [PATCH 4/5] updated change log --- CHANGES.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/CHANGES.md b/CHANGES.md index 5e1d135..0000658 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -55,6 +55,8 @@ - Prevented duplicate item and self links when updating base catalogs of workflows and experiments. -## Changes in 0.1.7 (in development) +## Changes in 0.1.7 -- Fixed a bug in build_child_link_to_related_experiment for the publish mode `"all"`. \ No newline at end of file +- Fixed a bug in build_child_link_to_related_experiment for the publish mode `"all"`. + +## Changes in 0.1.8 (in development) From 67d2faa2c065eeb9be4ce4719379d59361318da5 Mon Sep 17 00:00:00 2001 From: tejas Date: Fri, 12 Dec 2025 17:11:52 +0100 Subject: [PATCH 5/5] updated docs --- docs/python-api.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/docs/python-api.md b/docs/python-api.md index 9065ca0..2695cbb 100644 --- a/docs/python-api.md +++ b/docs/python-api.md @@ -18,6 +18,3 @@ publisher.publish(write_to_file=True, mode="all") publisher.publish(write_to_file=False, mode="dataset") ``` -Other utilities: -- `deep_code.tools.new.TemplateGenerator` for programmatic template generation. -- `deep_code.utils.dataset_stac_generator.OscDatasetStacGenerator` and `deep_code.utils.ogc_record_generator.OSCWorkflowOGCApiRecordGenerator` for lower-level metadata building.