Vendor-Configurable Metadata Models #294

yarikoptic · 2025-04-04T20:25:30Z

to accommodate LINC and EMBER. We are yet to work out proper vendorization where a particular model, ideally not the entire runtime as we could do via an env var, could be parametrized with specific vendor information. I think that relaxing those regular expressions altogether would help to adopt dandi for custom deployments meanwhile.

@satra -- WDYT for a quick workaround like this? any side effects I am not foreseeing besides "looser" validation?

Also attn @aaronkanzer and @kabilar -- for LINC did you just keep DANDI identifiers?

DONEs:

- actually maybe I should add vendoring via a few env vars, then at least a specific deployment of dandi-archive could restrict more heavily for a specific instance
- (IMHO very optional) allow for vendorization of the namespace - added new DANDI_NSKEY internal variable
  - add support/testing for that in tests which ATM have "dandi:" hardcoded
- add test cases to ensure that works for alternative identifiers
This PR includes the solution to Add a Non-Commerical License Option: CC BY-NC-SA 4.0 #302 by making supported licenses configurable by vendor. This PR closes Add a Non-Commerical License Option: CC BY-NC-SA 4.0 #302.

Assumptions made in this PR:

The EMBER DANDI instance will use "EMBER-DANDI" as instance name. (This assumption is used only in writing test cases, so nothing will break if it is wrong.)
EMBER's DOI prefix is to be 10.60533 (and 10.82754 for testing). (This assumption is used only in writing test cases, so nothing will break if it is wrong.)
EMBER adopts DANDI's DOI suffix pattern, which is defined by DANDI_DOI_PATTERN in dandischema.models.py. With ID_PATTERN set to "EMBER" in running the DANDI instance, an example of a suffix would be "ember.001425/0.250514.0602"
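
As a rough illustration of the third assumption, the sketch below builds an anchored DOI regex from an instance name and a DOI prefix pattern. The function name `build_doi_pattern` and the exact suffix layout are assumptions made for this example; the actual DANDI_DOI_PATTERN in dandischema may be structured differently.

```python
import re


def build_doi_pattern(instance_name: str, doi_prefix_pattern: str) -> str:
    """Return an anchored regex for DOIs of one instance (hypothetical helper).

    The suffix is assumed to look like "<name>.<6 digits>/<x>.<y>.<z>",
    e.g. "ember.001425/0.250514.0602".
    """
    name = re.escape(instance_name.lower())
    suffix = rf"{name}\.\d{{6}}/\d+\.\d+\.\d+"
    return rf"^{doi_prefix_pattern}/{suffix}$"


# Prefixes taken from the assumptions above (production and testing).
EMBER_DOI = re.compile(build_doi_pattern("EMBER", r"10\.(60533|82754)"))

ember_ok = EMBER_DOI.match("10.60533/ember.001425/0.250514.0602") is not None
dandi_rejected = EMBER_DOI.match("10.60533/dandi.001425/0.250514.0602") is None
print(ember_ok, dandi_rejected)
```

With this layout an EMBER-configured pattern accepts the example suffix from the assumption and rejects a `dandi.`-prefixed one.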

Incidental changes:

Renaming test_dantimeta_1() to test_dandimeta_1()
Setting default shell to bash in test.yml to reduce code duplication.
Adding missing ending string anchor to DANDI_DOI_PATTERN
Adding missing ending string anchor to DANDI_PUBID_PATTERN
Renaming _basic_publishmeta() in tests/utils.py to basic_publishmeta()
The "Test against dandi-cli" workflow is only run in an environment vendorized for "DANDI". This is currently necessary since both the dandi-cli and dandi-archive have the "DANDI" vendor hardcoded.
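
To show why the missing ending anchors matter, here is a minimal sketch with a simplified stand-in pattern (not the real DANDI_PUBID_PATTERN). Without `$`, `re.match` accepts any string that merely starts with a valid identifier.

```python
import re

# Simplified stand-in for a published-identifier pattern (assumption).
UNANCHORED = r"^DANDI:\d{6}/\d+\.\d+\.\d+"
ANCHORED = UNANCHORED + r"$"

candidate = "DANDI:000004/0.210831.2033 plus trailing junk"

# The unanchored pattern matches the valid-looking prefix and ignores the rest.
accepted_without_anchor = re.match(UNANCHORED, candidate) is not None
# The anchored pattern requires the whole string to be a valid identifier.
rejected_with_anchor = re.match(ANCHORED, candidate) is None
print(accepted_without_anchor, rejected_with_anchor)
```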

TODOs:

The following type checking errors originate from an issue in the jsonschema package. Make sure the issue is resolved or adopt a temporary solution.

dandischema/utils.py:220: error: Missing positional argument "registry" in call
to "Validator"  [call-arg]
            return validator_cls(schema, format_checker=validator_cls.FORM...
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~...
dandischema/utils.py:223: error: Missing positional argument "registry" in call
to "Validator"  [call-arg]
        return validator_cls(schema)
               ^~~~~~~~~~~~~~~~~~~~~

Add description of configuration into README.md

Release Notes

`migrate()` and `validate()` are not available at the top level of the `dandischema` package. To use them, one must do `from dandischema.metadata import migrate, validate` instead of `from dandischema import migrate, validate`.

satra

while this would work (there are several tests failing), how about get them from a config to be specific? also take a look at the jsonld context generator to see which ones map on specifically to dandiarchive. those would also need to be devendored.

yarikoptic · 2025-04-06T15:03:04Z

> how about get them from a config to be specific?

what config do you have in mind, since so far we have none? typically we just resort to some env vars (that's what I was thinking we should do in the short run). The point is that it should be "instance" (not runtime) specific, which is ATM not supported at all. E.g. imagine uploading to both EMBER and DANDI within the same session: we would then need either validation or, ideally, the entire model per each such instance type.

satra · 2025-04-06T16:02:43Z

perhaps we are saying the same thing in slightly different ways. dandischema can be devendored and instance agnostic (perhaps and hopefully). anything using it (server, cli, etc.) can make an instance of it, i.e. a dandiset/asset for an instance of dandi with the right config. so that means introducing an instance config that can be (or is required to be) passed when initializing a model.

for example, this could also allow EMBER to have different license considerations than DANDI.

one possibility is perhaps to still consider a generic core model (e.g., union of licenses, more generic doi/devendored setup like this PR) and then an instance specific validation/schema/constraint/version layer that can be customized.
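
A minimal sketch of that "generic core plus instance layer" idea, using license checks as the example. The names `CORE_LICENSES`, `InstanceConfig`, and `validate_license`, and the license sets assigned to each instance, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Generic core: the union of licenses any instance may use (assumed values).
CORE_LICENSES: FrozenSet[str] = frozenset(
    {"spdx:CC0-1.0", "spdx:CC-BY-4.0", "spdx:CC-BY-NC-SA-4.0"}
)


@dataclass(frozen=True)
class InstanceConfig:
    """Instance-specific constraints layered on top of the core model."""

    name: str
    licenses: FrozenSet[str] = CORE_LICENSES


def validate_license(value: str, config: Optional[InstanceConfig] = None) -> str:
    """Check the core constraint first, then the optional instance constraint."""
    if value not in CORE_LICENSES:
        raise ValueError(f"unknown license: {value}")
    if config is not None and value not in config.licenses:
        raise ValueError(f"{value} is not allowed on {config.name}")
    return value


# Hypothetical instance policies.
DANDI = InstanceConfig("DANDI", frozenset({"spdx:CC0-1.0", "spdx:CC-BY-4.0"}))
EMBER = InstanceConfig("EMBER")  # accepts the full core set

print(validate_license("spdx:CC-BY-NC-SA-4.0", EMBER))
```

Passing no config gives the generic (instance-agnostic) behavior, while a server or CLI could pass its own config to tighten validation.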

for this PR, let's just simplify (expand) for now. i think some of the context adjustment will also need to happen (here: dandi-schema/dandischema/metadata.py, line 46 in 782c421), going from

        "dandi": "http://schema.dandiarchive.org/",
        "dcite": "http://schema.dandiarchive.org/datacite/",
        "dandiasset": "http://dandiarchive.org/asset/",
        "DANDI": "http://dandiarchive.org/dandiset/",

to this (note: the schema ones should not change, for now at least, as they are simply the model)

        "dandi": "http://schema.dandiarchive.org/",
        "dcite": "http://schema.dandiarchive.org/datacite/",
        "dandiasset": "http://dandiarchive.org/asset/",
        "DANDI": "http://dandiarchive.org/dandiset/",
        "emberasset": "http://emberarchive.org/asset/",
        "EMBER": "http://emberarchive.org/dandiset/",

a full refactor to dandi + instance concepts would require us to make some breaking changes and different identifiers. we could maybe craft such a thing this week when we are together.
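
The context expansion above could be generated from a mapping of instances rather than hardcoded. This is only a sketch: `make_context` is a hypothetical helper, and the key naming (`<name>asset`, upper-case instance name) simply mirrors the entries listed in the comment.

```python
from typing import Dict


def make_context(instances: Dict[str, str]) -> Dict[str, str]:
    """Build JSON-LD prefix entries for several archive instances."""
    # Schema prefixes stay fixed, since they describe the model itself.
    context = {
        "dandi": "http://schema.dandiarchive.org/",
        "dcite": "http://schema.dandiarchive.org/datacite/",
    }
    # Add one asset prefix and one dandiset prefix per instance.
    for name, base_url in instances.items():
        context[f"{name.lower()}asset"] = f"{base_url}/asset/"
        context[name.upper()] = f"{base_url}/dandiset/"
    return context


ctx = make_context(
    {"DANDI": "http://dandiarchive.org", "EMBER": "http://emberarchive.org"}
)
for key, value in ctx.items():
    print(f"{key}: {value}")
```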

aaronkanzer · 2025-04-07T16:27:12Z

@yarikoptic -- for LINC, we are currently not issuing DOIs (not using DANDI either)

I have had placeholders in for the expected Django env vars -- for LINC I can certainly implement whatever EMBER ends up doing for their own logic, though (let me know if this makes sense @kabilar)

cc @satra

kabilar · 2025-04-07T17:50:04Z

> @yarikoptic -- for LINC, we are currently not issuing DOIs (not using DANDI either)

Thanks @yarikoptic. I agree with Aaron. For LINC, we don't use DOIs or DANDI: identifiers, just URLs. And we don't currently have a need for either.

dandischema/models.py

candleindark · 2025-04-25T06:49:53Z

I think there is a way to take a config and vendorize the models and enums accordingly, even with the technologies we are currently using. Pydantic has a dynamic model creation syntax and Enum has a functional API. However, using these syntaxes/APIs to define all the models and enums (most likely within functions) is going to be messy.
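
For reference, a small sketch of the Enum functional API mentioned here, building a license enum from a configured list. The helper `make_license_enum` and the member-naming rule are assumptions; Pydantic's `create_model` would play the analogous role for models and is not shown.

```python
from enum import Enum
from typing import List, Type


def make_license_enum(spdx_ids: List[str]) -> Type[Enum]:
    """Create a LicenseType enum at runtime from configured SPDX ids."""
    # Derive member names such as CC_BY_40 from ids such as CC-BY-4.0.
    members = {
        spdx.replace("-", "_").replace(".", "").upper(): f"spdx:{spdx}"
        for spdx in spdx_ids
    }
    # Functional API: Enum(class_name, mapping_of_member_names_to_values).
    return Enum("LicenseType", members)


EmberLicense = make_license_enum(["CC0-1.0", "CC-BY-4.0", "CC-BY-NC-SA-4.0"])
member_names = [member.name for member in EmberLicense]
print(member_names)
```

The messiness noted above shows up quickly: every enum and model defined this way loses its static class body, which makes type checking and documentation harder.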

dandischema/models.py

codecov · 2025-04-29T19:03:13Z

Codecov Report

❌ Patch coverage is 98.29268% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.89%. Comparing base (8dff37f) to head (33ac781).
⚠️ Report is 99 commits behind head on master.

Files with missing lines	Patch %	Lines
dandischema/tests/conftest.py	87.87%	4 Missing ⚠️
dandischema/tests/test_models.py	93.75%	2 Missing ⚠️
dandischema/conf.py	98.78%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #294      +/-   ##
==========================================
+ Coverage   97.84%   97.89%   +0.04%     
==========================================
  Files          16       18       +2     
  Lines        2042     2370     +328     
==========================================
+ Hits         1998     2320     +322     
- Misses         44       50       +6

Flag	Coverage Δ
unittests	`97.89% <98.29%> (+0.04%)`	⬆️


dandischema/models.py

candleindark · 2025-04-30T21:04:15Z

@yarikoptic I left a comment at #294 (comment). Otherwise, this PR is good to go. Feel free to squash the commits. They are small and serve as breadcrumbs for review purposes.

One note I want to make is that this PR is not a complete devendorization since we default the nskey to dandi. A form of complete devendorization would not have such a default but require nskey to be set through the configuration.

candleindark · 2025-05-02T06:31:37Z

The two remaining test failures must be unrelated to the current PR. The last time the two tests succeeded was three months ago at https://github.com/dandi/dandi-schema/actions/runs/13537569388. They are worthy of further investigation though.

dandischema/conf.py

yarikoptic · 2025-05-07T19:43:49Z

Let's add testing for 3 scenarios in CI:

- "generic" - should handle records for any vendor
- "dandi" - settings specific to DANDI
- "ember" - settings specific to EMBER (different prefix - EMBER, and different DOI number, whatever it is)

The records we have in dandischema/tests/data/metadata which are not vendor specific should pass for all scenarios; records specific to DANDI should pass only for generic and DANDI; and when we add some for EMBER, those should pass only for generic and EMBER.
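
The expected pass/fail matrix for the three scenarios can be sketched with simplified identifier patterns. The patterns below are illustrative assumptions, not the ones dandischema actually generates.

```python
import re
from typing import List

# One simplified dandiset-identifier pattern per CI scenario (assumed).
ID_PATTERNS = {
    "generic": r"^[A-Z][-A-Z]*:\d{6}$",
    "dandi": r"^DANDI:\d{6}$",
    "ember": r"^EMBER:\d{6}$",
}


def accepted_by(identifier: str) -> List[str]:
    """Return the scenarios whose pattern accepts the identifier."""
    return sorted(
        name for name, pattern in ID_PATTERNS.items() if re.match(pattern, identifier)
    )


dandi_record = accepted_by("DANDI:000004")
ember_record = accepted_by("EMBER:000004")
print(dandi_record, ember_record)
```

A DANDI-specific record is accepted only under "dandi" and "generic", and an EMBER-specific one only under "ember" and "generic", which is the behavior the CI scenarios are meant to verify.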

For now let's just do

what @candleindark says he would do
merge this PR

and then we will reconsider...

dandischema/models.py

yarikoptic · 2025-05-14T18:41:56Z

here you could grab a sample from EMBER https://api-dandi.emberarchive.org/api/dandisets/000004/versions/draft/ but would still need to replace with the EMBER: prefix since a non-vendorized version was used.

also think on how to do migration now within EMBER to adjust all those IDs etc...


These vars, with the exception of `DJANGO_DANDI_DOI_PUBLISH`, need to be provided together or not at all per https://github.com/dandi/dandi-archive/blob/b7288ec920b0c2cf2efd9c6d43e4edbf5016885d/dandiapi/api/checks.py#L10-L25. Since LINC doesn't publish Dandisets with DOIs, it is reasonable not to set these env vars. Additionally, the value of `DJANGO_DANDI_DOI_API_PREFIX` before this commit doesn't pass the DOI prefix pattern requirement being introduced in dandi/dandi-schema#294.
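
The "together or not at all" rule can be sketched as a small check. This is not the code in dandiapi/api/checks.py; `check_doi_settings` is a hypothetical helper, and apart from `DJANGO_DANDI_DOI_API_PREFIX` the variable names in `DOI_VARS` are assumed for illustration.

```python
from typing import Mapping, Sequence

# Only the first name appears in the text above; the others are assumptions.
DOI_VARS = (
    "DJANGO_DANDI_DOI_API_PREFIX",
    "DJANGO_DANDI_DOI_API_URL",
    "DJANGO_DANDI_DOI_API_USER",
)


def check_doi_settings(env: Mapping[str, str], names: Sequence[str]) -> bool:
    """Return True if all vars are set, False if none are; raise if mixed."""
    present = [name for name in names if env.get(name)]
    if present and len(present) != len(names):
        missing = sorted(set(names) - set(present))
        raise ValueError(f"DOI settings are incomplete, missing: {missing}")
    return bool(present)


# A deployment that does not publish DOIs leaves every variable unset.
doi_enabled = check_doi_settings({}, DOI_VARS)
print(doi_enabled)
```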

Set the env vars needed to configure the vendor-specific dandi-schema introduced by dandi/dandi-schema#294

yarikoptic · 2025-10-15T18:46:19Z

fresh conflict came up again

Monkey-patches a `conf` module into `dandischema` for older versions of `dandischema`. This allows dandi-archive to release without a coordinated release of dandi-schema with regard to the vendorization effort implemented in dandi/dandi-schema#294.
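
One way such a shim could work, sketched against a stand-in package so the example is self-contained. The helper `ensure_conf_module`, the package name `legacy_schema`, and the `INSTANCE_NAME` attribute are all hypothetical; the actual monkey-patch in dandi-archive may look quite different.

```python
import sys
import types
from typing import Any, Dict


def ensure_conf_module(package_name: str, defaults: Dict[str, Any]) -> types.ModuleType:
    """Attach a minimal `conf` submodule to a package that lacks one."""
    full_name = f"{package_name}.conf"
    if full_name in sys.modules:
        # A newer release already ships its own conf module: leave it alone.
        return sys.modules[full_name]
    package = sys.modules[package_name]  # the package must already be imported
    shim = types.ModuleType(full_name)
    shim.__dict__.update(defaults)
    # Register the shim both for imports and for attribute access.
    sys.modules[full_name] = shim
    setattr(package, "conf", shim)
    return shim


# Stand-in for an older release without a conf module.
legacy = types.ModuleType("legacy_schema")
sys.modules["legacy_schema"] = legacy

ensure_conf_module("legacy_schema", {"INSTANCE_NAME": "DANDI"})
print(legacy.conf.INSTANCE_NAME)
```

Calling the helper a second time returns the already-registered module, so it is safe to run regardless of which release is installed.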

Put in a tox env to test the vendor-configurable dandischema located at dandi/dandi-schema#294. This env can be dropped once dandi/dandi-schema#294 is merged to master and released.


candleindark · 2025-10-30T05:36:00Z

@yarikoptic Merge conflicts resolved again. Please take a final look and merge and release.

yarikoptic · 2025-10-30T15:46:45Z

On a quick look - looks good. I would still though not yet merge/release and aim to finalize/merge

Add ability to downgrade schema to 0.6.10 #342

first with added dandi-cli flexibility in handling versions... so we have a minor change to dandi-schema released first before we jump to this one. And then (maybe not right away to avoid conflicts) here we might want to add

release label
boost version to either 0.7.0 or actually just some next within 0.6.x since it would not be "breaking" change! Could you add a current diff on the schema within some <details>...</details> in the original description on top so we could review it to decide?

yarikoptic · 2025-11-17T22:03:38Z

ok, I am merging this into master and we will release it by

Release new schema 0.7.0 for addition of the releaseNotes #344

yarikoptic added the minor Increment the minor version when merged label Apr 4, 2025

yarikoptic requested review from candleindark and satra April 4, 2025 20:25

satra reviewed Apr 6, 2025

yarikoptic mentioned this pull request Apr 7, 2025

Design document for the Zenodo like DOI per dandiset dandi/dandi-archive#2012

Merged

4 tasks

candleindark reviewed Apr 25, 2025 (dandischema/models.py, outdated)

candleindark reviewed Apr 25, 2025 (dandischema/models.py, outdated)

yarikoptic commented Apr 25, 2025 (dandischema/models.py)

yarikoptic commented Apr 25, 2025 (dandischema/models.py, outdated)

candleindark reviewed Apr 30, 2025 (dandischema/models.py, outdated)

candleindark force-pushed the devendorize branch 2 times, most recently from c3c71f4 to 606e1b6 Compare April 30, 2025 21:47

yarikoptic commented May 2, 2025 (dandischema/conf.py, outdated)

yarikoptic commented May 2, 2025 (dandischema/conf.py, outdated)

candleindark force-pushed the devendorize branch 2 times, most recently from 6f12bca to 6832dab Compare May 7, 2025 01:05

yarikoptic mentioned this pull request May 7, 2025

Mega-issue: Vendorization of the dandi-schema and other components #299

Open

25 tasks

candleindark mentioned this pull request May 9, 2025

enh: allow creation of dandiset dois (contrasted to a version doi) #297

Draft

asmacdo reviewed May 9, 2025 (dandischema/models.py)

yarikoptic commented May 14, 2025 (dandischema/models.py)

candleindark and others added 6 commits September 23, 2025 13:58

test: update tests so that field names are used to initialize Config

22bf029

test: add "instance_url" key to config dict for testing

f32125a

test: test initializing dandischema.conf.Config kwargs

715a383

test: test init dandischema.conf.Config by field names through dote…

8fe9701

…nv file

test: test round trip of dandischema.conf.Config

7eafc38

Merge pull request #336 from dandi/devendorize-init-config-with-field…

5406628

…-name Workaround to allow `dandischema.conf.Config` to be initialized with field names

candleindark changed the title ~~Initial Multi-DANDI instance support: import time - generic regexes by default + instance config~~ Vendor-Configurable Metadata Models Sep 27, 2025

candleindark added a commit to candleindark/dandi-infrastructure that referenced this pull request Sep 30, 2025

feat: set env vars to set vendor-specific dandi-schema in production

73fc1df

Set env vars need to set the vendor-specific dandi-schema introduced by dandi/dandi-schema#294

candleindark added a commit to candleindark/dandi-infrastructure that referenced this pull request Sep 30, 2025

feat: set env vars to set vendor-specific dandi-schema in staging

49bd4b1

Set env vars need to set the vendor-specific dandi-schema introduced by dandi/dandi-schema#294

candleindark mentioned this pull request Oct 1, 2025

Set environment vars to set vendor-specific dandi-schema used by the DANDI instance aplbrain/dandi-infrastructure#47

Merged

1 task

candleindark mentioned this pull request Oct 13, 2025

request: rmv windows-2019 test requirements #338

Closed

yarikoptic mentioned this pull request Oct 16, 2025

Add vendorization support dandi/dandi-archive#2584

Merged

20 tasks

candleindark added 2 commits October 20, 2025 13:44

Merge branch 'master' into devendorize

24f800e

# Conflicts: # dandischema/tests/test_models.py

Merge branch 'master' into devendorize

33ac781

# Conflicts: # dandischema/models.py

yarikoptic mentioned this pull request Nov 17, 2025

Release new schema 0.7.0 for addition of the releaseNotes #344

Merged

yarikoptic merged commit 8087312 into master Nov 17, 2025
70 checks passed

yarikoptic deleted the devendorize branch November 17, 2025 22:03

github-project-automation bot moved this from In Progress to Done in Multi-DANDI instance support Nov 17, 2025

candleindark mentioned this pull request Nov 17, 2025

Set dandischema dependency to 0.12.0 and beyond dandi/dandi-cli#1744

Merged

2 tasks

candleindark mentioned this pull request Dec 1, 2025

Vendorize dandi-cli per config in server info dandi/dandi-cli#1661

Draft

4 tasks


10 participants
