
Conversation

@yarikoptic
Member

@yarikoptic yarikoptic commented Apr 4, 2025

to accommodate LINC and EMBER. We have yet to work out proper vendorization, where a particular model (ideally not the entire runtime, as we could do via an env var) could be parametrized with specific vendor information. I think that relaxing those regular expressions altogether would help with adopting dandi for custom deployments in the meantime.

@satra -- WDYT for a quick workaround like this? any side effects I am not foreseeing besides "looser" validation?

Also attn @aaronkanzer and @kabilar -- for LINC did you just keep DANDI identifiers?

DONEs:

  • actually, maybe I should add vendoring via a few env vars, so that at least a specific deployment of dandi-archive could restrict validation more heavily for its own instance
  • (IMHO very optional) allow for vendorization of the namespace - added new DANDI_NSKEY internal variable
    • add support/testing for that in tests which ATM have "dandi:" hardcoded
  • add test cases to ensure that works for alternative identifiers
  • This PR includes the solution to Add a Non-Commerical License Option: CC BY-NC-SA 4.0 #302 by making supported licenses configurable by vendor. This PR closes Add a Non-Commerical License Option: CC BY-NC-SA 4.0 #302.

Assumptions made in this PR:

  1. The EMBER DANDI instance will use "EMBER-DANDI" as its instance name. (This assumption is used only in writing test cases, so nothing will break if it is wrong.)
  2. EMBER's DOI prefix is to be 10.60533 (and 10.82754 for testing). (This assumption is used only in writing test cases, so nothing will break if it is wrong.)
  3. EMBER adopts DANDI's DOI suffix pattern, which is defined by DANDI_DOI_PATTERN in dandischema.models.py. With ID_PATTERN set to "EMBER" when running the DANDI instance, an example of a suffix would be "ember.001425/0.250514.0602"
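
To make assumption 3 concrete, here is a minimal sketch of how a DOI pattern could be derived from a DOI prefix and an instance ID. The helper name `build_doi_pattern` and the exact regular expression are assumptions made for illustration only; the real pattern is `DANDI_DOI_PATTERN` in dandischema and may differ in detail.

```python
import re


def build_doi_pattern(doi_prefix: str, instance_id: str) -> re.Pattern:
    """Hypothetical helper: build an anchored DOI regex for one instance.

    The suffix shape (lowercased instance ID, six-digit dandiset number,
    three-part version) is inferred from the example suffix in the PR
    description, not copied from dandischema.
    """
    suffix = rf"{re.escape(instance_id.lower())}\.\d{{6}}/\d+\.\d+\.\d+"
    return re.compile(rf"^{re.escape(doi_prefix)}/{suffix}$")


# Values taken from the assumptions listed above.
ember_doi = build_doi_pattern("10.60533", "EMBER")
```

With this sketch, `ember_doi.match("10.60533/ember.001425/0.250514.0602")` succeeds, while a DOI carrying a different instance ID or a different prefix does not.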

Incidental changes:

  1. Renaming test_dantimeta_1() to test_dandimeta_1()
  2. Setting default shell to bash in test.yml to reduce code duplication.
  3. Adding missing ending string anchor to DANDI_DOI_PATTERN
  4. Adding missing ending string anchor to DANDI_PUBID_PATTERN
  5. Renaming _basic_publishmeta() in tests/utils.py to basic_publishmeta()
  6. The "Test against dandi-cli" workflow is only run in an environment vendorized for "DANDI". This is currently necessary since both the dandi-cli and dandi-archive have the "DANDI" vendor hardcoded.
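
Incidental changes 3 and 4 matter because a pattern without an ending anchor accepts arbitrary trailing text. The sketch below shows the effect with a simplified stand-in pattern; it is not the actual `DANDI_PUBID_PATTERN`, and the identifier is only an example.

```python
import re

# Simplified stand-in for a published-version identifier pattern.
UNANCHORED = re.compile(r"^DANDI:\d{6}/\d+\.\d+\.\d+")
ANCHORED = re.compile(r"^DANDI:\d{6}/\d+\.\d+\.\d+$")

good = "DANDI:000004/0.210812.1515"
bad = good + "-junk"

# Without the trailing "$", the malformed identifier still matches.
unanchored_accepts_bad = UNANCHORED.match(bad) is not None
# With the trailing "$", it is rejected.
anchored_accepts_bad = ANCHORED.match(bad) is not None
```

Using `re.fullmatch` on the unanchored pattern would be another way to get the stricter behavior.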

TODOs:

  • The following type checking errors originate from an issue in the jsonschema package. Make sure the issue is resolved or adopt a temporary solution.
    dandischema/utils.py:220: error: Missing positional argument "registry" in call
    to "Validator"  [call-arg]
                return validator_cls(schema, format_checker=validator_cls.FORM...
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~...
    dandischema/utils.py:223: error: Missing positional argument "registry" in call
    to "Validator"  [call-arg]
            return validator_cls(schema)
                   ^~~~~~~~~~~~~~~~~~~~~
  • Add description of configuration into README.md

Release Notes

migrate() and validate() are not available at the top level of the dandischema package. To use them, one must do from dandischema.metadata import migrate, validate instead of from dandischema import migrate, validate.

@yarikoptic yarikoptic added the minor Increment the minor version when merged label Apr 4, 2025
@yarikoptic yarikoptic requested review from candleindark and satra April 4, 2025 20:25
Member

@satra satra left a comment

while this would work (there are several tests failing), how about get them from a config to be specific? also take a look at the jsonld context generator to see which ones map on specifically to dandiarchive. those would also need to be devendored.

@yarikoptic
Member Author

how about get them from a config to be specific?

what config do you have in mind since so far we have none -- typically we just resort to some env vars (that's what I was thinking we should do in the short run). The point is that it should be "instance" (not runtime) specific which is ATM not supported at all, as e.g. imagine within the same session uploading to both EMBER and DANDI, so then having either validation or ideally entire model per each such different instance type.

@satra
Member

satra commented Apr 6, 2025

perhaps we are saying the same thing in slightly different ways. dandischema can be devendored and instance agnostic (perhaps and hopefully). anything using it (server, cli, etc.) can make an instance of it, i.e. a dandiset/asset for an instance of dandi with the right config. so that means introducing an instance config that can be (or is required to be) passed when initializing a model.

for example, this could also allow EMBER to have different license considerations than DANDI.

one possibility is perhaps to still consider a generic core model (e.g., union of licenses, more generic doi/devendored setup like this PR) and then an instance specific validation/schema/constraint/version layer that can be customized.
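
A minimal sketch of that "generic core plus instance layer" idea, using licenses as the example. All names here (`CORE_LICENSES`, `InstanceConfig`, `validate_license`) and the license sets assigned to each instance are hypothetical, chosen only to illustrate the layering; they are not the dandischema API.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Generic core: the union of licenses any instance may choose from.
CORE_LICENSES = frozenset(
    {"spdx:CC0-1.0", "spdx:CC-BY-4.0", "spdx:CC-BY-NC-SA-4.0"}
)


@dataclass(frozen=True)
class InstanceConfig:
    """Hypothetical per-instance constraint layer on top of the core."""

    instance_name: str
    licenses: FrozenSet[str]

    def __post_init__(self) -> None:
        # An instance may only narrow the core set, never extend it.
        unknown = self.licenses - CORE_LICENSES
        if unknown:
            raise ValueError(f"licenses not in core model: {sorted(unknown)}")


def validate_license(license_id: str, config: InstanceConfig) -> bool:
    """Return True if the license is allowed for the given instance."""
    return license_id in config.licenses


# Example license choices; the actual per-instance policies are not stated here.
dandi = InstanceConfig("DANDI", frozenset({"spdx:CC0-1.0", "spdx:CC-BY-4.0"}))
ember = InstanceConfig("EMBER-DANDI", CORE_LICENSES)
```

Under these example settings the same record can be valid for one instance and invalid for another without touching the core model.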

for this PR, let's just simplify (expand) for now. i think some of the context adjustment will also need to happen, changing this:

        "dandi": "http://schema.dandiarchive.org/",
        "dcite": "http://schema.dandiarchive.org/datacite/",
        "dandiasset": "http://dandiarchive.org/asset/",
        "DANDI": "http://dandiarchive.org/dandiset/",

to this (note: the schema ones should not change, for now at least, as they are simply the model)

        "dandi": "http://schema.dandiarchive.org/",
        "dcite": "http://schema.dandiarchive.org/datacite/",
        "dandiasset": "http://dandiarchive.org/asset/",
        "DANDI": "http://dandiarchive.org/dandiset/",
        "emberasset": "http://emberarchive.org/asset/",
        "EMBER": "http://emberarchive.org/dandiset/",

a full refactor to dandi + instance concepts would require us to make some breaking changes and different identifiers. we could may be craft such a thing this week when we are together.
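
The context expansion proposed above could also be generated instead of hard-coded. This is only a sketch: the function `build_context` is hypothetical, and it assumes every instance follows the same `<name>asset` / `<NAME>` key convention and `/asset/` / `/dandiset/` URL layout seen in the entries listed in the comment.

```python
from typing import Dict


def build_context(instances: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: build JSON-LD prefixes for several instances.

    `instances` maps an instance ID (e.g. "EMBER") to its base URL.
    """
    # Schema prefixes stay fixed because they describe the model itself.
    context = {
        "dandi": "http://schema.dandiarchive.org/",
        "dcite": "http://schema.dandiarchive.org/datacite/",
    }
    for name, base_url in instances.items():
        base = base_url.rstrip("/")
        context[f"{name.lower()}asset"] = f"{base}/asset/"
        context[name.upper()] = f"{base}/dandiset/"
    return context


context = build_context(
    {"DANDI": "http://dandiarchive.org", "EMBER": "http://emberarchive.org"}
)
```

The resulting dictionary contains the same six entries as the expanded context in the comment.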

@aaronkanzer

@yarikoptic -- for LINC, we are currently not issuing DOIs (not using DANDI either)

I have had placeholders in for the Django env vars expected -- I can certainly implement wherever EMBER goes for their own logic for LINC though (let me know if this makes sense @kabilar )

cc @satra

@kabilar
Member

kabilar commented Apr 7, 2025

@yarikoptic -- for LINC, we are currently not issuing DOIs (not using DANDI either)

Thanks @yarikoptic. I agree with Aaron. For LINC, we don't use DOIs or DANDI: identifiers, just URLs. And we don't currently have a need for either.

@candleindark
Member

I think there is a way to take a config and vendorize the models and enums accordingly, even with the technologies we are currently using. Pydantic has a dynamic model creation syntax and Enum has a functional API. However, using these syntaxes/APIs to define all the models and enums (most likely within functions) is going to be messy.
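
For reference, a small sketch of the Enum functional API mentioned above, building a license enum from a configured list. The function `make_license_enum` and the member-naming rule are assumptions for illustration; the Pydantic side (dynamic model creation) is left out to keep the example limited to the standard library.

```python
from enum import Enum
from typing import Iterable, Type


def make_license_enum(spdx_ids: Iterable[str]) -> Type[Enum]:
    """Hypothetical factory: create a LicenseType enum from SPDX IDs."""
    members = {
        # "CC-BY-4.0" becomes the member name "CC_BY_40".
        spdx.replace("-", "_").replace(".", ""): f"spdx:{spdx}"
        for spdx in spdx_ids
    }
    # Functional API: Enum(class_name, mapping of member name to value).
    return Enum("LicenseType", members)


DandiLicense = make_license_enum(["CC0-1.0", "CC-BY-4.0"])
EmberLicense = make_license_enum(["CC0-1.0", "CC-BY-4.0", "CC-BY-NC-SA-4.0"])
```

Each call produces a distinct enum class, which is part of why defining every model and enum this way can get messy: type checkers and IDEs cannot see the members statically.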

@codecov

codecov bot commented Apr 29, 2025

Codecov Report

❌ Patch coverage is 98.29268% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.89%. Comparing base (8dff37f) to head (33ac781).
⚠️ Report is 99 commits behind head on master.

Files with missing lines            Patch %    Lines
dandischema/tests/conftest.py       87.87%     4 Missing ⚠️
dandischema/tests/test_models.py    93.75%     2 Missing ⚠️
dandischema/conf.py                 98.78%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #294      +/-   ##
==========================================
+ Coverage   97.84%   97.89%   +0.04%     
==========================================
  Files          16       18       +2     
  Lines        2042     2370     +328     
==========================================
+ Hits         1998     2320     +322     
- Misses         44       50       +6     
Flag Coverage Δ
unittests 97.89% <98.29%> (+0.04%) ⬆️

@candleindark
Member

candleindark commented Apr 30, 2025

@yarikoptic I left a comment at #294 (comment). Otherwise, this PR is good to go. Feel free to squash the commits. They are small and serve as breadcrumbs for review purposes.

One note I want to make is that this PR is not a complete devendorization since we default the nskey to dandi. A form of complete devendorization would not have such a default but require nskey to be set through the configuration.

@candleindark candleindark force-pushed the devendorize branch 2 times, most recently from c3c71f4 to 606e1b6 Compare April 30, 2025 21:47
@candleindark
Member

The two remaining test failures must be unrelated to the current PR. The last time the two tests succeeded was three months ago at https://github.com/dandi/dandi-schema/actions/runs/13537569388. They are worthy of further investigation though.

@yarikoptic
Member Author

Let's add testing for 3 scenarios in CI.

  1. "generic" - should handle records for any vendor
  2. "dandi" - settings specific to DANDI
  3. "ember" - settings specific to EMBER (different prefix - EMBER, and different DOI number, whatever it is)

Records in dandischema/tests/data/metadata that are not vendor specific should pass in all scenarios; records specific to DANDI should pass only for generic and DANDI; and when we add some for EMBER, those should pass only for generic and EMBER.
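
The expected outcome per scenario can be sketched as follows. The prefix patterns in `ID_PATTERNS` and the identifier shape are hypothetical simplifications, not the patterns used in dandischema; the point is only which scenarios accept which identifier.

```python
import re
from typing import Set

# Hypothetical instance-ID patterns for the three CI scenarios.
ID_PATTERNS = {
    "generic": r"[A-Z][-A-Z]*",  # any vendor prefix
    "dandi": r"DANDI",
    "ember": r"EMBER",
}


def accepting_scenarios(identifier: str) -> Set[str]:
    """Return the scenarios whose pattern accepts a dandiset identifier."""
    return {
        scenario
        for scenario, prefix in ID_PATTERNS.items()
        if re.fullmatch(rf"{prefix}:\d{{6}}", identifier)
    }


dandi_result = accepting_scenarios("DANDI:000004")
ember_result = accepting_scenarios("EMBER:000004")
```

Here a DANDI identifier is accepted by the generic and dandi scenarios only, and an EMBER identifier by the generic and ember scenarios only.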

For now let's just do

and then we will reconsider...

@yarikoptic
Member Author

here you could grab a sample from EMBER https://api-dandi.emberarchive.org/api/dandisets/000004/versions/draft/ but would still need to replace the prefix with EMBER: since a non-vendorized version was used.

also think on how to do migration now within EMBER to adjust all those IDs etc...

@candleindark candleindark changed the title Initial Multi-DANDI instance support: import time - generic regexes by default + instance config Vendor-Configurable Metadata Models Sep 27, 2025
candleindark added a commit to candleindark/dandi-infrastructure that referenced this pull request Sep 30, 2025
These vars, with the exception of `DJANGO_DANDI_DOI_PUBLISH`, need to be provided
together or not at all per
https://github.com/dandi/dandi-archive/blob/b7288ec920b0c2cf2efd9c6d43e4edbf5016885d/dandiapi/api/checks.py#L10-L25. Since LINC
doesn't publish Dandiset with DOI, it is
reasonable not to set these env var.
Additionally, the value for
`DJANGO_DANDI_DOI_API_PREFIX` before this
commit doesn't pass the pattern requirement
imposed for DOI prefix being introduced in
dandi/dandi-schema#294.
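
The "together or not at all" rule described in this commit message can be sketched as a small check. This is not the dandi-archive implementation in `checks.py`; the function name is hypothetical, and apart from `DJANGO_DANDI_DOI_API_PREFIX` the variable names below are placeholders.

```python
from typing import Mapping, Sequence


def check_all_or_none(env: Mapping[str, str], names: Sequence[str]) -> None:
    """Raise ValueError if only some of the given variables are set."""
    present = [name for name in names if env.get(name)]
    if present and len(present) != len(names):
        missing = sorted(set(names) - set(present))
        raise ValueError(f"DOI settings partially configured; missing: {missing}")


DOI_VARS = [
    "DJANGO_DANDI_DOI_API_PREFIX",  # named in the commit message
    "EXAMPLE_DOI_API_URL",  # placeholder name
    "EXAMPLE_DOI_API_USER",  # placeholder name
]

# An instance that does not publish DOIs simply sets none of them.
check_all_or_none({}, DOI_VARS)
```

In a real deployment the mapping passed in would typically be `os.environ`.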
candleindark added a commit to candleindark/dandi-infrastructure that referenced this pull request Sep 30, 2025
Set env vars needed to set the vendor-specific dandi-schema
introduced by dandi/dandi-schema#294
@yarikoptic
Member Author

fresh conflict came up again

candleindark added a commit to dandi/dandi-archive that referenced this pull request Oct 20, 2025
Monkey-patches a `conf` module into `dandischema` for older
version of `dandischema`. This allows dandi-archive to release
without a coordinated release of dandi-schema in regard to
the vendorization effort implemented in
dandi/dandi-schema#294.
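
A generic sketch of that kind of monkey-patch: if a package has no `conf` submodule, create one and register it. This is not the code from the dandi-archive commit; the function name `ensure_conf_module` and the attribute names passed as defaults are hypothetical.

```python
import importlib
import sys
import types
from typing import Any, Dict


def ensure_conf_module(package_name: str, defaults: Dict[str, Any]) -> types.ModuleType:
    """Return `<package>.conf`, creating a stand-in module if it is missing."""
    full_name = f"{package_name}.conf"
    try:
        # Newer versions of the package ship the module; use it as is.
        return importlib.import_module(full_name)
    except ImportError:
        pass

    # Older versions: build a stand-in module carrying the default values.
    module = types.ModuleType(full_name)
    for key, value in defaults.items():
        setattr(module, key, value)

    # Register it so that `import <package>.conf` resolves afterwards.
    sys.modules[full_name] = module
    package = sys.modules.get(package_name)
    if package is not None:
        setattr(package, "conf", module)
    return module


# Hypothetical usage, with an assumed attribute name:
# ensure_conf_module("dandischema", {"INSTANCE_NAME": "DANDI"})
```

The patch has to run before any code imports the `conf` submodule, which is why such shims usually live at the top of a package's initialization.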
candleindark added a commit to dandi/dandi-archive that referenced this pull request Oct 20, 2025
Put in a tox env to test vendor-configurable
 dandischema located at
 dandi/dandi-schema#294.
 This env can be dropped once
 dandi/dandi-schema#294 is
 merged to master and released
# Conflicts:
#	dandischema/tests/test_models.py
# Conflicts:
#	dandischema/models.py
@candleindark
Member

@yarikoptic Merge conflicts resolved again. Please take a final look and merge and release.

@yarikoptic
Member Author

On a quick look - looks good. I would still though not yet merge/release and aim to finalize/merge

first with added dandi-cli flexibility in handling versions... so we have a minor change to dandi-schema released first before we jump this one. And then (maybe not right away, to avoid conflicts) here we might want to add

  • release label
  • boost version to either 0.7.0 or actually just some next within 0.6.x since it would not be "breaking" change! Could you add a current diff on the schema within some <details>...</details> in the original description on top so we could review it to decide?

@yarikoptic
Member Author

yarikoptic commented Nov 17, 2025

ok, I am merging this into master and we will release it by

@yarikoptic yarikoptic merged commit 8087312 into master Nov 17, 2025
70 checks passed
@yarikoptic yarikoptic deleted the devendorize branch November 17, 2025 22:03
@github-project-automation github-project-automation bot moved this from In Progress to Done in Multi-DANDI instance support Nov 17, 2025

Labels

minor Increment the minor version when merged

Projects

Development

Successfully merging this pull request may close these issues.

Add a Non-Commerical License Option: CC BY-NC-SA 4.0

10 participants