
Add support for OpenAIRE #299

Open
jli2025 wants to merge 4 commits into geopython:master from jli2025:openaire


@jli2025

jli2025 commented Dec 5, 2025

This PR adds support for OpenAIRE metadata. Only import (OpenAIRE -> MCF) is supported.

It can be used for metadata records from the new OpenAIRE API: OpenAIRE Graph API

An example openaire metadata record: https://api.openaire.eu/graph/v2/researchProducts?pid=10.3390/proceedings2019030057

@pvgenuchten
Contributor

Thanks @jli2025, I'll have a look ASAP.

@jli2025
Author

jli2025 commented Dec 15, 2025

Hi @pvgenuchten, I just added a commit to fix the identifier issue (when there are no pids), as we discussed.

@pvgenuchten
Contributor

pvgenuchten commented Jan 12, 2026

Code looks really nice, @jli2025. Some small comments:

  • I noticed a tag: test at the root level of the generated YAML; not sure how it was generated?
  • Is it possible to add a /metadata/datestamp element to the record (perhaps with today's date if the imported source does not have one)? datestamp seems to be a required element in MCF.
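A minimal sketch of the datestamp fallback, assuming the MCF is handled as a plain dict and that an ISO 8601 date string is acceptable for metadata.datestamp. The helper name ensure_datestamp is hypothetical and not part of pygeometa or this PR.

```python
import datetime


def ensure_datestamp(mcf: dict) -> dict:
    """
    Set mcf['metadata']['datestamp'] to today's date if it is missing
    (hypothetical helper, not pygeometa API)

    :param mcf: MCF dictionary

    :returns: the same dictionary, updated in place
    """
    metadata = mcf.setdefault('metadata', {})
    # only fill the gap; never overwrite a datestamp taken from the source
    if metadata.get('datestamp') is None:
        metadata['datestamp'] = datetime.date.today().isoformat()
    return mcf


record = ensure_datestamp({'metadata': {'identifier': 'example'}})
print(record['metadata']['datestamp'])
```

Calling it once at the end of the import keeps the fallback in one place instead of scattering date checks through the mapping.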

Contributor

pvgenuchten left a comment


added minor fixes

pvgenuchten force-pushed the openaire branch 2 times, most recently from 06ba553 to 82cc422 on January 13, 2026 10:17
@pvgenuchten
Contributor

pvgenuchten commented Jan 29, 2026

@jli2025 Can you extend the plugin to make metadata_ = md.get('results')[0] optional?
We cannot assume the header/results wrapper is always available (although if it is available, it is good to unwrap the individual results). -> resolved
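A minimal sketch of the optional unwrapping, assuming a wrapped Graph API response carries its records in a 'results' list (as in the example URL in the PR description) and that a bare research product has no such key. The function name unwrap_openaire is hypothetical.

```python
def unwrap_openaire(md: dict) -> list:
    """
    Return the individual OpenAIRE records in a parsed JSON document
    (hypothetical helper)

    :param md: parsed JSON, either a Graph API response with a 'results'
               list or a single bare research product

    :returns: list of record dictionaries
    """
    results = md.get('results')
    if isinstance(results, list):
        # wrapped response (header + results): unwrap each record
        return [r for r in results if isinstance(r, dict)]
    # no wrapper: treat the document itself as a single record
    return [md]
```

The plugin could then take the first record when the list is non-empty (or process all of them), regardless of whether the wrapper is present.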

jli2025 and others added 3 commits January 29, 2026 12:44
initialize

update

last update on the old openaire api format

create mapping for new openaire api

update

update

add new samples

remove sample data

add project

remove run_test script
pvgenuchten force-pushed the openaire branch 2 times, most recently from 8514419 to 9c41c48 on January 29, 2026 12:07
pvgenuchten requested review from pvgenuchten and tomkralidis and removed request for pvgenuchten on January 29, 2026 12:09
Contributor

pvgenuchten left a comment


Operational now; added an import test.

Member

tomkralidis left a comment


Great work! Some additional comments/change requests.

# those files. Users are asked to read the 3rd Party Licenses
# referenced with those assets.
#
# Copyright (c) 2025 Tom Kralidis, Jiarong Li, Paul van Genuchten
Member


Copyright should be one line per person (who touched the code).

#
# =================================================================

import logging
Member


Alphabetize imports.

    return contact_dict


def id2url(scheme: str, id: str) -> str:
Member


id -> id_ (id is a Python built-in function, so a parameter named id shadows it)
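Strictly speaking, id is not a reserved keyword (a parameter named id is legal); it is the name of a built-in function, which such a parameter shadows inside the function body. That is still a good reason for the rename. The standard keyword and builtins modules confirm the distinction:

```python
import builtins
import keyword

# 'id' is not reserved, so it is legal as a parameter name ...
print(keyword.iskeyword('id'))   # False
# ... but it names a built-in function that such a parameter would shadow
print(hasattr(builtins, 'id'))   # True
# a real keyword, for comparison
print(keyword.iskeyword('def'))  # True
```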


def id2url(scheme: str, id: str) -> str:
    """
    Convert orcid, wikidata, ror or grid value to url
Member


Complete docstring (add :param id:, and :returns:)


def id2url(scheme: str, id: str) -> str:
    """
    Convert orcid, wikidata, ror or grid value to url
Member


    scheme2 = scheme.lower()
    value = None

    if scheme2 in ['ror', 'grid']:
        value = id_
    elif scheme2 == 'orcid':
        return f'https://orcid.org/{id_}'
    elif scheme2 == 'wikidata':
        value = f'https://www.wikidata.org/wiki/{id_}'
    elif scheme2 == 'isni':
        value = f'https://isni.org/isni/{id_}'

    return value
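For reference, the suggested body combined with a completed docstring (:param: entries and :returns:) might look like the following sketch. Two small deviations from the suggestion are editorial assumptions: the return type is Optional[str], since an unsupported scheme falls through to None, and the orcid branch assigns to value like the other branches instead of returning early (same behaviour). The docstring also lists isni, which the code already handles.

```python
from typing import Optional


def id2url(scheme: str, id_: str) -> Optional[str]:
    """
    Convert an orcid, wikidata, ror, grid or isni value to a URL

    :param scheme: identifier scheme (case-insensitive)
    :param id_: identifier value

    :returns: URL (or the unchanged value for ror/grid), or None
              if the scheme is not supported
    """
    scheme2 = scheme.lower()
    value = None

    if scheme2 in ['ror', 'grid']:
        # passed through unchanged, mirroring the suggested change
        value = id_
    elif scheme2 == 'orcid':
        value = f'https://orcid.org/{id_}'
    elif scheme2 == 'wikidata':
        value = f'https://www.wikidata.org/wiki/{id_}'
    elif scheme2 == 'isni':
        value = f'https://isni.org/isni/{id_}'

    return value
```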

            contactpoint_dict['organization'] = org_name
            pids = contact.get('pids', [])
            if pids is not None:
                for p in pids:
Member


                for p in pids:
                    if (p.get('scheme') or '').lower() in ['ror', 'grid', 'wikidata', 'isni']:
                        contactpoint_dict['url'] = id2url(
                            p.get('scheme'), p.get('value'))
                        break
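One way to keep this loop safe when a pid has no scheme, or when pids is None, is to isolate the selection in a small helper. first_org_pid is a hypothetical name, and the pid shape {'scheme': ..., 'value': ...} is inferred from the p.get('scheme') / p.get('value') calls in the snippet.

```python
from typing import Optional, Tuple

ORG_SCHEMES = ['ror', 'grid', 'wikidata', 'isni']


def first_org_pid(pids: Optional[list]) -> Optional[Tuple[str, str]]:
    """
    Return (scheme, value) of the first pid with a supported organization
    scheme, or None if there is none (hypothetical helper)
    """
    for p in pids or []:
        # 'or' guards against a missing/None scheme before calling .lower()
        scheme = p.get('scheme') or ''
        value = p.get('value')
        if scheme.lower() in ORG_SCHEMES and value is not None:
            return scheme, value
    return None
```

The caller would then only set contactpoint_dict['url'] = id2url(*match) when match = first_org_pid(contact.get('pids')) is not None.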

            if pid is not None and pid.get('id') is not None:
                pid_scheme = pid.get('id', {}).get('scheme')
                pid_value = pid.get('id', {}).get('value')
                if pid_scheme is not None and pid_value is not None:
Member


                if None not in [pid_scheme, pid_value]:
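The suggested membership test can be seen in context in this small sketch. pid_parts is a hypothetical helper; the pid shape {'id': {'scheme': ..., 'value': ...}} is inferred from the pid.get('id', {}).get(...) calls in the snippet.

```python
from typing import Optional, Tuple


def pid_parts(pid: Optional[dict]) -> Optional[Tuple[str, str]]:
    """
    Extract (scheme, value) from a pid shaped like
    {'id': {'scheme': ..., 'value': ...}}, or None if either is missing
    (hypothetical helper)
    """
    if pid is None or pid.get('id') is None:
        return None
    pid_scheme = pid['id'].get('scheme')
    pid_value = pid['id'].get('value')
    # one membership test replaces two chained 'is not None' checks
    if None not in [pid_scheme, pid_value]:
        return pid_scheme, pid_value
    return None
```

A list membership test checks identity first and then ==, so for plain strings and None it behaves exactly like the two explicit is not None tests.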


def process_keywords(subjects: list) -> dict:
    """
    convert openaire keywords to mcf keywords
Member


Complete docstring (add :param subjects:, :returns:)
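The body of process_keywords is not shown in the review, so the following is only a sketch of what a completed docstring, plus a matching implementation, could look like. Both the input shape (each subject as {'subject': {'scheme': ..., 'value': ...}}) and the output layout (one MCF keyword section per scheme, with keywords_type 'theme') are assumptions rather than details taken from the PR; the actual plugin may differ.

```python
def process_keywords(subjects: list) -> dict:
    """
    convert openaire keywords to mcf keywords

    :param subjects: list of OpenAIRE subjects, each assumed to look like
                     {'subject': {'scheme': ..., 'value': ...}}

    :returns: dict of MCF keyword sections, one per subject scheme
              (assumed layout)
    """
    keywords = {}
    for s in subjects or []:
        subject = s.get('subject') or {}
        scheme = subject.get('scheme') or 'default'
        value = subject.get('value')
        if value is None:
            # skip subjects without a usable value
            continue
        section = keywords.setdefault(scheme, {
            'keywords': [],
            'keywords_type': 'theme'
        })
        section['keywords'].append(value)
    return keywords
```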



def process_id_and_instance(
        pids: list, originIds: list,
Member


Complete docstring (add :param:'s, :returns:)

mcf['metadata']['hierarchylevel'] = instance_type_

date_of_collection = metadata_.get('dateOfCollection')
if date_of_collection:
Member


Can we test with if date_of_collection is not None:? This needs to be applied in a few places in this function where .get() is called.
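The two tests differ only for values that are present but falsy, such as an empty string, 0 or an empty list. The sketch below (describe is a throwaway helper and the sample values are illustrative) shows where they diverge. Note that the explicit check keeps an empty string, so code further down may still need to handle it.

```python
def describe(value) -> str:
    # truthiness test: treats '', 0, [] and {} the same as None
    truthy = 'kept' if value else 'skipped'
    # explicit test: only a missing value (None) is skipped
    explicit = 'kept' if value is not None else 'skipped'
    return f'{value!r}: truthy={truthy}, is-not-None={explicit}'


for v in [None, '', '2019-05-01']:
    print(describe(v))
```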

@pvgenuchten
Contributor

pvgenuchten commented Jan 29, 2026

Actually, maybe we are too lenient here. We should also have a clear failure case indicating "This JSON is not in OpenAIRE format"; otherwise, when schema:autodetect is used, the plugin would parse any JSON and return an almost empty MCF.

Actually, OpenAIRE is not considered at all yet when using autodetect; is an update needed elsewhere?
--> This is actually a problem of the schema-org plugin: it returns an empty document when the input does not match, and autodetect then does not continue to the next schema to try.
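The fall-through behaviour described in this comment can be sketched as a loop that treats an empty result (or an exception) as "no match" and moves on to the next schema. Everything here is hypothetical: the autodetect function, the importer signature and both toy importers, including the key check used to reject non-OpenAIRE JSON. pygeometa's actual autodetect code is not shown in this PR.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical importer signature: returns an MCF dict, or None / {} when
# the input is not in the importer's format (it may also raise).
Importer = Callable[[dict], Optional[dict]]


def autodetect(md: dict,
               importers: Dict[str, Importer]) -> Optional[Tuple[str, dict]]:
    """
    Try each importer in turn and return (schema_name, mcf) for the first
    one that produces a non-empty result, or None if nothing matches.
    """
    for name, importer in importers.items():
        try:
            mcf = importer(md)
        except Exception:
            # an importer that rejects foreign input by raising is a non-match
            continue
        if mcf:  # None and {} both mean "not my format": try the next one
            return name, mcf
    return None


def schema_org(md: dict) -> dict:
    # toy stand-in mimicking the reported behaviour: {} when nothing matches
    if md.get('@context') is None:
        return {}
    return {'mcf': {'version': 1.0}}


def openaire(md: dict) -> dict:
    # toy stand-in: the required keys are an illustrative heuristic only
    if not all(key in md for key in ('id', 'mainTitle')):
        raise ValueError('This JSON is not in OpenAIRE format')
    return {'identification': {'title': md['mainTitle']}}


importers = {'schema-org': schema_org, 'openaire': openaire}
print(autodetect({'id': 'x', 'mainTitle': 'Example'}, importers))
```

Catching Exception broadly keeps the sketch short; a real implementation would probably catch only the error type its importers raise for foreign input.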

- do not require results wrapper
- add import test

Labels

None yet

Projects

None yet


3 participants