Early Adopters of Organizational Identifiers in DataCite
Cite this blog as Habermann, T. (2020). Early Adopters of Organizational Identifiers in DataCite. Front Matter. https://doi.org/10.59350/g96gh-x2361
Version 4.3 of the DataCite Metadata Schema, released in August 2019, included (among other things) the capability to provide persistent identifiers for affiliated organizations in the metadata (Dasler and deSmaele, Identify your affiliation with Metadata Schema 4.3, 2019). This capability builds on the work and enthusiasm generated by the ROR Community, which has championed the concept of open organization identifiers for several years (Gould, A Reflection on ROR's First Year, 2019). It is a critical step towards consistent integration of organizations into the growing web of connections across the scholarly communications landscape and research communities in all disciplines.
The first organizational identifiers in DataCite metadata were funderIdentifiers, introduced in Version 4.0 in late 2016. These identifiers can now be considered mature in DataCite metadata and are not discussed here. Instead, we focus on the organizational identifiers introduced in August 2019, which are integrated into the DataCite schema as affiliationIdentifiers associated with either creators or contributors. In this blog I identify the early adopters that have already started to build these important connections into their metadata.
Finding Early Adopters
The DataCite API makes it easy to find repositories that are adopting these new identifiers with two requests and a small bit of Python (both are functional but could certainly be improved):
Creators:

```python
import requests

# Query DataCite for records that contain affiliationIdentifiers for creators
URL = (
    "https://api.datacite.org/dois"
    "?query=creators.affiliation.affiliationIdentifier:*"
    "&affiliation=true&page%5Bsize%5D=1"
)
r = requests.get(URL)
# The 'clients' facet lists repositories with their matching record counts
for i in r.json()['meta']['clients']:
    print(i['title'], i['id'], i['count'])
```

Contributors:

```python
import requests

# Query DataCite for records that contain affiliationIdentifiers for contributors
URL = (
    "https://api.datacite.org/dois"
    "?query=contributors.affiliation.affiliationIdentifier:*"
    "&affiliation=true&page%5Bsize%5D=1"
)
r = requests.get(URL)
# The 'clients' facet lists repositories with their matching record counts
for i in r.json()['meta']['clients']:
    print(i['title'], i['id'], i['count'])
```
Note that both of these queries list only the fifteen repositories with the largest numbers of matching records.
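Because the repository list is capped at fifteen, a repository outside that list has to be checked on its own. The sketch below shows one way this might be done. It assumes that the DataCite API accepts a `client-id` filter on the `dois` endpoint and reports the number of matching records in `meta.total`; neither appears in the queries above, so treat both as assumptions to verify against the API documentation. The function names are hypothetical, and `bl.cefas` is used only as an example repository id.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.datacite.org/dois"


def build_count_url(role, client_id=None):
    """Build a query URL for records whose creators or contributors
    carry affiliationIdentifiers, optionally limited to one repository."""
    if role not in ("creators", "contributors"):
        raise ValueError("role must be 'creators' or 'contributors'")
    params = {
        "query": f"{role}.affiliation.affiliationIdentifier:*",
        "affiliation": "true",
        "page[size]": 1,
    }
    if client_id is not None:
        # ASSUMPTION: 'client-id' restricts results to a single repository
        params["client-id"] = client_id
    return API_ROOT + "?" + urlencode(params)


def fetch_record_count(url):
    """Fetch the URL and return the number of matching records.
    ASSUMPTION: the response reports that number in meta.total."""
    with urlopen(url) as response:
        return json.load(response)["meta"]["total"]


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_record_count(url) to query.
    print(build_count_url("creators", client_id="bl.cefas"))
```

Keeping the URL construction separate from the network call makes it easy to inspect the request before sending it.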
The results of these queries (as of 2020-02-17), shown in the Table below, indicate that identifiers for creators, in 15+ repositories, are currently more common than identifiers for contributors, in 8 repositories, and that three repositories (bl.imperial, bl.nerc, and tib.wdcc) currently have some identifiers of both types. The counts in this Table are numbers of records with affiliations, not the number of affiliations.
| Repository | Creator Record Count | Contributor Record Count |
| --- | --- | --- |
| dryad.dryad | 20129 | |
| bl.imperial | 13400 | 5926 |
| heallink.tuc | 7848 | |
| caltech.library | 4809 | |
| odu.viva | 1092 | |
| bl.nerc | 956 | 823 |
| tib.wdcc | 502 | 419 |
| uky.lib | 311 | |
| psnc.uwr | 264 | |
| noaa.library | 248 | |
| uiowa.prod | 236 | |
| bl.iita | 184 | 184 |
| uutah.usu | 175 | |
| cisti.statcan | 152 | |
| datacite.blog | 140 | |
| bl.cefas | (25) | 27 |
| inist.cnrs | | 3 |
| datacite.datacite | (2) | 2 |
| zbmed.zfmk | (1) | 1 |
| tib.hbrs | (35) | 1 |
| inist.inrap | | 1 |
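A table like this one can be assembled by merging the two repository lists returned by the creator and contributor queries. The following is a minimal sketch of that merge, not necessarily how the table was actually built. The function names are hypothetical, and the sample lists contain only a few values copied from the table, in the same `id`/`count` shape as the entries in `meta.clients`.

```python
def merge_client_counts(creator_clients, contributor_clients):
    """Merge two facet lists into rows of
    (repository id, creator record count, contributor record count).
    A count is None when the repository is absent from that list."""
    creators = {c["id"]: c["count"] for c in creator_clients}
    contributors = {c["id"]: c["count"] for c in contributor_clients}
    rows = [
        (repo_id, creators.get(repo_id), contributors.get(repo_id))
        for repo_id in set(creators) | set(contributors)
    ]
    # Sort by creator count, then contributor count (both descending),
    # treating a missing count as zero; the id breaks any remaining ties.
    rows.sort(key=lambda r: (-(r[1] or 0), -(r[2] or 0), r[0]))
    return rows


def repositories_with_both(rows):
    """Return ids of repositories that appear in both lists."""
    return [r[0] for r in rows if r[1] is not None and r[2] is not None]


# Sample input: a few values copied from the table, for illustration only
creator_sample = [
    {"id": "dryad.dryad", "count": 20129},
    {"id": "bl.imperial", "count": 13400},
    {"id": "bl.nerc", "count": 956},
]
contributor_sample = [
    {"id": "bl.imperial", "count": 5926},
    {"id": "bl.nerc", "count": 823},
    {"id": "bl.cefas", "count": 27},
]

if __name__ == "__main__":
    rows = merge_client_counts(creator_sample, contributor_sample)
    for repo_id, n_creator, n_contributor in rows:
        print(repo_id, n_creator, n_contributor)
    print("Both types:", repositories_with_both(rows))
```

Since each list holds at most fifteen repositories, a `None` in the merged rows means only that the repository was not among the top fifteen for that query, not that it has no such records.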
The numbers in this Table allow us to identify the early adopters of organizational identifiers, both DataCite members and repositories. These members of the community are the “guinea pigs” in this effort: they are committed enough to the benefits of organizational identifiers in metadata to take action and serve as good examples for the community. All should be recognized as pioneers!
In an earlier blog (Habermann, How Many ROR’s Do We Need?, 2019) I made the somewhat surprising observation that many DataCite repositories contained only a small number of unique affiliations and, therefore, those repositories could uniquely identify all organizations in their metadata with a small number of organizational identifiers. These earlier data are shown in this Figure. It indicates that 108 repositories need only one ROR and that 235 need five or fewer. This is great news for the adoption of RORs.
The early adopters identified above include examples where many metadata records contain a small number of affiliations: low-hanging fruit along the path to adoption. Others face much more difficult tasks because of large numbers of affiliations spread across many records. We look forward to learning lessons as all of these groups forge forward.
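The difference between the number of records with identifiers, the number of identifiers, and the number of unique identifiers a repository needs can be made concrete with a small sketch. It assumes records shaped like DataCite JSON returned with `affiliation=true`, where each creator or contributor has an `affiliation` list of objects carrying an `affiliationIdentifier`; that shape is an assumption here. The function name and the `https://ror.org/example-*` identifiers are hypothetical placeholders.

```python
def affiliation_identifier_stats(records, role="creators"):
    """Count records with at least one affiliationIdentifier, the total
    number of identifiers, and the number of unique identifiers."""
    records_with_ids = 0
    all_ids = []
    for record in records:
        ids = [
            aff["affiliationIdentifier"]
            for person in record.get("attributes", {}).get(role, [])
            for aff in person.get("affiliation", [])
            if isinstance(aff, dict) and aff.get("affiliationIdentifier")
        ]
        if ids:
            records_with_ids += 1
        all_ids.extend(ids)
    return {
        "records": records_with_ids,
        "identifiers": len(all_ids),
        "unique": len(set(all_ids)),
    }


# Hypothetical sample records (placeholder identifiers, not real RORs)
ORG_A = "https://ror.org/example-a"
ORG_B = "https://ror.org/example-b"
sample_records = [
    # Two creators at the same organization
    {"attributes": {"creators": [
        {"name": "Creator 1", "affiliation": [{"name": "Org A", "affiliationIdentifier": ORG_A}]},
        {"name": "Creator 2", "affiliation": [{"name": "Org A", "affiliationIdentifier": ORG_A}]},
    ]}},
    # One creator with two affiliations
    {"attributes": {"creators": [
        {"name": "Creator 3", "affiliation": [
            {"name": "Org A", "affiliationIdentifier": ORG_A},
            {"name": "Org B", "affiliationIdentifier": ORG_B},
        ]},
    ]}},
    # An affiliation name without an identifier is not counted
    {"attributes": {"creators": [
        {"name": "Creator 4", "affiliation": [{"name": "Org C"}]},
    ]}},
]

if __name__ == "__main__":
    print(affiliation_identifier_stats(sample_records))
```

In this sample, two records carry four identifiers in total, yet only two unique identifiers are needed to cover them, which is the situation that makes adoption easy for repositories with few distinct affiliations.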