How Many When (Update)

Cite this blog as Habermann, T. (2023). How Many When (Update). Front Matter. https://doi.org/10.59350/w1em6-nn888

It is natural to think of DataCite as a repository for identifiers that can be used to unambiguously identify and cite datasets, as that was the original DataCite mission. Last year, I took a look at the history of resourceTypes in DataCite and was (pleasantly) surprised by the number of resources with the Text resourceType. Those data demonstrated the evolution of DataCite over 19 years and the diversity of resourceTypes now included in the repository.

This diversity makes it possible to use DataCite for making connections between all kinds of things. We took advantage of this capability to explore using DataCite and the DataCite Commons to connect many kinds of resources to projects. We are talking about that work this week with a poster (shown below) at the ESIP meeting.

As part of my preparation for this meeting, I updated the resourceType history and was surprised again. The update showed that 75% of the resources added to DataCite so far in 2023 are physicalObjects, mostly scientific samples added as part of merging International Generic Sample Numbers (IGSNs) into the DataCite infrastructure. Bringing these samples into DataCite is a big step forward for connecting them to the global research infrastructure.

Distribution of DataCite resourceTypes per year. The five most common types are labeled (PPRNT = Preprint).

We can use the DataCite Facet Tool to see where these new resources are coming from. The command

retrieveDataCiteFacets -fl clients -il PhysicalObject --facetdata

retrieves the top 10 repositories with PhysicalObjects (Table 1). The two repositories marked with * have been registering physicalObjects for many years. The others are recent additions and all are related to samples.

Repository                                                                   Count
Legacy IGSN ID re-registration for Geoscience Australia SAMPLES data       4418575
SESAR_ODP                                                                  3517973
International Treaty on Plant Genetic Resources for Food and Agriculture*  1313208
System for Earth Sample Registration (SESAR)                                977206
SESAR_Smithsonian                                                           383835
Genebank Information System of the IPK Gatersleben*                         209285
IODP Bremen Core Repository                                                 128735
Data INRAE                                                                   68953
IFREMER Sample Repository                                                    41698
IGSN ID registration for Geoscience SAMPLES data                             40895

Table 1. Ten repositories with the most physicalObjects.
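
For readers without the DataCite Facet Tool, a similar ranking can probably be pulled straight from the public DataCite REST API. The sketch below is not how retrieveDataCiteFacets works internally (the post does not show that). It assumes the API's /dois endpoint accepts a resource-type-id filter and returns repository counts in a meta.clients facet with title and count fields. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public DataCite REST API (not named in the post).
DATACITE_DOIS = "https://api.datacite.org/dois"


def build_facet_url(resource_type_id="physical-object"):
    """Build a query that asks for facet counts only (no DOI records)."""
    params = {
        "resource-type-id": resource_type_id,  # assumed filter name and value
        "page[size]": 0,  # facets are expected under "meta"; skip the records
    }
    return DATACITE_DOIS + "?" + urllib.parse.urlencode(params)


def fetch_meta(url):
    """Fetch the URL and return the "meta" object of the JSON response."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response).get("meta", {})


def top_clients(meta, n=10):
    """Return (title, count) pairs for the n largest entries in meta["clients"]."""
    clients = meta.get("clients", [])  # assumed facet name for repositories
    ranked = sorted(clients, key=lambda c: c["count"], reverse=True)
    return [(c["title"], c["count"]) for c in ranked[:n]]
```

Calling top_clients(fetch_meta(build_facet_url())) should then give a list comparable to Table 1, although the counts will have changed since this post was written.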


Connecting Place-based Research Back to the Place: Project Metadata in DataCite (https://doi.org/10.6084/m9.figshare.23671917.v1).