How Many When (Update)
Cite this blog as Habermann, T. (2023). How Many When (Update). Front Matter. https://doi.org/10.59350/w1em6-nn888
It is natural to think of DataCite as a repository of identifiers that can be used to unambiguously identify and cite datasets, since that was the original DataCite mission. Last year, I looked at the history of resourceTypes in DataCite and was pleasantly surprised by the number of Text resourceTypes. Those data demonstrated the evolution of DataCite over 19 years and the diversity of resourceTypes now included in the repository.
This diversity makes it possible to use DataCite for making connections between all kinds of things. We took advantage of this capability to explore using DataCite and the DataCite Commons to connect many kinds of resources to projects. We are talking about that work this week with a poster (shown below) at the ESIP meeting.
As part of my preparation for this meeting, I updated the resourceType history and was surprised again. The update showed that 75% of the resources added to DataCite so far in 2023 are physicalObjects, mostly scientific samples added as part of merging International Generic Sample Numbers (IGSNs) into the DataCite infrastructure. Bringing these samples into DataCite is a big step toward connecting them to the global research infrastructure.
We can use the DataCite Facet Tool to see where these new resources are coming from. The command
retrieveDataCiteFacets -fl clients -il PhysicalObject --facetdata
retrieves the top 10 repositories with PhysicalObjects (Table 1). The two repositories marked with * have been registering physicalObjects for many years. The others are recent additions and all are related to samples.
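For readers without the Facet Tool installed, a similar summary can probably be pulled straight from the public DataCite REST API. The sketch below is not the tool's implementation. It assumes that the `/dois` endpoint accepts a `resource-type-id` filter (with `physical-object` as the value for PhysicalObject) and returns per-repository counts under `meta.clients`. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public DataCite REST API endpoint for DOI searches.
API_URL = "https://api.datacite.org/dois"


def build_facet_url(resource_type_id: str) -> str:
    """Build a query URL; only the facet summary is needed, so ask for one record."""
    params = {"resource-type-id": resource_type_id, "page[size]": 1}
    return f"{API_URL}?{urlencode(params)}"


def top_clients(response: dict, limit: int = 10) -> list[tuple[str, int]]:
    """Return (repository title, count) pairs from the assumed meta.clients facet."""
    facets = response.get("meta", {}).get("clients", [])
    ranked = sorted(facets, key=lambda f: f["count"], reverse=True)
    return [(f.get("title", f["id"]), f["count"]) for f in ranked[:limit]]


def fetch_top_clients(resource_type_id: str = "physical-object", limit: int = 10):
    """Query the API (network access required) and summarize the clients facet."""
    with urlopen(build_facet_url(resource_type_id), timeout=30) as resp:
        return top_clients(json.load(resp), limit)
```

Calling `fetch_top_clients()` should return a list comparable to Table 1, although the counts will have changed since this post was written.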
| Repository | Count |
| --- | --- |
| Legacy IGSN ID re-registration for Geoscience Australia SAMPLES data | 4418575 |
| SESAR_ODP | 3517973 |
| International Treaty on Plant Genetic Resources for Food and Agriculture* | 1313208 |
| System for Earth Sample Registration (SESAR) | 977206 |
| SESAR_Smithsonian | 383835 |
| Genebank Information System of the IPK Gatersleben* | 209285 |
| IODP Bremen Core Repository | 128735 |
| Data INRAE | 68953 |
| IFREMER Sample Repository | 41698 |
| IGSN ID registration for Geoscience SAMPLES data | 40895 |
Table 1. Ten repositories with the most physicalObjects.
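To put the counts in Table 1 in perspective, the short sketch below totals the ten repositories and separates the two long-standing ones (marked with *) from the recent sample-related additions. The counts are copied from the table. The `summarize` helper and its field names are only illustrative.

```python
# Counts copied from Table 1; True marks the two repositories flagged with *.
TABLE_1 = [
    ("Legacy IGSN ID re-registration for Geoscience Australia SAMPLES data", 4418575, False),
    ("SESAR_ODP", 3517973, False),
    ("International Treaty on Plant Genetic Resources for Food and Agriculture", 1313208, True),
    ("System for Earth Sample Registration (SESAR)", 977206, False),
    ("SESAR_Smithsonian", 383835, False),
    ("Genebank Information System of the IPK Gatersleben", 209285, True),
    ("IODP Bremen Core Repository", 128735, False),
    ("Data INRAE", 68953, False),
    ("IFREMER Sample Repository", 41698, False),
    ("IGSN ID registration for Geoscience SAMPLES data", 40895, False),
]


def summarize(rows):
    """Total the counts and split them into long-standing and recent repositories."""
    total = sum(count for _, count, _ in rows)
    long_standing = sum(count for _, count, starred in rows if starred)
    recent = total - long_standing
    return {
        "total": total,
        "long_standing": long_standing,
        "recent": recent,
        "recent_share": recent / total,
    }


summary = summarize(TABLE_1)
print(f"Top-10 total: {summary['total']:,}")
print(f"Recent additions: {summary['recent']:,} ({summary['recent_share']:.1%})")
```

Within this top-ten list, the recent additions account for well over four fifths of the PhysicalObjects.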