Measuring data reuse and metadata re-curation across multiple repositories is important for understanding repository behavior and for finding examples of different behaviors. Comparing the publication dates of connected articles and datasets may provide useful measures and visualizations of these data and repository behaviors.
First, we need a set of connections for analysis. Connections have three parts: 1) datasets with DOIs, 2) articles with DOIs, and 3) links between them. We are getting these data from CHORUS, which retrieves them from several elements of the global research infrastructure.
Argo Float Data
The Argo Float Dataset is a very well-connected dataset and provides an example that demonstrates connection timing. The DOI (10.17882/42182) was cited by over 300 articles between 2012 and 2023.
This figure shows the temporal relationships between the publication dates of the dataset and these articles. The x-axis shows the creation year of the data DOI, 2016 in this case. The y-axis shows the publication year of the articles, from 2011 to 2023. The number of articles per year is shown as bubble size and as text beside the bubbles.
The second y-axis, on the right side of the plot, shows the difference between the article and dataset publication years. It is zero at the year the data DOI was created (2016).
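The year difference on the second y-axis is simple arithmetic: the article publication year minus the year the data DOI was created. A minimal Python sketch of that calculation follows, assuming each connection has already been reduced to a pair of years. The function name `year_differences` and the sample years are hypothetical illustrations, not values retrieved from CHORUS.

```python
from collections import Counter

def year_differences(dataset_year, article_years):
    """Return {article_year: (article_count, year_difference)}, sorted by year."""
    counts = Counter(article_years)
    return {
        year: (count, year - dataset_year)
        for year, count in sorted(counts.items())
    }

# Hypothetical article years for a dataset whose DOI was created in 2016.
sample_article_years = [2014, 2016, 2016, 2018, 2018, 2018]
timeline = year_differences(2016, sample_article_years)
for year, (count, diff) in timeline.items():
    print(year, count, diff)
```

Each entry corresponds to one bubble: the count gives the bubble size and the difference gives its position on the right-hand axis.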
Curation
When the dataset and article DOIs are created during the same year, the connection is termed curation: the metadata is created while the dataset is being created.
Twenty papers (6%) that referenced the Argo Data were published during 2016, the same year the dataset DOI was created, i.e., the publication year difference is 0.
The gallery below shows four repositories where datasets and articles have the same publication years. These repositories are termed curators.
Cambridge Crystallographic Data Centre
The Royal Society
Optica Publishing Group
Figshare




Reuse
Data reuse happens when papers are published after the data, i.e., when a paper is published and connected to the data after the data already has a DOI.
258 papers (82%) that referenced the Argo Data were published after the DOI was created in 2016, i.e., with positive year differences.
The gallery below shows four repositories with significant data reuse.
NOAA National Centers for Environmental Information
UNAVCO, Inc.
NASA EOSDIS Land Processes Distributed Active Archive Center
UCAR NCAR - Research Data Archive




Re-Curation
Re-curation of article or dataset metadata, i.e., creating a connection by adding a link to the dataset or article metadata, mostly happens when a dataset is created after an article is published.
37 papers (12%) that referenced the Argo Data were published before 2016, i.e., with negative year differences.
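The three categories are determined by the sign of the year difference (article year minus dataset year), and the percentages quoted for the Argo DOI can be checked from the three counts given in the text. The sketch below is illustrative only; the function names `classify` and `shares` are hypothetical and are not taken from the processing code behind these plots.

```python
def classify(year_difference):
    """Label a connection by the sign of article year minus dataset year."""
    if year_difference == 0:
        return "curation"
    if year_difference > 0:
        return "reuse"
    return "re-curation"

def shares(counts):
    """Convert category counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

# Counts reported in the text for the Argo DOI (10.17882/42182).
argo_counts = {"curation": 20, "reuse": 258, "re-curation": 37}
argo_shares = shares(argo_counts)
print(sum(argo_counts.values()), argo_shares)
# 315 {'curation': 6, 'reuse': 82, 're-curation': 12}
```

The three counts sum to 315 connected papers, consistent with the "over 300 articles" cited above.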
The University of British Columbia
BindingDB
Biological and Chemical Oceanography Data Management Office (BCO-DMO)
Borealis




Other Repositories With Data From Articles Funded by NSF
The gallery below shows connection timelines for repositories holding datasets that are connected to articles that acknowledge funding by NSF. The articles are selected using these criteria (funder = U.S. National Science Foundation or funderId = 10.13039/100000001), and links to datasets are identified using Scholexplorer. These timelines provide a limited view that may not be representative of the entire repository. Data retrieval and processing are described here.
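The selection criteria amount to a simple either/or filter on funder metadata. A minimal sketch follows, assuming each article record carries a list of funder entries with `funder` and `funderId` keys. That record layout, the function name `acknowledges_nsf`, and the placeholder DOIs are assumptions made for illustration, not the actual CHORUS or Scholexplorer schema.

```python
NSF_NAME = "U.S. National Science Foundation"
NSF_FUNDER_ID = "10.13039/100000001"

def acknowledges_nsf(article):
    """True if any funder entry matches the NSF name or the NSF funder ID."""
    return any(
        f.get("funder") == NSF_NAME or f.get("funderId") == NSF_FUNDER_ID
        for f in article.get("funders", [])
    )

# Hypothetical article records; the DOIs are placeholders.
articles = [
    {"doi": "10.0000/example-1", "funders": [{"funder": NSF_NAME}]},
    {"doi": "10.0000/example-2", "funders": [{"funderId": NSF_FUNDER_ID}]},
    {"doi": "10.0000/example-3", "funders": [{"funder": "Another Funder"}]},
]
selected = [a["doi"] for a in articles if acknowledges_nsf(a)]
print(selected)
```

Matching on either the name or the identifier keeps articles whose metadata records only one of the two.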
Arctic Data Center
Biological Magnetic Resonance Bank
CaltechDATA
Centre de Donnees Strasbourg (CDS)
Code Ocean
CyVerse Data Commons
Dryad
Data Repository for the University of Minnesota (DRUM)
Earth System Grid Federation
Environmental Data Initiative
GFZ Data Services
GigaScience Database
Harte Research Institute
Harvard Dataverse
HEPData
KNB Data Repository
NASA Goddard Earth Sciences Data and Information Services Center
SEANOE
NASA National Snow and Ice Data Center Distributed Active Archive Center
National Geophysical Data Center, NOAA
UCAR NCAR - Earth Observing Laboratory
Taylor & Francis
UC San Diego Library Digital Collections
University of Illinois at Urbana-Champaign
University of Michigan
U.S. Geological Survey
U.S. EPA Office of Research and Development (ORD)
Wiley
U.S. Antarctic Program (USAP) Data Center
Zenodo


































