Instruments@DataCite
Ted Habermann, Metadata Game Changers
Cite this blog as Habermann, T. (2024). Instruments@DataCite. Front Matter. https://doi.org/10.59350/8fjx7-y3229.
Version 4.5 of the DataCite Metadata Schema, released during January 2024, includes several changes supporting the identification and description of instruments. These changes were made in concert with the RDA PIDINST Working Group as a step towards implementing their recommended instrument metadata content in DataCite. Several DataCite members were describing instruments in DataCite metadata before this capability was introduced and others are beginning to do it now. These existing efforts can inform the development of community conventions and help the broader community understand how to use instrument metadata effectively. This blog post explores current usage as initial input to that process.
DataCite Schema Changes
The DataCite metadata schema includes a shared vocabulary (resourceTypeGeneral) that defines thirty resource types that can be described in DataCite metadata (Habermann, 2023). Version 4.5 of the DataCite metadata schema added the term “Instrument” to this list, allowing unambiguous identification of instruments and related metadata.
Several organizations were using DataCite to describe instruments before these changes, without a standard mechanism for identifying the described resources as instruments. In some cases, they combined resourceTypeGeneral = PhysicalObject or Other with the free-text field resourceType = “Instrument” to identify these records. In others, the resourceType included longer descriptions of instruments that can be difficult to discover unambiguously. The resourceTypeGeneral = Instrument solves this discovery problem (Figure 1).
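The difference between these approaches shows up as soon as records are filtered by type. The following Python sketch is illustrative only: it assumes a simplified record layout loosely modelled on DataCite JSON, and the function `instrument_status` is hypothetical, not part of any DataCite tool. It sorts records into those typed explicitly, those using the pre-4.5 workaround, and those that cannot be recognised reliably.

```python
# Sketch: recognise instrument records written before and after Schema 4.5.
# ASSUMPTION: records are simplified dicts loosely modelled on DataCite JSON,
# with a "types" entry holding resourceTypeGeneral and resourceType.

LEGACY_GENERAL_TYPES = {"PhysicalObject", "Other"}


def instrument_status(record: dict) -> str:
    """Return 'explicit', 'legacy', or 'unknown' for one metadata record."""
    types = record.get("types", {})
    general = types.get("resourceTypeGeneral", "")
    free_text = (types.get("resourceType") or "").strip().lower()

    if general == "Instrument":
        # Schema 4.5 and later: the controlled vocabulary says it directly.
        return "explicit"
    if general in LEGACY_GENERAL_TYPES and free_text == "instrument":
        # Pre-4.5 workaround: generic type plus free-text "Instrument".
        return "legacy"
    # Longer free-text descriptions cannot be matched reliably.
    return "unknown"


# Hypothetical example records.
records = [
    {"types": {"resourceTypeGeneral": "Instrument", "resourceType": "Beamline"}},
    {"types": {"resourceTypeGeneral": "PhysicalObject", "resourceType": "Instrument"}},
    {"types": {"resourceTypeGeneral": "Other",
               "resourceType": "Diffractometer used at a neutron source"}},
]
print([instrument_status(r) for r in records])
# → ['explicit', 'legacy', 'unknown']
```

Only the first case can be found with an exact match on the controlled vocabulary; the third illustrates why longer free-text resourceTypes are hard to discover.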
The resourceTypeGeneral vocabulary is also used to define the types of resources connected through related identifiers. Adding Instrument to this vocabulary makes well-defined connections between instruments, datasets, and other resources possible. These connections also use relationTypes from the relationType vocabulary. The terms IsCollectedBy and Collects were added to this vocabulary specifically for use with instruments (Figure 1).
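To illustrate how these relation types pair up, here is a minimal sketch, assuming simplified dict records and hypothetical DOIs (the `link_collection` helper is illustrative, not a DataCite API). It writes IsCollectedBy into the dataset record and the reciprocal Collects into the instrument record, also recording the resourceTypeGeneral of each related resource.

```python
# Sketch: add reciprocal IsCollectedBy / Collects links between a dataset and
# the instrument that collected it. ASSUMPTIONS: simplified dict records
# loosely modelled on DataCite JSON; the DOIs below are hypothetical.


def _general_type(record: dict) -> str:
    return record.get("types", {}).get("resourceTypeGeneral", "")


def link_collection(dataset: dict, instrument: dict) -> None:
    """Record, in both records, that `dataset` was collected by `instrument`."""
    dataset.setdefault("relatedIdentifiers", []).append({
        "relatedIdentifier": instrument["doi"],
        "relatedIdentifierType": "DOI",
        "relationType": "IsCollectedBy",  # A (dataset) is collected by B
        "resourceTypeGeneral": _general_type(instrument),
    })
    instrument.setdefault("relatedIdentifiers", []).append({
        "relatedIdentifier": dataset["doi"],
        "relatedIdentifierType": "DOI",
        "relationType": "Collects",  # A (instrument) collects B
        "resourceTypeGeneral": _general_type(dataset),
    })


dataset = {"doi": "10.1234/example-data",
           "types": {"resourceTypeGeneral": "Dataset"}}
instrument = {"doi": "10.1234/example-instrument",
              "types": {"resourceTypeGeneral": "Instrument"}}
link_collection(dataset, instrument)
print(dataset["relatedIdentifiers"][0]["relationType"],
      instrument["relatedIdentifiers"][0]["relationType"])
# → IsCollectedBy Collects
```

Writing both directions is a design choice in this sketch: each record is curated and harvested independently, so a link recorded on only one side is easy to miss.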
Early Adopters
Early adopters of resourceTypeGeneral = Instrument have currently created 78 records (Table 1). Several of these repositories participated in the early PIDINST discussions, and several have joined the effort since DataCite added this resourceTypeGeneral.
Repository ID | Name | Count
tib.hzb | Helmholtz-Zentrum Berlin für Materialien und Energie GmbH |
psu.dmr-first | 2D Crystal Consortium (2DCC) - Division of Materials Research (DMR) - FIRST |
todn.hcsvci | Technical University of Denmark - Energy Innovation Systems |
pawsey.repo | Pawsey Supercomputing Centre |
cos.osf | Open Science Framework |
pqip.devices | Helmholtz-Zentrum Dresden-Rossendorf e.V. -DEVICES |
upenn.repo | Univ. of Pennsylvania Repository |
tib.iow | Leibniz-Institut fuer Ostseeforschung Warnemuende |
uq.repo | The University of Queensland |
Table 1. Early adopters of resourceTypeGeneral = Instrument.
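Counts like those in Table 1 can be reproduced from DataCite's REST API. The sketch below assumes that the API accepts a `resource-type-id=instrument` filter and returns JSON:API records whose repository ID sits under `relationships.client.data.id`; both are assumptions to check against the current API documentation. It only builds the query URL and tallies a hand-made sample payload, so it runs without network access.

```python
# Sketch: tally instrument records per repository, as in Table 1.
# ASSUMPTIONS: the `resource-type-id=instrument` filter and the location of
# the repository ID (relationships.client.data.id) in each returned record.
from collections import Counter
from urllib.parse import urlencode

API = "https://api.datacite.org/dois"


def instrument_query_url(page_size: int = 100) -> str:
    """Build (but do not fetch) a query URL for Instrument records."""
    return API + "?" + urlencode({"resource-type-id": "instrument",
                                  "page[size]": page_size})


def count_by_repository(payload: dict) -> Counter:
    """Count records per repository ID in one page of API results."""
    counts = Counter()
    for item in payload.get("data", []):
        client = item.get("relationships", {}).get("client", {}).get("data") or {}
        counts[client.get("id", "unknown")] += 1
    return counts


# A tiny hand-made payload standing in for a real API response.
sample = {"data": [
    {"id": "10.1234/a", "relationships": {"client": {"data": {"id": "tib.hzb"}}}},
    {"id": "10.1234/b", "relationships": {"client": {"data": {"id": "tib.hzb"}}}},
    {"id": "10.1234/c", "relationships": {"client": {"data": {"id": "uq.repo"}}}},
]}
print(count_by_repository(sample).most_common())
# → [('tib.hzb', 2), ('uq.repo', 1)]
```

A real query may return several pages of results, which would need to be tallied together.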
Content
The content of the instrument metadata records from these repositories is shown in Figure 2. The required elements (orange) support identification and citation, the primary use cases of the DataCite schema. There are 78 records, so all required elements occur at least 78 times.
Resource Identifiers
Several mandatory fields occur more than 78 times, indicating that some records include multiple values for these elements. One of these is Resource Title, which reflects the inclusion of titles in multiple languages or of acronyms as AlternativeTitles. Another is ResourceIdentifier, which reflects the existence of multiple identifiers of different types for some instruments. The most common additional identifier type is “serial number”, which occurs 16 times.
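The counting logic behind these observations is simple: a mandatory element occurs at least once per record, so any total above the record count signals repeated values. A minimal sketch, assuming simplified dict records in which an `identifiers` list stands in for DataCite alternate identifiers (a hypothetical layout), might look like this:

```python
# Sketch: compare total occurrences of repeatable elements with the number
# of records, and tally the types of additional identifiers.
# ASSUMPTION: simplified dict records; "identifiers" stands in for DataCite
# alternate identifiers, each with an "identifierType".
from collections import Counter


def occurrence_counts(records: list) -> Counter:
    """Total occurrences of titles and identifiers across all records."""
    counts = Counter()
    for rec in records:
        counts["title"] += len(rec.get("titles", []))
        # Every record has its DOI plus any additional identifiers.
        counts["identifier"] += 1 + len(rec.get("identifiers", []))
    return counts


def identifier_types(records: list) -> Counter:
    """How often each additional identifier type occurs."""
    return Counter(ident.get("identifierType", "unspecified")
                   for rec in records
                   for ident in rec.get("identifiers", []))


# Two hypothetical records: one with an acronym and a serial number.
records = [
    {"titles": [{"title": "Example Spectrometer"},
                {"title": "ES", "titleType": "AlternativeTitle"}],
     "identifiers": [{"identifier": "SN-001", "identifierType": "serial number"}]},
    {"titles": [{"title": "Example Detector"}]},
]
counts = occurrence_counts(records)
print(counts["title"], counts["identifier"], len(records))
# → 3 3 2
print(identifier_types(records))
# → Counter({'serial number': 1})
```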
Descriptions
The DataCite metadata schema includes several metadata elements, shown in light green in Figure 2, that provide descriptive metadata across a spectrum of detail (Figure 3). In the case of instruments, the identifier (DOI) provides the ability to connect to and from the specific instrument, and resourceTypeGeneral is a very general type (Instrument). ResourceType provides more specific information as free text, which can be used in many ways. The next level of detail is the instrument title, i.e. the name of the specific instrument, with the acronym included as an AlternativeTitle (if available or commonly used). The most detailed information in the metadata record is provided by descriptions with types Abstract or TechnicalInfo. Forty-one (53%) of the existing instrument records include all four of these descriptive elements.
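One way to test records against this spectrum is to check each level separately. The post does not spell out which four elements it counts, so this sketch assumes they are resourceType, a main title, an AlternativeTitle, and an Abstract or TechnicalInfo description; the helper names and record layout are likewise hypothetical.

```python
# Sketch: check whether a record carries each level of descriptive detail.
# ASSUMPTION: "the four" elements are taken to be resourceType, a main title,
# an AlternativeTitle (acronym), and an Abstract or TechnicalInfo description.
# Records are simplified dicts loosely modelled on DataCite JSON.

DETAIL_DESCRIPTION_TYPES = {"Abstract", "TechnicalInfo"}


def detail_levels(rec: dict) -> dict:
    titles = rec.get("titles", [])
    return {
        "resourceType": bool(rec.get("types", {}).get("resourceType")),
        # In DataCite, the main title carries no titleType.
        "title": any(not t.get("titleType") for t in titles),
        "alternativeTitle": any(t.get("titleType") == "AlternativeTitle"
                                for t in titles),
        "description": any(d.get("descriptionType") in DETAIL_DESCRIPTION_TYPES
                           for d in rec.get("descriptions", [])),
    }


def fully_described(records: list) -> list:
    """Records that include all four descriptive elements."""
    return [r for r in records if all(detail_levels(r).values())]


example = {
    "types": {"resourceTypeGeneral": "Instrument", "resourceType": "Beamline"},
    "titles": [{"title": "Example Beamline"},
               {"title": "EB", "titleType": "AlternativeTitle"}],
    "descriptions": [{"description": "Hypothetical description.",
                      "descriptionType": "TechnicalInfo"}],
}
print(len(fully_described([example, {"titles": [{"title": "Bare record"}]}])))
# → 1
```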
The instrument metadata from these repositories include some elements that are unexpected for physical instruments, e.g. Resource Format with values like “text/csv” and Resource Size with values like “1.3MB”. Exploring these descriptive elements shows that two of the repositories in Table 1 are using the type “Instrument” to identify research instruments such as surveys, questionnaires, interviews, tests, checklists, or observation forms. This usage is consistent with the DataCite definition of Instrument, but more detail is required to make clear which sense of Instrument a record intends. It is interesting to note that “StudyRegistration” was added to the resourceTypeGeneral list in V4.5 and that “Project” will be added in the next version. These additions may provide a more appropriate solution for four of the records in Table 2, which currently use these terms as resourceTypes.
resourceType | Title
Project | BENDEP-SRQ-GV
ProjectComponent | RebeL - Codebook/scales manual, dataset, summary chart of RebeL [Skalenhandbuch, Datensatz, Über-blicksgrafik]
ProjectComponent | Data collection form
Project | Methodology to analyse the divergent thinking egg task
Pre-registration | Digital Interface Patterning for Detecting Illegitimate Publications (DIP-DIP) scale
Pre-registration | Mental Health & Illness Education in Paramedicine: A Scoping Review
template | Data Dictionary Blank Template
Assessment Check-list | FAIR Assessment Checklist for Data Repositories
Table 2. resourceTypes and titles from records describing research instruments rather than physical instruments.
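A quick screen can help find records like those in Table 2 before they are re-curated. This sketch assumes that any Format or Size value, or a document-like word such as “form” or “checklist” in the resourceType or titles, hints at a research instrument; the word list is an illustrative assumption and will produce both false positives and misses, so flagged records still need human review.

```python
# Sketch: a rough screen for records describing research instruments
# (surveys, forms, checklists) rather than physical ones.
# ASSUMPTIONS: any Format or Size value, or a document-like word in the
# resourceType or titles, is treated as a hint. The word list is illustrative.
import re

DOCUMENT_WORDS = {"survey", "questionnaire", "interview", "test", "checklist",
                  "check-list", "form", "template", "scale", "codebook"}


def _words(text: str) -> set:
    # Lower-case words, keeping internal hyphens; crude plural stripping.
    return {w.rstrip("s") for w in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())}


def looks_like_research_instrument(rec: dict) -> bool:
    # Physical instruments rarely carry file formats or sizes.
    if rec.get("formats") or rec.get("sizes"):
        return True
    text = " ".join([rec.get("types", {}).get("resourceType") or ""]
                    + [t.get("title", "") for t in rec.get("titles", [])])
    return bool(_words(text) & DOCUMENT_WORDS)


print(looks_like_research_instrument({"titles": [{"title": "Data collection form"}]}))
# → True
```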
Keywords are another important element for describing instruments, and they have the advantage that they can be selected from specialized instrument keyword vocabularies or ontologies. The current metadata include many keywords (more than one per record), but keyword vocabularies remain rare. The most common one in these metadata is the Bepress Digital Commons Three-Tiered Taxonomy, now part of the Elsevier Digital Commons.
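Keyword practice can be summarised with two numbers: keywords per record and the share of keywords that name a vocabulary. This sketch assumes simplified records whose `subjects` entries may carry a `subjectScheme`, as DataCite subjects can; the scheme name in the sample is hypothetical.

```python
# Sketch: keywords per record, and which keywords come from a named
# vocabulary. ASSUMPTION: simplified records whose "subjects" entries may
# carry a "subjectScheme".
from collections import Counter


def keyword_summary(records: list) -> tuple:
    """Return (keywords per record, share with a scheme, scheme counts)."""
    total = 0
    schemes = Counter()
    for rec in records:
        for subject in rec.get("subjects", []):
            total += 1
            if subject.get("subjectScheme"):
                schemes[subject["subjectScheme"]] += 1
    per_record = total / len(records) if records else 0.0
    share = sum(schemes.values()) / total if total else 0.0
    return per_record, share, schemes


records = [
    {"subjects": [{"subject": "Spectroscopy"},
                  {"subject": "Physics", "subjectScheme": "Hypothetical Taxonomy"}]},
    {"subjects": [{"subject": "X-ray"}]},
]
per_record, share, schemes = keyword_summary(records)
print(per_record, round(share, 2), dict(schemes))
# → 1.5 0.33 {'Hypothetical Taxonomy': 1}
```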
Connections
Connections to and from instrument metadata are critical for integrating instruments into the broader research infrastructure and for providing context for understanding the instruments and, equally important, how they are used. The DataCite metadata schema provides several ways to connect instruments to people, organizations, funders, and other resources, also shown in Figure 3.
Funder References
The introduction of the fundingReference element in Version 4.0 of the DataCite schema during 2016 expanded funder description capabilities to include funderName, funderIdentifier, awardNumber, awardURI, and awardTitle. Funder metadata are still rare in these instrument records (Figure 2): Project Funder and Award Number occur in 28 records (36%), Award Title in 23 records (29%), and Funder Identifiers / Award URIs in fewer than 10% of the records. In some cases, the same funder is listed with several different names or acronyms and without funder identifiers. This makes it difficult to recognize these funders unambiguously and to ensure consistent acknowledgement.
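The percentages above are shares of records with at least one value for each funding field. A minimal sketch of that calculation, plus a list of funder names given without identifiers, is below; it assumes simplified records and JSON-style field names (`awardUri` in particular may be spelled differently in other serialisations), and the sample funders are hypothetical.

```python
# Sketch: completeness of funding metadata, and funder names that lack an
# identifier. ASSUMPTION: simplified records whose "fundingReferences"
# entries use the field names below.

FUNDING_FIELDS = ("funderName", "funderIdentifier", "awardNumber",
                  "awardUri", "awardTitle")


def funding_completeness(records: list) -> dict:
    """Percentage of records with at least one value for each field."""
    n = len(records)
    result = {}
    for field in FUNDING_FIELDS:
        hits = sum(1 for r in records
                   if any(ref.get(field) for ref in r.get("fundingReferences", [])))
        result[field] = round(100 * hits / n) if n else 0
    return result


def unidentified_funders(records: list) -> set:
    """Funder names given without a funderIdentifier."""
    return {ref["funderName"]
            for r in records for ref in r.get("fundingReferences", [])
            if ref.get("funderName") and not ref.get("funderIdentifier")}


records = [
    {"fundingReferences": [{"funderName": "Example Funder",
                            "funderIdentifier": "hypothetical-id",
                            "awardNumber": "A-1"}]},
    {"fundingReferences": [{"funderName": "EF", "awardNumber": "A-2"}]},
    {},
    {},
]
print(funding_completeness(records))
# → {'funderName': 50, 'funderIdentifier': 25, 'awardNumber': 50, 'awardUri': 0, 'awardTitle': 0}
print(unidentified_funders(records))
# → {'EF'}
```

Names without identifiers, such as the acronym “EF” here, are the ones that need manual matching before funders can be recognised consistently.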
People and Organizations
DataCite creators are important because they are listed in recommended citations of DataCite resources. Resource Author Identifiers occur for most creators in these metadata. The DataCite schema includes nameTypes of either “Personal” or “Organizational”, which allows identification of organizations that are responsible for these instruments. Most of the creators are organizations, identified by RORs; individuals are identified by ORCIDs. Resource Author Affiliation Identifiers (RORs) occur in only five records. RORs are more common as identifiers for contributors with contributorType = “HostingInstitution”, which occurs in 65 records.
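A sketch of how creator identifiers and hosting institutions might be tallied follows. It assumes simplified records loosely modelled on DataCite JSON; the sample reuses the HZB name and ROR cited in this post, but the records themselves are hypothetical.

```python
# Sketch: which identifier schemes identify creators of each nameType, and
# how many records name a HostingInstitution contributor.
# ASSUMPTION: simplified records loosely modelled on DataCite JSON.
from collections import Counter


def creator_schemes(records: list) -> Counter:
    """Count (nameType, identifier scheme) pairs; 'none' if unidentified."""
    counts = Counter()
    for rec in records:
        for creator in rec.get("creators", []):
            name_type = creator.get("nameType", "Unspecified")
            schemes = [ni.get("nameIdentifierScheme", "unknown")
                       for ni in creator.get("nameIdentifiers", [])]
            for scheme in schemes or ["none"]:
                counts[(name_type, scheme)] += 1
    return counts


def hosting_institution_records(records: list) -> int:
    """Number of records with at least one HostingInstitution contributor."""
    return sum(1 for rec in records
               if any(c.get("contributorType") == "HostingInstitution"
                      for c in rec.get("contributors", [])))


hzb = "Helmholtz-Zentrum Berlin für Materialien und Energie GmbH"
records = [
    {"creators": [{"name": hzb, "nameType": "Organizational",
                   "nameIdentifiers": [{"nameIdentifier": "https://ror.org/02aj13c28",
                                        "nameIdentifierScheme": "ROR"}]}],
     "contributors": [{"name": hzb, "contributorType": "HostingInstitution"}]},
    {"creators": [{"name": "Doe, Jane", "nameType": "Personal"}]},
]
print(creator_schemes(records))
print(hosting_institution_records(records))
```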
Helmholtz-Zentrum Berlin für Materialien und Energie GmbH
The Helmholtz-Zentrum Berlin für Materialien und Energie GmbH (HZB) helped lead the RDA effort to develop instrument identifiers and to apply these identifiers to support FAIR data systems. HZB has currently documented twenty-four instruments in DataCite metadata. Together these provide real-world examples of many of the documentation concepts described here and shown in Figure 3.
Figure 4 shows HZB instruments from the BESSY Synchrotron Light Source, identified and connected using the identifiers from the metadata. The orange node in the center of the graph is the organization HZB, identified by a ROR (https://ror.org/02aj13c28). It is connected to nine instruments (blue nodes identified by DOIs) in two roles: as a creator, which is necessary for HZB to appear in the citation, and as a contributor with contributorType = HostingInstitution.
Instruments that have been used together in particular experiments are connected into three groups by relatedIdentifiers with relationType = References. One instrument, with DOI = “ni000022”, appears without any connections. Three of these instruments, in two experiments, were funded by the organization with Crossref Funder ID = 501100002347 (red). Finally, instruments are linked to articles in journals (green) or proceedings (pink) using relatedIdentifiers with relationType = “IsDescribedBy”.
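The grouping in Figure 4 amounts to finding connected components among instruments linked by References. This sketch uses a small union-find over hypothetical instrument identifiers; it treats References links as undirected and ignores links to resources outside the instrument set, such as articles linked with IsDescribedBy.

```python
# Sketch: group instruments connected by References links (connected
# components via union-find). ASSUMPTIONS: simplified records with a "doi"
# and "relatedIdentifiers"; identifiers below are hypothetical.


def reference_groups(records: list) -> list:
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    dois = {r["doi"] for r in records}
    for r in records:
        find(r["doi"])  # isolated instruments still form their own group
        for rel in r.get("relatedIdentifiers", []):
            # Only instrument-to-instrument References links define groups.
            if rel.get("relationType") == "References" and rel.get("relatedIdentifier") in dois:
                union(r["doi"], rel["relatedIdentifier"])

    groups = {}
    for d in dois:
        groups.setdefault(find(d), set()).add(d)
    # Largest groups first, then alphabetical for a stable order.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


instruments = [
    {"doi": "i1", "relatedIdentifiers": [{"relatedIdentifier": "i2", "relationType": "References"}]},
    {"doi": "i2", "relatedIdentifiers": [{"relatedIdentifier": "i3", "relationType": "References"}]},
    {"doi": "i3"},
    {"doi": "i4", "relatedIdentifiers": [
        {"relatedIdentifier": "i5", "relationType": "References"},
        {"relatedIdentifier": "10.1234/article", "relationType": "IsDescribedBy"}]},
    {"doi": "i5"},
    {"doi": "i6"},
]
print(reference_groups(instruments))
# → [['i1', 'i2', 'i3'], ['i4', 'i5'], ['i6']]
```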
Figure 4 is useful for seeing relationships and groups, and the identifiers alone make it possible for tools to identify items and relationships unambiguously. However, identifiers are not as useful for human users who are not intimately familiar with this research environment and the HZB identifiers.
Figure 5 takes a step from identifiers towards more detail (Figure 3) by labeling instruments with resourceTypes from the metadata. HZB has used resourceTypes to provide this level of detail, i.e. to identify light sources and their associated experiment stations. This matters because only nine of the twenty-four HZB instruments are associated with BESSY II; the other HZB instruments, not shown here, are associated with the BER II reactor. Each group of connected instruments represents a different configuration of the experiments.
Figure 6 takes one more step along the detail spectrum by using titles to label the instrument nodes. These are specific to each instrument and, unlike the identifiers, human readable. Organization names and article titles are also human readable, so, while Figure 4 provides a machine-readable picture of the BESSY beamline, Figure 6 provides a complementary human-readable version of the picture.
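The three figures differ only in which element labels each node. A sketch of that choice, assuming simplified records and illustrative level names (the `node_label` helper is hypothetical), falls back to the DOI when a richer element is missing:

```python
# Sketch: choose a node label at a given level of the detail spectrum used
# in Figures 4 to 6. ASSUMPTIONS: simplified record dicts; the level names
# are illustrative. Missing values fall back to the DOI.


def node_label(rec: dict, level: str) -> str:
    types = rec.get("types", {})
    if level == "identifier":
        return rec["doi"]
    if level == "resourceType":
        return types.get("resourceType") or types.get("resourceTypeGeneral") or rec["doi"]
    if level == "title":
        # The main title is the one without a titleType.
        main_titles = [t["title"] for t in rec.get("titles", []) if not t.get("titleType")]
        return main_titles[0] if main_titles else rec["doi"]
    raise ValueError(f"unknown detail level: {level!r}")


record = {"doi": "10.1234/example-instrument",
          "types": {"resourceTypeGeneral": "Instrument", "resourceType": "Beamline"},
          "titles": [{"title": "Example Beamline"},
                     {"title": "EB", "titleType": "AlternativeTitle"}]}
for level in ("identifier", "resourceType", "title"):
    print(level, "->", node_label(record, level))
```

Falling back to the identifier keeps every node labeled even in sparse records, at the cost of readability for exactly the records that most need re-curation.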
Conclusion
Several DataCite members are beginning to use existing DataCite metadata elements to take advantage of the resourceTypeGeneral=Instrument capability introduced during January 2024. Currently 78 instrument records have been created by nine organizations.
Like many metadata records in DataCite, these records focus on the identification and citation use cases: they populate the mandatory DataCite fields but make less use of fields that support other use cases, such as connections to people, organizations, funders, and other research objects. Re-curating these existing records can increase their metadata completeness along with the return on investment in metadata creation and maintenance.
Existing metadata elements support instrument descriptions at several different levels along a detail spectrum: resourceTypeGeneral -> resourceType -> title -> description -> relatedIdentifiers. Organizations that are just starting along the path to instrument metadata in DataCite can take advantage of examples from the Helmholtz-Zentrum Berlin für Materialien und Energie GmbH that demonstrate this spectrum quite well. These examples may also provide a helpful starting point for dialogs on community conventions for instrument metadata in DataCite.
Join a Community Dialog!
DataCite, in partnership with Metadata Game Changers, is excited to announce a virtual community dialogue session to engage the broader life science and astronomy communities in co-developing metadata enhancements supporting the development and use of persistent identifiers (PIDs) for instruments. This dialogue represents the first phase of a larger project recently launched by DataCite.
We invite metadata creators and users to join us and actively advance infrastructure solutions to identify, describe, discover, and track the impact of instruments across domain communities. A primary focus of the dialogue and of the broader initiative, generously funded by the Richard Lounsbery Foundation, is community collaboration and the co-development of prototype solutions tailored to the needs of the life sciences and astronomy communities.
To this end, we would like to invite you to register today for this virtual dialogue.
🔬Persistent Identifiers for Instruments Community Dialogue:
Information:
Date: October 1, 2024
Time: 11:00 am - 1:00 pm ET
Registration Link & Agenda: https://datacite.org/event/persistent-identifiers-for-instruments-community-dialogue/
Open to instrument metadata creators and users, including researchers, data scientists, librarians and policymakers.
Topics to include: ownership & provenance; series & versioning; replicability & reuse; discoverability; impact measurement
References
Günther, G., Bär, M., Greve, N., Krahl, R., Kubin, M., Mannix, O., Smith, W., Vadilonga, S., & Wilks, R. (2022). FAIR Meets EMIL: Principles in Practice. Proceedings of the 18th International Conference on Accelerator and Large Experimental Physics Control Systems, ICALEPCS2021, China. https://doi.org/10.18429/JACOW-ICALEPCS2021-WEBL05
Habermann, T. (2023). How Many When (Update). Front Matter. https://doi.org/10.59350/w1em6-nn888
Habermann, T. (2024). FAIR Metadata Concepts in DataCite Metadata Schema [Data set], Zenodo. https://doi.org/10.5281/zenodo.12168626
RDA Persistent Identification of Instruments WG. https://www.rd-alliance.org/groups/persistent-identification-instruments-wg/