Ted Habermann, Metadata Game Changers

Cite this blog as: Habermann, T. (2026). Metadata Details for Research Integrity. Front Matter. https://doi.org/10.59350/4eya7-q4r84

Introduction

A recent guide to using Crossref and DataCite metadata to support scientific integrity (Amdekar et al. 2026) provided some great perspectives and insights into how metadata can serve the important goals of helping researchers document and users understand and evaluate research integrity across the scholarly landscape. Authors from Crossref and DataCite described many useful metadata elements and emphasized the critical role of communities in creating and enriching metadata for supporting integrity and other use cases. They also issued a call for actions that included learning more about metadata and enriching your metadata records.

Metadata creators and curators who want to respond to this call have many resources, including extensive and helpful documentation from both Crossref and DataCite. I rely on that documentation and great support from both organizations as I try to understand how metadata are being used around the world and to identify organizations that are leading efforts to improve metadata by providing great examples. In addition, the authors mention existing metadata as a “rich, persistent resource for insights” and welcome “scrutiny by the metascience community”.

In this blog I begin to delve a bit deeper into using DataCite metadata to support understanding scientific integrity by exploring two questions: 1) what “integrity capabilities” are being used and 2) who is using them. The goal is to explicitly identify some capabilities and to identify potentially helpful examples.

Metadata Detail

Since the emergence of the World Wide Web, the focus of many metadata efforts has been on elements that support data discovery. The title for the DataCite schema documentation has included the words “for the Publication and Citation of Research Data” since the original documentation from January 2011. Of course, understanding scientific integrity depends on details about what was done, who did it, what it was done to, and how it was done.

In metadata land, one might say that “the details are in the details”. Since 2011, the DataCite schema has continually been extended to provide details beyond the discovery use case. The community needs to understand and take advantage of these evolving capabilities if we are going to support documenting scientific integrity (Habermann, 2020). This means adding more content and more details to metadata records. In this blog we explore how the descriptions metadata element is used.

Descriptions

Amdekar et al. point out that an abstract is critical metadata for identifying the crux of a work and judging relevance to a user’s needs. In DataCite, the abstract is in the description element along with a required descriptionType that can have one of six values (Abstract, Methods, TechnicalInfo, SeriesInformation, TableOfContents, and Other). Methods and TechnicalInfo help address the “what was done” and “how was it done” questions and support the always elusive goal of reproducibility. Like Abstract, they are critical for supporting reuse and scientific integrity.
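As a minimal sketch of what typed descriptions look like in a record, the Python below builds entries for the descriptions list of a DataCite JSON record and checks each descriptionType against the six values listed above. The helper name make_description and the example texts are hypothetical and are not taken from any real record.

```python
# Controlled vocabulary for descriptionType, as listed in the text above.
DESCRIPTION_TYPES = {
    "Abstract", "Methods", "TechnicalInfo",
    "SeriesInformation", "TableOfContents", "Other",
}


def make_description(text, description_type):
    """Return one entry for the `descriptions` list of a DataCite JSON record.

    Hypothetical helper for illustration; it only checks the vocabulary.
    """
    if description_type not in DESCRIPTION_TYPES:
        raise ValueError(f"unknown descriptionType: {description_type!r}")
    return {"description": text, "descriptionType": description_type}


# Hypothetical example content, not taken from a real record.
descriptions = [
    make_description("Soil cores collected along a 100 m transect.", "Abstract"),
    make_description("Cores were sampled every 10 m and air dried.", "Methods"),
    make_description("CSV files, UTF-8, one row per core.", "TechnicalInfo"),
]
```

Keeping the abstract, the methods, and the technical information as separate typed entries is what lets a reader (or a script) find the “how was it done” details without parsing one long block of text.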

The first question, what integrity capabilities are being used, can be answered by counting the number of occurrences of each type of description. Table 1 shows the percentage of DataCite records that include descriptions of each type. Abstract is the most common description type, occurring in 67% of 126+ million DataCite records, with TechnicalInfo and Other each occurring in 10% of the records, and Methods occurring in less than 1%.

| Type | Records % | Max Repository % | Max Repository |
|---|---|---|---|
| Abstract | 67% | 47% | National Institute for Fusion Science, NIFS (40025414) |
| TechnicalInfo | 10% | 97% | Distributed System of Scientific Collections, DiSSCo (12405751) |
| Other | 10% | 27% | Zenodo (3200524) |
| SeriesInformation | | | University Library Heidelberg (113767) |
| Methods | <1% | 39% | TERN IGSN ID Catalogue (137606) |
| TableOfContents | | 937% | European Society of Radiology (55107) |

Table 1. The percentage of DataCite records with each description type, the repository with the most occurrences of that type (number of occurrences in parentheses), and the percentage of all occurrences of the type that are in that repository. Data collected May 13, 2026.
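The “Records %” idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used to build Table 1: it assumes records shaped like DataCite JSON, with descriptions under attributes.descriptions, and the function name and sample records are hypothetical.

```python
from collections import Counter


def description_type_shares(records):
    """Percent of records that contain at least one description of each type."""
    counts = Counter()
    for record in records:
        descriptions = record.get("attributes", {}).get("descriptions") or []
        # Use a set so a record with two Abstracts is only counted once.
        types = {
            d.get("descriptionType")
            for d in descriptions
            if d.get("descriptionType")
        }
        counts.update(types)
    total = len(records)
    if total == 0:
        return {}
    return {t: 100.0 * n / total for t, n in counts.items()}


# Hypothetical sample: four records, three of them with an Abstract.
sample = [
    {"attributes": {"descriptions": [
        {"description": "a", "descriptionType": "Abstract"},
        {"description": "m", "descriptionType": "Methods"},
    ]}},
    {"attributes": {"descriptions": [
        {"description": "b", "descriptionType": "Abstract"},
    ]}},
    {"attributes": {"descriptions": [
        {"description": "c", "descriptionType": "Abstract"},
        {"description": "d", "descriptionType": "Abstract"},
    ]}},
    {"attributes": {"descriptions": []}},
]
shares = description_type_shares(sample)
```

Counting records rather than individual descriptions matters here, because a record can carry several descriptions of the same type.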

The second question, who is using them (the most), can be explored using DataCite facet data that show the ten repositories with the most occurrences of a given descriptionType. Table 1 includes the top repository for each type, the number of occurrences it includes (in parentheses), and the percent of all occurrences that are in that repository. For instance, the National Institute for Fusion Science (NIFS) metadata includes 40,025,414 abstracts, which is 47% of the total number of abstracts. NIFS is currently the largest repository in DataCite with over 40 million records, and their metadata is very complete (they are a bright spot) and consistent, so if they include an element, they include many occurrences. Without this one repository, only 33% of DataCite records include abstracts.
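The “Max Repository %” arithmetic can be sketched as follows. The function name, the repository names, and the counts are all hypothetical and chosen only for illustration. The total is passed in separately because a facet list only covers the top repositories, so the facet counts alone would not add up to the total.

```python
def top_repository_share(facet_counts, total_occurrences):
    """Return (repository, count, percent of all occurrences) for the largest facet."""
    if not facet_counts or total_occurrences <= 0:
        return None
    name, count = max(facet_counts.items(), key=lambda item: item[1])
    return name, count, 100.0 * count / total_occurrences


# Hypothetical counts for illustration only; these are not DataCite numbers.
facets = {"repo-a": 470, "repo-b": 120, "repo-c": 45}
result = top_repository_share(facets, total_occurrences=1000)
```

A high share for a single repository is a signal that the overall percentage says more about that repository than about the community as a whole.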

NIFS metadata describes results of fusion experiments which output many datasets, so the abstracts in these metadata are not unique. In a sample of 10,000 records, the abstract “Resistive bolometer measures total value and radial profile of plasma radiation power” occurred 9,996 times. Note that, even though the abstract is short, it provides critical re-use and integrity information, i.e. it helps answer the “how was it done” question by including the kind of instrument used in the experiment and what was measured. This could also be done with more detail using the instrument resourceTypeGeneral with a “collectedBy” relation type (Habermann, 2024).
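Checking how often a description repeats in a sample is simple to sketch. The code below assumes records shaped like DataCite JSON (attributes.descriptions), and the function name and the tiny sample are hypothetical, so treat it as one possible approach rather than the analysis behind the numbers above.

```python
from collections import Counter


def most_common_description(records, description_type="Abstract"):
    """Return (text, count, total) for the most frequent description of one type."""
    texts = [
        d.get("description", "").strip()
        for record in records
        for d in (record.get("attributes", {}).get("descriptions") or [])
        if d.get("descriptionType") == description_type
    ]
    if not texts:
        return None
    text, count = Counter(texts).most_common(1)[0]
    return text, count, len(texts)


def _record(text, description_type):
    # Hypothetical helper that wraps one description in a record-like dict.
    return {"attributes": {"descriptions": [
        {"description": text, "descriptionType": description_type},
    ]}}


# Hypothetical sample: the same abstract three times, plus two other records.
sample = [
    _record("Instrument X measures quantity Y", "Abstract"),
    _record("Instrument X measures quantity Y", "Abstract"),
    _record("Instrument X measures quantity Y ", "Abstract"),
    _record("A different abstract", "Abstract"),
    _record("Sensor firmware 1.2", "TechnicalInfo"),
]
top = most_common_description(sample)
```

Stripping whitespace before counting keeps trivially different copies of the same text from being counted as distinct descriptions.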

TechnicalInfo descriptions are much less common than Abstracts, occurring in 10% of the records, but they tell a similar story. In this case, a single repository, the Distributed System of Scientific Collections (DiSSCo), provides 97% of the TechnicalInfo descriptions in DataCite. DiSSCo is the second largest repository in DataCite with 12+ million records describing digital specimens for physical specimens, most of which were hosted in the Naturalis Biodiversity Center at the time these metadata were collected. Over 7,000 of the descriptions in a sample of 10,000 described this relationship in TechnicalInfo as “Digital Specimen for the physical specimen hosted at Naturalis Biodiversity Center”. Other descriptions included local identifiers for the physical specimens.

It is interesting to note that the “Other” descriptionType also occurs in 10% of the records. This type is used when none of the other types work. The large number of occurrences of “Other” suggests that the DataCite Metadata Working Group needs to explore these metadata to see if new terms should be added to the descriptionType vocabulary!

Methods descriptions are very rare, occurring in less than 1% of the DataCite records, and the TERN IGSN ID Catalogue includes more than any other repository. In this case, the Methods descriptions are commonly links to websites or to manuals like the AusPlots Rangelands Survey Protocols Manual. This is a great example of a Methods description providing a link to details about the protocols that were used to collect these data, supporting re-use and scientific integrity. This could also be augmented with a relatedIdentifier pointing to the protocol with a “documentedBy” relationType.
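A related identifier for a protocol might look something like the sketch below. It assumes the DataCite JSON layout for relatedIdentifiers and uses the schema's IsDocumentedBy relationType for the “documentedBy” relation. The function name and the example.org URL are hypothetical placeholders, not a real protocol location.

```python
def add_protocol_link(attributes, protocol_url):
    """Add a relatedIdentifier saying the dataset is documented by a protocol.

    Hypothetical helper: `attributes` is the attributes dict of a DataCite
    JSON record and is modified in place.
    """
    related = attributes.setdefault("relatedIdentifiers", [])
    entry = {
        "relatedIdentifier": protocol_url,
        "relatedIdentifierType": "URL",
        "relationType": "IsDocumentedBy",
    }
    # Avoid adding the same link twice.
    if entry not in related:
        related.append(entry)
    return attributes


# Hypothetical record attributes and placeholder URL.
attributes = {
    "descriptions": [
        {"description": "See the survey protocols manual.", "descriptionType": "Methods"},
    ],
}
add_protocol_link(attributes, "https://example.org/protocols/survey-manual")
add_protocol_link(attributes, "https://example.org/protocols/survey-manual")
```

The Methods description stays readable for people, while the related identifier gives machines an explicit, typed connection to the protocol.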

Conclusions

The DataCite metadata schema has evolved considerably since 2011 and now includes a variety of metadata elements that can support use cases that go significantly beyond the identification and citation that drive the core mandatory metadata elements. Many of these recommended or optional elements are included by Amdekar et al. 2026 because they help contribute to understanding research integrity.

Like descriptions, many of these elements, e.g. relatedIdentifiers and contributors, include type elements (relationType and contributorType) that can indicate specifics about the roles of those elements in describing datasets and supporting integrity. These connections have been described generally by El-Gebali 2024 while the “who and how” of common relation types was discussed by Habermann 2025.

Transitioning from a world where metadata focus on discovery to one where they support research integrity is a significant challenge. Many DataCite repositories have set out on the road towards that goal and can provide examples and lessons learned for the rest of us. This blog scratches the surface of the descriptions element that can be used for abstracts, technicalInfo, and methods. If you are using these elements in your metadata, add a comment so others can learn from your work.

References

Amdekar, M., Chen, X., Cousijn, H., El-Gebali, S., Feeney, P., Hendricks, G., & Stathis, K. (2026). Why metadata matters for research integrity and how to contribute. Zenodo. https://doi.org/10.5281/zenodo.19695957

El-Gebali, S. (2024). Connecting the Dots with DataCite DOI Metadata. DataCite. https://doi.org/10.5438/K81T-ZQ43

Habermann, T. (2020). Minimum Metadata. Front Matter. https://doi.org/10.59350/kanrj-qt678

Habermann, T. (2024). Instruments@DataCite. Front Matter. https://doi.org/10.59350/8fjx7-y3229

Habermann, T. (2025). Connecting PIDs in DataCite Metadata. Front Matter. https://doi.org/10.59350/6s7dv-5ds58
