Metadata Game Changers

Blog

Exploring metadata, communities, and new ideas.


Connecting DMPs and Research Outputs in DataCite Metadata

December 02, 2021/ Ted Habermann

Cite this blog as Habermann, T. (2021). Connecting DMPs and Research Outputs in DataCite Metadata. Front Matter. https://doi.org/10.59350/ps9jc-vj870

One of the exciting new capabilities in Version 4.4 of the DataCite Metadata Schema is support for creating unique identifiers (DOIs) for Output Management Plans (OMPs), i.e., resourceTypeGeneral = OutputManagementPlan. This new resource type recognizes two things. First, research outputs include many kinds of resources beyond data: journal articles, software, computational notebooks, presentations, etc. Second, research plans can document all of these types. Hence the name OutputManagementPlan rather than DataManagementPlan. Of course, many of us have the acronym DMP embedded in our vocabularies, so keep in mind that the acronym now has a broader meaning!

The introduction of this new capability immediately raises questions about how it can be used. One part of the answer is clearly: as an identifier that connects all research outputs from a project. This quickly leads to further questions: what kinds of connections, and what kinds of outputs? We are early in this process, so it is a good time to discuss these questions and reach consensus on common practices for connections in Output Management Plans in DataCite. These practices build on the Common Standard for Machine Actionable Data Management Plans being developed in the Research Data Alliance, which focuses on other types of OMP content.

Two repositories have taken advantage of this capability since Version 4.4 of the schema was introduced in March 2021:
Zenodo (cern.zenodo), with 385 plans, and the California Digital Library (cdl.cdl), with 269 plans (as of November 15, 2021). Together, these plans include over 1,300 relatedIdentifiers.

RelationTypes

Table 1 shows the number of occurrences of relationTypes used in DataCite metadata by CDL and Zenodo in these plans.

relationType         CDL (269)  Zenodo (385)  Total (654)
IsMetadataFor              269             -          269
IsPartOf                     -           225          225
HasVersion                   -           219          219
IsVersionOf                  -           218          218
IsReferencedBy             194             4          198
Documents                    -            85           85
IsSupplementTo              16            27           43
References                   -            12           12
IsSupplementedBy             -             8            8
IsNewVersionOf               -             8            8
IsCitedBy                    -             7            7
IsDerivedFrom                -             7            7
IsIdenticalTo                -             5            5
IsDocumentedBy               -             4            4
Reviews                      -             2            2
Continues                    -             2            2
Total                      479           834         1313

Table 1. Number of occurrences of each relationType in CDL and Zenodo output management plans (number of plans in parentheses).
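Tallies like Table 1 can be computed directly from DataCite JSON. The following is a minimal sketch. It assumes each record is a dict shaped like the `attributes` object of the DataCite REST API, with a `relatedIdentifiers` list whose entries carry a `relationType`. The sample records are hypothetical placeholders, not real CDL or Zenodo plans.

```python
from collections import Counter


def count_relation_types(records):
    """Count every relationType occurrence across a list of DataCite records."""
    counts = Counter()
    for record in records:
        for rel in record.get("relatedIdentifiers", []):
            counts[rel["relationType"]] += 1
    return counts


# Hypothetical records with placeholder identifiers (not real CDL or Zenodo plans).
sample_records = [
    {"relatedIdentifiers": [
        {"relatedIdentifier": "https://example.org/plan/1",
         "relatedIdentifierType": "URL", "relationType": "IsMetadataFor"},
        {"relatedIdentifier": "https://example.org/article/1",
         "relatedIdentifierType": "URL", "relationType": "IsReferencedBy"},
    ]},
    {"relatedIdentifiers": [
        {"relatedIdentifier": "https://example.org/plan/2",
         "relatedIdentifierType": "URL", "relationType": "IsMetadataFor"},
    ]},
]

print(count_relation_types(sample_records).most_common())
# [('IsMetadataFor', 2), ('IsReferencedBy', 1)]
```

The function counts occurrences, not records. A record with several relations of the same type therefore contributes several times to the tally.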

The most obvious observation from this usage is that the organizations creating the metadata for these DMPs have made very different choices about how connections should be documented using the DataCite schema. Looking at the metadata and related resources in more detail can help us understand these choices.

Figure 1. Connections from Zenodo and California Digital Library (CDL) output management plans.

The most common relationType is IsMetadataFor. It is included once in every CDL DMP (269 times), with a URL that generally resolves to the plan in the CDL data management plan tool (DMPTool), identified by a local identifier. In words: the resource described in DataCite is metadata for the complete plan held in the CDL system. For example, the plan with the DOI 10.48321/d1s88w IsMetadataFor a PDF exported from the DMPTool with a local identifier of 58508. This plan also has a landing page that presents much of the same information in a different format, including the link to the plan in the system.
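The CDL pattern can be sketched as a single relatedIdentifier. In this hypothetical record, the DOI uses the 10.5072 test prefix, and the plan URL is a placeholder rather than a real DMPTool endpoint. The field names follow the DataCite JSON style.

```python
import json


def metadata_for_link(plan_url):
    """Return a relatedIdentifier stating that this DataCite record IsMetadataFor the full plan."""
    return {
        "relatedIdentifier": plan_url,
        "relatedIdentifierType": "URL",  # CDL types every related identifier as a URL
        "relationType": "IsMetadataFor",
    }


# Hypothetical plan record: test-prefix DOI and placeholder URL, not a real plan.
plan_record = {
    "doi": "10.5072/example-plan",
    "types": {"resourceTypeGeneral": "OutputManagementPlan"},
    "relatedIdentifiers": [
        metadata_for_link("https://dmptool.example.org/plans/58508/export.pdf"),
    ],
}
print(json.dumps(plan_record["relatedIdentifiers"], indent=2))
```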

The second most common relationType, IsPartOf, is used 225 times in Zenodo metadata to associate plans with Zenodo communities that bring together resources related to an OutputManagementPlan. For example, the Technology Assessment of Next Generation Sequencing in Personalized Oncology (TANGO) project created a community that includes sixteen resources related to the project: 7 publications, 5 presentations, 2 datasets, 1 poster, and 1 software package.
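A harvester could pick these community links out of a plan with a simple filter. In this sketch, the community slug is hypothetical. Recognizing a community by the `zenodo.org/communities/` URL pattern is an assumption rather than a documented rule.

```python
def community_urls(record):
    """Return IsPartOf URLs that look like Zenodo community pages (URL pattern is an assumption)."""
    return [
        rel["relatedIdentifier"]
        for rel in record.get("relatedIdentifiers", [])
        if rel.get("relationType") == "IsPartOf"
        and rel.get("relatedIdentifierType") == "URL"
        and "zenodo.org/communities/" in rel["relatedIdentifier"]
    ]


# Hypothetical plan: one community link and one version link (test-prefix DOI).
plan = {"relatedIdentifiers": [
    {"relatedIdentifier": "https://zenodo.org/communities/example-project",
     "relatedIdentifierType": "URL", "relationType": "IsPartOf"},
    {"relatedIdentifier": "10.5072/example.100",
     "relatedIdentifierType": "DOI", "relationType": "IsVersionOf"},
]}
print(community_urls(plan))
# ['https://zenodo.org/communities/example-project']
```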

Zenodo uses two relationTypes, HasVersion and IsVersionOf, to manage versions of data management plans. These connections enable Zenodo's DOI versioning, which makes it possible to track changes in the plans (and other Zenodo DOIs) across versions. Zenodo creates these connections automatically for all of its DOIs.
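The version links form reciprocal pairs. This sketch shows the general pattern rather than Zenodo's actual implementation. It assumes a concept DOI that stands for all versions and HasVersion each version DOI, and each version IsVersionOf the concept DOI. The DOIs are hypothetical and use the 10.5072 test prefix.

```python
def version_links(concept_doi, version_dois):
    """Build both sides of a versioned DOI family.

    The concept DOI HasVersion each version; each version IsVersionOf the concept DOI.
    """
    concept_side = [
        {"relatedIdentifier": v, "relatedIdentifierType": "DOI", "relationType": "HasVersion"}
        for v in version_dois
    ]
    version_side = {
        v: [{"relatedIdentifier": concept_doi, "relatedIdentifierType": "DOI",
             "relationType": "IsVersionOf"}]
        for v in version_dois
    }
    return concept_side, version_side


def is_reciprocal(concept_doi, concept_side, version_side):
    """Check that every HasVersion target points back to the concept DOI with IsVersionOf."""
    for link in concept_side:
        back = version_side.get(link["relatedIdentifier"], [])
        if not any(b["relationType"] == "IsVersionOf" and b["relatedIdentifier"] == concept_doi
                   for b in back):
            return False
    return True


# Hypothetical DOIs under the 10.5072 test prefix.
concept = "10.5072/example.100"
concept_side, version_side = version_links(concept, ["10.5072/example.101", "10.5072/example.102"])
print(is_reciprocal(concept, concept_side, version_side))  # True
```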

The next three relationTypes, IsReferencedBy, Documents, and IsSupplementTo, connect plans with specific outputs of various types. Interestingly, 15 of the 16 IsSupplementTo relations in the CDL repository are in a single record. The other two relationTypes, in contrast, occur across many records, reflecting a general practice. These IsSupplementTo relations mostly point to multiple versions of the same DOI, similar to the IsVersionOf relation in the Zenodo metadata.
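One way to tell a one-record practice from a general one is to compare a relationType's largest count in any single record with its total count. A ratio near 1, like CDL's 15 of 16 IsSupplementTo relations, points to a single record. The records in this sketch are hypothetical and reduced to just their `relationType` fields.

```python
def concentration(records, relation_type):
    """Return (largest count in any one record, total count) for a relationType."""
    per_record = [
        sum(1 for rel in r.get("relatedIdentifiers", []) if rel["relationType"] == relation_type)
        for r in records
    ]
    return max(per_record, default=0), sum(per_record)


# Hypothetical records, reduced to relationType only.
sample = [
    {"relatedIdentifiers": [{"relationType": "IsSupplementTo"}] * 3},
    {"relatedIdentifiers": [{"relationType": "IsSupplementTo"}, {"relationType": "Documents"}]},
    {"relatedIdentifiers": [{"relationType": "Documents"}]},
]

for rt in ("IsSupplementTo", "Documents"):
    largest, total = concentration(sample, rt)
    print(f"{rt}: {largest} of {total} in one record")
# IsSupplementTo: 3 of 4 in one record
# Documents: 1 of 2 in one record
```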

The usage of these relationTypes raises questions about their intended meanings in DataCite metadata. The definitions from the DataCite schema documentation are shown in Table 2. Of these, “documents” seems to be the most general. As such, it captures the range of relationships that might exist between the resources created by a project and the project's OutputManagementPlan. This is consistent with the definitions of documentation and metadata proposed by Habermann (2018), in which metadata is the structured and standard part of the documentation.

relationType  Relationship definition
references    used as a source of information for
documents     is documentation about
supplements   is a supplement to
cites         includes in a citation
metadata      indicates additional metadata

Table 2. Relationship definitions from DataCite metadata documentation.

IdentifierTypes

DataCite relatedIdentifiers must include the type of the related identifier, chosen from a list of possible types (ARK, arXiv, bibcode, DOI, EAN13, EISSN, Handle, IGSN, ISBN, ISSN, ISTC, LISSN, LSID, PMID, PURL, UPC, URL, URN, w3id). Table 3 shows the identifier types currently used by CDL and Zenodo and how they are used. CDL labels all identifiers as “URL”, even those that have the form of DOIs (e.g., https://doi.org/10.1016/j.exger.2021.111475). Zenodo generally uses URLs to reference communities and DOIs to reference versions of resources.

Repository (relationTypes)               IdentifierType   Count
cdl.cdl (all)                            URL                479
cern.zenodo (IsVersionOf, HasVersion, …) DOI                596
cern.zenodo (IsPartOf, …)                URL                244

Table 3. RelatedIdentifierTypes in output management plans.
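Identifiers that are DOIs but typed as URLs can be corrected mechanically. This sketch assumes the convention of recording a DOI-typed related identifier as the bare DOI, without the resolver prefix. It recognizes only the doi.org and dx.doi.org resolvers and leaves every other URL unchanged.

```python
import re

# A doi.org or dx.doi.org resolver URL wrapping a DOI of the form 10.<prefix>/<suffix>.
DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}(?:\.\d+)*/\S+)$", re.IGNORECASE)


def normalize_identifier(rel):
    """Return a copy of a relatedIdentifier, retyped as DOI when its URL is a DOI resolver link."""
    match = DOI_URL.match(rel["relatedIdentifier"])
    if rel.get("relatedIdentifierType") == "URL" and match:
        return {**rel, "relatedIdentifier": match.group(1), "relatedIdentifierType": "DOI"}
    return dict(rel)


rel = {"relatedIdentifier": "https://doi.org/10.1016/j.exger.2021.111475",
       "relatedIdentifierType": "URL", "relationType": "IsReferencedBy"}
print(normalize_identifier(rel))
# {'relatedIdentifier': '10.1016/j.exger.2021.111475', 'relatedIdentifierType': 'DOI', 'relationType': 'IsReferencedBy'}
```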

Connected Resource Types 

The types of the related resources are given by the resourceTypeGeneral attribute of the relatedIdentifier element in the DataCite metadata. These types are helpful for users because DMPs can point to so many different types of research objects. Table 4 shows the counts of these types in the Zenodo metadata. The CDL metadata does not include this attribute.

Related Item Type       Count
Software                   47
OutputManagementPlan       38
Dataset                    34
Text                       24
JournalArticle              9
Image                       4
Preprint                    3
Report                      2

Table 4. Types of resources connected to OutputManagementPlans in Zenodo metadata.
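Counts like those in Table 4 come from tallying the resourceTypeGeneral of each related identifier. In this sketch, a `(missing)` bucket makes records without the attribute visible instead of silently skipping them. All of CDL's relatedIdentifiers would fall into that bucket. The bucket name is the sketch's own label, and the sample records are hypothetical.

```python
from collections import Counter


def related_type_counts(records):
    """Count resourceTypeGeneral values on relatedIdentifiers, using "(missing)" when absent."""
    return Counter(
        rel.get("resourceTypeGeneral", "(missing)")
        for record in records
        for rel in record.get("relatedIdentifiers", [])
    )


# Hypothetical records (test-prefix DOIs and placeholder URL).
records = [
    {"relatedIdentifiers": [
        {"relatedIdentifier": "10.5072/example.1", "relatedIdentifierType": "DOI",
         "relationType": "IsReferencedBy", "resourceTypeGeneral": "Software"},
        {"relatedIdentifier": "10.5072/example.2", "relatedIdentifierType": "DOI",
         "relationType": "Documents", "resourceTypeGeneral": "Dataset"},
    ]},
    {"relatedIdentifiers": [
        {"relatedIdentifier": "https://example.org/output", "relatedIdentifierType": "URL",
         "relationType": "IsReferencedBy"},
    ]},
]
print(related_type_counts(records))
# Counter({'Software': 1, 'Dataset': 1, '(missing)': 1})
```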

People and Organizations

Another important role of the OutputManagementPlan is to connect people and organizations, including funders, to projects. DataCite metadata includes contributorType and fundingReferences to identify contributions that people and organizations make to a project. Table 5 shows that CDL credits a broader range of contributorTypes than Zenodo, which includes only funders.

Contributor Type   cdl.cdl   cern.zenodo   Total
ProjectMember          140             -     140
ProjectManager         225             -     225
Producer               269             -     269
DataCurator            288             -     288
Resource Funder        285           108     393
ProjectLeader          647             -     647
Total                 1854           108    1962

Table 5. Contributor types included in the OutputManagementPlan metadata.
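Counts like those in the contributor table can be gathered record by record. In DataCite JSON, people and organizations appear in a `contributors` list, each with a `contributorType`. Funders appear in a `fundingReferences` list. The sketch below tallies both for one hypothetical record. The `fundingReference` and `(missing)` bucket names are its own labels, not DataCite values.

```python
from collections import Counter


def contributor_roles(record):
    """Count contributorType values in one record, plus its number of fundingReferences."""
    roles = Counter(c.get("contributorType", "(missing)") for c in record.get("contributors", []))
    funding = record.get("fundingReferences", [])
    if funding:
        roles["fundingReference"] += len(funding)
    return roles


# Hypothetical plan metadata; all names are placeholders.
record = {
    "contributors": [
        {"name": "Example, Pat", "nameType": "Personal", "contributorType": "ProjectLeader"},
        {"name": "Example, Sam", "nameType": "Personal", "contributorType": "DataCurator"},
    ],
    "fundingReferences": [{"funderName": "Example Funding Agency"}],
}
print(contributor_roles(record))
# Counter({'ProjectLeader': 1, 'DataCurator': 1, 'fundingReference': 1})
```

Summing these per-record counters across all plans, for example with `sum(counters, Counter())`, gives repository-level totals.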

A Way Forward?

CDL and Zenodo have made different choices in developing their output management plan metadata (Figure 1). This diversity reflects the fact that the DataCite metadata schema was not created with data management plans in mind. Many of its element definitions also leave significant wiggle room. Comparing these approaches helps us identify features that others can take advantage of as we move forward. As mentioned earlier, DOIs for Output Management Plans are a new capability, so now is the time to develop consensus on how they are used and connected to other resources and people. In practice, many of these DMPs were created by tools such as the DMPTool and Argos, which facilitate the creation of consistent DMPs in both systems. Some initial recommendations follow.

Recommendation 1: Use Documents and IsDocumentedBy for connections between outputs and OutputManagementPlans.

Connecting outputs to a central hub is perhaps the most important role of output management plans. The choices for making these connections depend on the definitions listed in Table 2, which are clearly open to interpretation. Documentation may be described as all the material required to make scientific results understandable and reproducible. Metadata, in turn, is the structured and standard subset of that material. In this sense, the output management plan is part of the documentation that provides context for the outputs. Documents and IsDocumentedBy are therefore the appropriate relationTypes for these connections.
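Recommendation 1 implies a pair of links, one in each record: the plan Documents the output, and the output IsDocumentedBy the plan. This sketch assumes both resources have DOIs and that both records can be updated. The DOIs are hypothetical and use the 10.5072 test prefix. The helper name is illustrative.

```python
def documentation_links(plan_doi, output_doi, output_type):
    """Return the plan-side and output-side relatedIdentifiers for one plan/output pair."""
    plan_side = {
        "relatedIdentifier": output_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "Documents",          # the plan documents the output
        "resourceTypeGeneral": output_type,
    }
    output_side = {
        "relatedIdentifier": plan_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsDocumentedBy",     # the output is documented by the plan
        "resourceTypeGeneral": "OutputManagementPlan",
    }
    return plan_side, output_side


# Hypothetical DOIs under the 10.5072 test prefix.
plan_side, output_side = documentation_links("10.5072/example-plan", "10.5072/example-dataset", "Dataset")
print(plan_side["relationType"], "->", plan_side["relatedIdentifier"])
print(output_side["relationType"], "->", output_side["relatedIdentifier"])
# Documents -> 10.5072/example-dataset
# IsDocumentedBy -> 10.5072/example-plan
```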

Recommendation 2: Include resourceTypeGeneral for all related resources.

DMPs (and other research objects) can connect to many kinds of research items, and knowing the types of the connected resources is very helpful for users. Every relatedIdentifier in a DMP should therefore include the resourceTypeGeneral attribute to identify the type of the connected resource.
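A metadata-creation tool could enforce this recommendation at the point where links are built. In this sketch, `KNOWN_TYPES` is only the illustrative subset of resourceTypeGeneral values seen in the Zenodo counts above. A real validator would load the full controlled list for the schema version in use. The helper and its identifiers are hypothetical.

```python
# Illustrative subset of resourceTypeGeneral values, not the full DataCite vocabulary.
KNOWN_TYPES = {"Software", "OutputManagementPlan", "Dataset", "Text",
               "JournalArticle", "Image", "Preprint", "Report"}


def make_related(identifier, identifier_type, relation_type, resource_type_general):
    """Build a relatedIdentifier, rejecting a missing or unrecognized resourceTypeGeneral."""
    if resource_type_general not in KNOWN_TYPES:
        raise ValueError(f"unknown or missing resourceTypeGeneral: {resource_type_general!r}")
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,
        "relationType": relation_type,
        "resourceTypeGeneral": resource_type_general,
    }


# Hypothetical identifiers.
ok = make_related("10.5072/example-code", "DOI", "Documents", "Software")
try:
    make_related("https://example.org/thing", "URL", "Documents", None)
except ValueError as err:
    print("rejected:", err)
# rejected: unknown or missing resourceTypeGeneral: None
```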

Recommendation 3: Include ContributorTypes for all contributors (individuals and organizations).

People and organizations contribute to projects in multiple roles, and OutputManagementPlans provide an opportunity to recognize those contributions. In the DataCite metadata schema, contributorType should be used in all cases to identify these roles.
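A simple quality check flags contributors that lack a role. This sketch uses a hypothetical record in DataCite JSON style. It returns the names of contributors without a `contributorType`, which an editor could then review.

```python
def contributors_missing_type(record):
    """Return the names of contributors that have no contributorType."""
    return [
        c.get("name", "(unnamed)")
        for c in record.get("contributors", [])
        if not c.get("contributorType")
    ]


# Hypothetical record; names are placeholders.
record = {"contributors": [
    {"name": "Example, Pat", "nameType": "Personal", "contributorType": "ProjectLeader"},
    {"name": "Example Field Station", "nameType": "Organizational"},
]}
print(contributors_missing_type(record))
# ['Example Field Station']
```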

Recommendation 4: Acknowledge in-kind contributions with the “sponsor” contributor type.

Many individuals and organizations contribute to the success of a project in non-financial ways. The contributor type “sponsor” describes contributors that “provide in-kind support, through donation, provision of people or a facility or instrumentation necessary for the development of the resource”. Use the “sponsor” contributorType for these in-kind contributions to distinguish them from financial support (funders). As an example, “sponsor” could be used to acknowledge a research facility that provides unique instrumentation. It can also be used for recognizing field stations that support research in some regions. Other in-kind contributions should also be acknowledged this way.
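In DataCite JSON, an in-kind contribution like this can be expressed as a contributor with the Sponsor contributorType. Financial support stays in `fundingReferences`. The organization names in this sketch are hypothetical.

```python
import json


def sponsor(name):
    """Return a contributor acknowledging in-kind support from an organization."""
    return {"name": name, "nameType": "Organizational", "contributorType": "Sponsor"}


# Hypothetical plan metadata: in-kind support as a Sponsor, money as a fundingReference.
record = {
    "contributors": [sponsor("Example Research Facility")],
    "fundingReferences": [{"funderName": "Example Funding Agency"}],
}
print(json.dumps(record, indent=2))
```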

Recommendation 5: Use IsPartOf to connect plans to communities.

Using Zenodo communities to group resources related to a research project can help projects that use Zenodo as the repository for all of their related research objects. Almost 60% of the DMPs in Zenodo take advantage of this capability.
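The “almost 60%” figure can be checked against Table 1. This sketch assumes each IsPartOf occurrence belongs to a different plan. If any plan links to more than one community, the true share of plans is lower, so the result is an upper bound.

```python
zenodo_plans = 385   # Zenodo output management plans (as of November 15, 2021)
is_part_of = 225     # IsPartOf occurrences in Zenodo plans (Table 1)

# Upper bound: assumes one IsPartOf per plan.
share = is_part_of / zenodo_plans
print(f"{share:.1%}")  # 58.4%
```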

Conclusion

We are in the early days of documenting output management plans using DataCite metadata, so this is an important time to identify and adopt common practices. Such practices help realize the exciting goals of machine-actionable DMPs: “to facilitate a research data management (RDM) system that allows data and information about research to be communicated and shared across stakeholders, linking metadata, repositories, and institutions, and allowing for notifications and verification, real-time reporting, and automated compliance”. Common practices will facilitate the creation of interoperable DMPs. They will also support the development of tools across the research community that help us all reap the ongoing benefits of these plans throughout the research life cycle.


Ted Habermann


I have worked in scientific data management for many years and enjoy working with organizations and communities that share data and knowledge. I am fluent in metadata standards and dialects used in scientific data management and publishing.

Tell us what you think!

We are constantly working to help you change your metadata game. If you have any questions, suggestions, or crazy ideas, please contact us or connect with us through the details below.

Ted Habermann
ted@metadatagamechangers.com
ORCID | LinkedIn | Twitter

Erin Robinson
erin@metadatagamechangers.com
ORCID | LinkedIn | Twitter
