Connecting DMPs and Research Outputs in DataCite Metadata
Cite this blog as Habermann, T. (2021). Connecting DMPs and Research Outputs in DataCite Metadata. Front Matter. https://doi.org/10.59350/ps9jc-vj870
One of the exciting new capabilities in Version 4.4 of the DataCite Metadata Schema is the ability to create unique identifiers (DOIs) for Output Management Plans, i.e., resourceTypeGeneral = OutputManagementPlan. This new resource type recognizes that research outputs can include many kinds of resources beyond just data (journal articles, software, computational notebooks, presentations, etc.) and that research plans can document all of these types. Thus, OutputManagementPlan instead of DataManagementPlan. Of course, many of us have the acronym DMP embedded in our vocabularies, so keep in mind that the acronym now has a broader meaning!
The introduction of this new capability immediately raises questions about how it can be used. One part of the answer is clearly: as an identifier to connect all research outputs from a project. This leads quickly to: what kinds of connections? and what kinds of outputs? We are early in this process, so it is a good time to discuss these questions and come to consensus on common practices for connections in Output Management Plans in DataCite. These practices build on the Common Standard for Machine Actionable Data Management Plans being developed in the Research Data Alliance which focuses on other types of OMP content.
Two repositories have taken advantage of this capability since Version 4.4 of the schema was introduced in March 2021: Zenodo (cern.zenodo) has 385 plans and the California Digital Library (cdl.cdl) has 269 (as of November 15, 2021). Together these plans have over 1300 relatedIdentifiers.
RelationTypes
Table 1 shows the number of occurrences of relationTypes used in DataCite metadata by CDL and Zenodo in these plans.
relationType | CDL (269) | Zenodo (385) | Total (654) |
IsMetadataFor | 269 | | 269 |
IsPartOf | | 225 | 225 |
HasVersion | | 219 | 219 |
IsVersionOf | | 218 | 218 |
IsReferencedBy | 194 | 4 | 198 |
Documents | | 85 | 85 |
IsSupplementTo | 16 | 27 | 43 |
References | | 12 | 12 |
IsSupplementedBy | | 8 | 8 |
IsNewVersionOf | | 8 | 8 |
CitedBy | | 7 | 7 |
IsDerivedFrom | | 7 | 7 |
IsIdenticalTo | | 5 | 5 |
IsDocumentedBy | | 4 | 4 |
Reviews | | 2 | 2 |
Continues | | 2 | 2 |
Total | 479 | 834 | 1313 |
Table 1. Number of occurrences of each relationType in CDL and Zenodo output management plan metadata.
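A tally like the one in Table 1 can be produced with a few lines of Python. The following is a minimal sketch, assuming each record is a dictionary with a relatedIdentifiers list whose entries carry a relationType key, as in DataCite JSON. The function name and the sample records are hypothetical and are not taken from CDL or Zenodo.

```python
from collections import Counter


def count_relation_types(records):
    """Tally relationType values across all relatedIdentifiers in the records."""
    counts = Counter()
    for record in records:
        # A record without relatedIdentifiers contributes nothing to the tally.
        for related in record.get("relatedIdentifiers") or []:
            relation_type = related.get("relationType")
            if relation_type:
                counts[relation_type] += 1
    return counts


# Hypothetical sample records, for illustration only.
sample_records = [
    {"relatedIdentifiers": [
        {"relationType": "IsMetadataFor", "relatedIdentifierType": "URL"},
        {"relationType": "IsReferencedBy", "relatedIdentifierType": "URL"},
    ]},
    {"relatedIdentifiers": [
        {"relationType": "IsMetadataFor", "relatedIdentifierType": "URL"},
    ]},
    {},  # a plan without any connections
]

for relation_type, count in count_relation_types(sample_records).most_common():
    print(relation_type, count)
```

Running the same function separately over the records of each repository would give the per-repository columns.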
The most obvious observation from this usage is that organizations creating the metadata for these DMPs have made very different choices about how connections should be documented using the DataCite schema. Looking at the metadata and related resources in more detail can help understand these choices.
The most common relationType is IsMetadataFor, which appears once in every CDL DMP (269 times) with a URL that generally resolves to the plan, by local identifier, in the CDL data management plan tool. In words: the resource being described in DataCite is metadata for the complete plan in the CDL system. For example, the plan with the DOI 10.48321.d1s88w IsMetadataFor a PDF exported from the DMPTool with a local identifier of 58508. This plan also has a landing page with much of the same information, including the link to the plan in the system, in a different format.
The second most common relationType, IsPartOf, is used 225 times in Zenodo metadata to associate plans with Zenodo communities that bring together resources related to the OutputManagementPlan. For example, the Technology Assessment of Next Generation Sequencing in Personalized Oncology (TANGO) Project created a community that includes sixteen resources related to the project: 7 publications, 5 presentations, 2 datasets, 1 poster and 1 software.
Two relationTypes are used by Zenodo to manage versions of data management plans, HasVersion and IsVersionOf. These connections enable Zenodo’s DOI Versioning which makes it possible to track changes in the plans (and other Zenodo DOIs) across versions. These connections are made automatically for all Zenodo DOIs.
The next three relationTypes (IsReferencedBy, Documents, and IsSupplementTo) are used to connect plans with specific outputs of various types. It is interesting to note that 15 of the 16 IsSupplementTo relations in the CDL repository are in a single record, whereas the other two relationTypes occur across many records, reflecting a general practice. These IsSupplementTo relations mostly connect multiple versions of the same DOI, similar to the IsVersionOf relation in the Zenodo metadata.
The usage of these relationTypes raises questions about their intended meanings in DataCite metadata. The definitions from the DataCite schema documentation are shown in Table 2. Of these, “documents” seems to be the most general and, as such, seems to capture the range of relationships that might exist between resources created by a project and the project OutputManagementPlan. This is consistent with the definitions of documentation and metadata proposed by Habermann (2018), with metadata being the structured and standard part of the documentation.
relationType | Relationship definition |
references | used as a source of information for |
documents | is documentation about |
supplements | is a supplement to |
cites | includes in a citation |
metadata | indicates additional metadata |
Table 2. Relationship definitions from DataCite metadata documentation.
IdentifierTypes
DataCite relatedIdentifiers must include the type of the related identifier, chosen from a list of possible types (ARK, arXiv, bibcode, DOI, EAN13, EISSN, Handle, IGSN, ISBN, ISSN, ISTC, LISSN, LSID, PMID, PURL, UPC, URL, URN, w3id). Table 3 shows the identifier types currently used in CDL and Zenodo and how they are used. CDL defines all types as “URL” even if they have the form of DOIs (e.g. https://doi.org/10.1016/j.exger.2021.111475). Zenodo generally uses URLs for referencing communities and DOIs for referencing versions of resources.
Repository | IdentifierType | Count |
cdl.cdl (all) | URL | 479 |
cern.zenodo (isVersionOf, hasVersion, …) | DOI | 596 |
cern.zenodo (isPartOf, …) | URL | 244 |
Table 3. RelatedIdentifierTypes in output management plans.
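Identifiers that are recorded as URLs but have the form of DOIs can be recognized mechanically. The following is a minimal sketch, assuming that a DOI looks like "10.<registrant>/<suffix>" and that DOI URLs use the doi.org or dx.doi.org resolvers. The function name and the pattern are illustrative assumptions, not a complete DOI validator, and anything that is not DOI-shaped is simply left as a URL.

```python
import re

# Assumed DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
DOI_RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def classify_identifier(identifier):
    """Return (relatedIdentifierType, value), treating DOI-shaped URLs as DOIs."""
    value = identifier.strip()
    lowered = value.lower()
    for prefix in DOI_RESOLVER_PREFIXES:
        if lowered.startswith(prefix):
            candidate = value[len(prefix):]
            if DOI_PATTERN.match(candidate):
                return ("DOI", candidate)
    if DOI_PATTERN.match(value):
        return ("DOI", value)
    # Everything else is left as a URL in this sketch.
    return ("URL", value)


print(classify_identifier("https://doi.org/10.1016/j.exger.2021.111475"))
# → ('DOI', '10.1016/j.exger.2021.111475')
```

Applied to the CDL example above, this would record the identifier with relatedIdentifierType DOI instead of URL.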
Connected Resource Types
The types of the related resources are given using the relatedIdentifier.resourceTypeGeneral element of the DataCite metadata. These types are helpful for users because DMPs can point to so many different types of research objects. Table 4 shows the counts of these types in the Zenodo metadata. The CDL metadata does not include this element.
Related Item Type | Count | Related Item Type | Count |
Software | 47 | JournalArticle | 9 |
OutputManagementPlan | 38 | Image | 4 |
Dataset | 34 | Preprint | 3 |
Text | 24 | Report | 2 |
Table 4. Types of resources connected to OutputManagementPlans in Zenodo metadata.
People and Organizations
Another important role of the OutputManagementPlan is to connect people and organizations, including funders, to projects. DataCite metadata includes contributorType and fundingReference elements to identify contributions to a project made by people and organizations. Table 5 shows that CDL credits a broader range of contributorTypes than Zenodo, which includes only funders.
Contributor Type | cdl.cdl | cern.zenodo | Total |
ProjectMember | 140 | | 140 |
ProjectManager | 225 | | 225 |
Producer | 269 | | 269 |
DataCurator | 288 | | 288 |
Resource Funder | 285 | 108 | 393 |
ProjectLeader | 647 | | 647 |
Total | 1854 | 108 | 1962 |
Table 5. Contributor types included in the OutputManagementPlan metadata.
A Way Forward?
CDL and Zenodo have made different choices in developing their output management plan metadata (Figure 1). This diversity reflects the fact that the DataCite metadata schema was not created with data management plans in mind and that many element definitions include significant wiggle room. Comparing these approaches helps us identify features that others can take advantage of as we move forward. As mentioned earlier, DOIs for Output Management Plans are a new capability, so now is the time to develop consensus on how they are used and connected to other resources and people. In reality, many of these DMPs were created by tools like the DMPTool and Argos, which facilitate the creation of consistent DMPs in both systems. Some initial recommendations are made here.
Recommendation 1: Use Documents and IsDocumentedBy for connections between outputs and OutputManagementPlans.
Connecting outputs to a central hub is perhaps the most important role of output management plans. Choices for making these connections are related to the definitions listed in Table 2, which are clearly open to interpretation. Documentation may be described as all material required to make scientific results understandable and reproducible, while metadata are the structured and standard subset of that material. In this sense, the output management plan is part of the documentation that provides context for the outputs, so Documents and IsDocumentedBy are the appropriate relationTypes for these connections.
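As an illustration of this recommendation, the reciprocal pair of connections could be built as follows. This is a sketch, assuming the DataCite JSON key names relatedIdentifier, relatedIdentifierType, relationType, and resourceTypeGeneral; the function name and the DOIs are hypothetical.

```python
def link_plan_and_output(plan_doi, output_doi, output_type):
    """Build the reciprocal relatedIdentifier entries for a plan and one output.

    Returns (plan_entry, output_entry): the first belongs in the plan's
    metadata, the second in the output's metadata.
    """
    plan_entry = {
        "relatedIdentifier": output_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "Documents",  # plan Documents output
        "resourceTypeGeneral": output_type,
    }
    output_entry = {
        "relatedIdentifier": plan_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsDocumentedBy",  # output IsDocumentedBy plan
        "resourceTypeGeneral": "OutputManagementPlan",
    }
    return plan_entry, output_entry


# Hypothetical DOIs, for illustration only.
plan_entry, output_entry = link_plan_and_output(
    "10.1234/example-plan", "10.1234/example-dataset", "Dataset"
)
print(plan_entry["relationType"], "->", plan_entry["relatedIdentifier"])
print(output_entry["relationType"], "->", output_entry["relatedIdentifier"])
```

Recording both directions means the connection can be discovered starting from either the plan or the output.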
Recommendation 2: Include resourceTypeGeneral for all related resources.
DMPs (and other research objects) can connect to many kinds of research items, and knowing the types of the connected resources is very helpful for users. Every relatedIdentifier in a DMP should include the resourceTypeGeneral element, which gives the type of the connected resource.
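A simple check can flag connections that lack a resource type. The following is a minimal sketch, assuming a record is a dictionary laid out like DataCite JSON; the function name and the sample record are hypothetical.

```python
def missing_resource_types(record):
    """Return the relatedIdentifiers in a record that lack resourceTypeGeneral."""
    missing = []
    for related in record.get("relatedIdentifiers") or []:
        if not related.get("resourceTypeGeneral"):
            missing.append(related.get("relatedIdentifier"))
    return missing


# Hypothetical plan record, for illustration only.
plan = {
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/example-software",
            "relatedIdentifierType": "DOI",
            "relationType": "Documents",
            "resourceTypeGeneral": "Software",
        },
        {
            "relatedIdentifier": "https://example.org/outputs/42",
            "relatedIdentifierType": "URL",
            "relationType": "Documents",
        },
    ]
}

print(missing_resource_types(plan))
# → ['https://example.org/outputs/42']
```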
Recommendation 3: Include ContributorTypes for all contributors (individuals and organizations).
People and organizations contribute to projects in multiple roles and OutputManagementPlans provide an opportunity for recognizing those contributions. In the DataCite metadata schema the contributorType should be used in all cases to identify these roles.
Recommendation 4: Acknowledge in-kind contributions with the “sponsor” contributor type.
Many individuals and organizations contribute to the success of a project in non-financial ways. The contributor type “sponsor” describes contributors that “provide in-kind support, through donation, provision of people or a facility or instrumentation necessary for the development of the resource”. Use the “sponsor” contributorType for these in-kind contributions to distinguish them from financial support (funders). As an example, “sponsor” could be used to acknowledge a research facility that provides unique instrumentation. It can also be used for recognizing field stations that support research in some regions. Other in-kind contributions should also be acknowledged this way.
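The distinction between in-kind and financial support might be recorded as in the following sketch. It assumes the DataCite JSON key names contributors, contributorType, nameType, fundingReferences, and funderName, and it assumes the supporter is an organization. The function name and the organization names are hypothetical.

```python
def acknowledge_support(record, name, in_kind):
    """Add a supporter to a DataCite-style metadata dictionary.

    In-kind support becomes a contributor with contributorType "Sponsor";
    financial support becomes a fundingReferences entry.
    """
    if in_kind:
        record.setdefault("contributors", []).append({
            "name": name,
            "nameType": "Organizational",  # assumption: supporter is an organization
            "contributorType": "Sponsor",
        })
    else:
        record.setdefault("fundingReferences", []).append({"funderName": name})
    return record


# Hypothetical supporters, for illustration only.
plan = {}
acknowledge_support(plan, "Example Field Station", in_kind=True)
acknowledge_support(plan, "Example Science Foundation", in_kind=False)
print(plan)
```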
Recommendation 5: Use IsPartOf to connect plans to communities.
Using Zenodo communities to group resources related to research projects is an application of this capability that can be helpful for projects that are using Zenodo as a repository for all related research objects. Almost 60% of the DMPs in Zenodo are taking advantage of this capability.
Conclusion
We are in the early days of documenting output management plans using DataCite metadata so it is an important time to identify and adopt common practices that help realize the exciting goals of machine-actionable DMPs: “to facilitate a research data management (RDM) system that allows data and information about research to be communicated and shared across stakeholders, linking metadata, repositories, and institutions, and allowing for notifications and verification, real-time reporting, and automated compliance”. Common practices will facilitate the creation of interoperable DMPs and the development of tools across the research community that help us all reap the on-going benefits of these plans throughout the research life-cycle.