Metadata Game Changers

Blog

Exploring metadata, communities, and new ideas.

May 07, 2023

Project Metadata in DataCite

May 07, 2023/ Ted Habermann

Cite this blog as Habermann, T. (2023). Project Metadata in DataCite. Front Matter. https://doi.org/10.59350/zwzvv-1n627

Introduction

The DataCite Metadata Schema was developed to provide identification and citation metadata for a wide variety of resource types, and it has been used successfully for many of them for more than a decade. In fact, the resourceTypeGeneral vocabulary, which supplies the values for the required resourceTypeGeneral element, grew significantly in the current version (4.4) of the schema, which added thirteen new types that give more detail to the Text type.

The DataCite schema also allows users to define types that they need but that are not in the current list, using the free-text resourceType element, typically with resourceTypeGeneral = “Other”. This approach identifies new types in a way that supports discovery and easy migration if the new type becomes part of the shared vocabulary on the road to broader adoption.

As an example, many people have identified the need for project metadata over many years and, in fact, projects have been described in many metadata dialects (e.g. Habermann et al., 2018). Given the diversity of types described by DataCite metadata, it is interesting to explore whether users are using DataCite metadata for projects and, if so, how they are using it.

Finding DataCite metadata that includes “Project” in the resourceType is straightforward using the DataCite API query:

https://api.datacite.org/dois?query=types.resourceType:*Project*&page[size]=1

The facets for this query provide an overview of repositories that are using this type:

Client             Name                                    Count
cos.osf            Open Science Framework                 73,828
cern.zenodo        Zenodo                                 28,687
tdl.tacc           Texas Advanced Computing Center           621
gdcc.odum-library  UNC Libraries                             212
umich.library      University of Michigan Library            193
fatj.ngeahg        WL - Publications                         167
cdl.cdl            California Digital Library                 93
unlv.ds            DigitalScholarship@UNLV                    68
tib.hawk           HAWK Hildesheim - Hornemann Institut       58
tib.eurescom       Eurescom GmbH                              48
Total                                                    103,975

Table 1. These ten repositories have 99.75% of DataCite records with "Project" in the resourceType element.
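The query and the client facets summarized in Table 1 can be sketched in Python. This is a minimal sketch, not the code used for this post: the function names are hypothetical, and it assumes the API response carries client facets under meta.clients as entries with id, title, and count keys. The sample response is hand-written from two rows of Table 1 so that the example runs without network access.

```python
from urllib.parse import urlencode

API_BASE = "https://api.datacite.org/dois"


def build_query_url(term="Project", page_size=1):
    """Build the wildcard resourceType query shown above (hypothetical helper)."""
    params = {
        "query": f"types.resourceType:*{term}*",
        "page[size]": page_size,
    }
    # urlencode percent-encodes the colon, asterisks, and brackets.
    return f"{API_BASE}?{urlencode(params)}"


def client_facets(response):
    """Return (client id, name, count) tuples, largest count first.

    Assumption: facets are under response["meta"]["clients"], each a dict
    with "id", "title", and "count" keys.
    """
    clients = response.get("meta", {}).get("clients", [])
    rows = [(c["id"], c["title"], c["count"]) for c in clients]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hand-written stand-in for an API response, using two rows of Table 1.
sample_response = {
    "meta": {
        "clients": [
            {"id": "cern.zenodo", "title": "Zenodo", "count": 28687},
            {"id": "cos.osf", "title": "Open Science Framework", "count": 73828},
        ]
    }
}

url = build_query_url()
facets = client_facets(sample_response)
for client_id, name, count in facets:
    print(f"{client_id:15} {name:25} {count:>7,}")
```

To run this against the live API, the URL could be fetched with urllib.request.urlopen and the body decoded with json.load before passing it to client_facets.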

These numbers show that the Open Science Framework and Zenodo use the project type for many records. To understand more about usage of this resource type, random samples of 5,000 records were retrieved from these two repositories, along with all records from the smaller users.

Metadata Content

ResourceType

The DataCite resourceType is an optional free-text field that provides more information about the type of a resource than the mandatory, controlled resourceTypeGeneral element. The dataset examined here includes records that have ‘Project’ in their free-text resourceType. Each repository uses this free-text field differently, some with simply Project and some with more detail about the type of project or the resource type:

Client             Resource Type
cos.osf            Project
cern.zenodo        Project Deliverable, Project milestone
tdl.tacc           Project/Other, Project/Report, Project/Other/REU, Project/Other/Dataset,
                   Project/Experimental, Project/Other/Check Sheet, Project/Other/Database,
                   Project/Other/Poster, Project/Other/Report, Project/Other/None,
                   Project/Other/Code, Project/Other/Other, Project/Simulation
gdcc.odum-library  Capstone Project, Project
umich.library      Project, Master's Project
fatj.ngeahg        Projectrapport, Project report

Table 2. Some repositories use simply "Project", others provide more detail.

Funder Information

Given that research projects are often associated with particular funders, and often with particular awards, one might expect project metadata records to include funder metadata. The data in Table 3 show that this is not the case: funder metadata occur in only 33% of the TACC records, 13% of the Zenodo records, and very few records in other repositories. The TACC projects have many different funders, while the Zenodo funders are overwhelmingly the European Commission (3656). It is interesting to note that these metadata also include funder identifiers (Crossref Funder IDs) and award numbers, i.e. unique funder identifiers for specific awards.

Repository         Project Funder  Funder Identifier  Award Number
cern.zenodo        3680            3680               3680
cos.osf663
gdcc.odum-library11
tdl.tacc202199

Table 3. Funder metadata in project records.
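Counts like those in Table 3 could be produced with a tally along the following lines. This is a sketch under stated assumptions rather than the method used here: it assumes each record is a DataCite JSON:API object whose attributes hold a fundingReferences list (with funderName, funderIdentifier, and awardNumber fields) and whose client id sits under relationships.client.data.id, and it counts records that have at least one value for each field. The function names, repository ids, and sample records are made up for illustration.

```python
from collections import Counter

FUNDER_FIELDS = ("funderName", "funderIdentifier", "awardNumber")


def funder_summary(records):
    """Count, per client, the records that carry each funder field.

    Assumptions: the client id is at relationships.client.data.id and
    funder metadata is at attributes.fundingReferences.
    """
    summary = {}
    for record in records:
        client = record["relationships"]["client"]["data"]["id"]
        refs = record["attributes"].get("fundingReferences") or []
        row = summary.setdefault(client, Counter())
        row["records"] += 1
        for field in FUNDER_FIELDS:
            # A record counts once per field, however many references it has.
            if any(ref.get(field) for ref in refs):
                row[field] += 1
    return summary


def make_record(client, funding_references):
    """Build a minimal illustrative record (hypothetical helper)."""
    attributes = {}
    if funding_references is not None:
        attributes["fundingReferences"] = funding_references
    return {
        "attributes": attributes,
        "relationships": {"client": {"data": {"id": client}}},
    }


# Made-up records, not data from this study.
sample_records = [
    make_record("example.repo-a", [{
        "funderName": "Example Funder",
        "funderIdentifier": "https://doi.org/10.13039/000000000000",
        "awardNumber": "A-1",
    }]),
    make_record("example.repo-a", [{"funderName": "Example Funder"}]),
    make_record("example.repo-a", []),
    make_record("example.repo-b", None),
]

summary = funder_summary(sample_records)
for client, row in summary.items():
    print(client, [row[field] for field in FUNDER_FIELDS], "of", row["records"])
```

Dividing each field count by the record count gives percentages comparable to those quoted in the text.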

RelationType

One of the most important functions of DataCite metadata is to identify resources that are related to the resource being described by the metadata. The relationType, required for all related identifiers, describes the relationship between the resources. It must be selected from a shared vocabulary.

Only two of the repositories with project metadata have related identifiers (Table 4). The three most common relationTypes in Zenodo are essentially Zenodo-specific relation types related to communities (IsPartOf) and DOI versioning (IsVersionOf and HasVersion).

cern.zenodo
relationType                  Count   relationType         Count
IsPartOf (community)          3374    IsCompiledBy         13
IsVersionOf (DOI Versioning)  2567    Continues            10
HasVersion (DOI Versioning)   2530    IsPublishedIn        7
Cites                         213     IsContinuedBy        5
References                    117     IsDocumentedBy       5
IsIdenticalTo                 65      IsSourceOf           4
IsSupplementTo                57      IsPreviousVersionOf  3
IsSupplementedBy              42      IsDescribedBy        1
HasPart                       33      Obsoletes            1
IsDerivedFrom                 19      Requires             1
IsNewVersionOf                16      Documents            1
IsReferencedBy                15      IsRequiredBy         1
CitedBy                       13

tdl.tacc
relationType                  Count   relationType         Count
IsPartOf                      148     IsNewVersionOf       6
IsSupplementTo                28      HasPart              1
IsDocumentedBy                21      CitedBy              1
References                    14

Table 4. RelationTypes in Zenodo and TACC repository project metadata. No other repositories included relationTypes.
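A tally like Table 4 can be sketched as a simple count over the related identifiers of each record. The sketch below assumes that each record's attributes hold a relatedIdentifiers list whose entries carry a relationType field. The function name and the sample records are illustrative only, and the sketch does not distinguish the community and DOI versioning uses that Table 4 annotates.

```python
from collections import Counter


def relation_type_counts(records, field="relationType"):
    """Count the values of one field across all related identifiers.

    Assumption: related identifiers are at attributes.relatedIdentifiers.
    Entries without the requested field are skipped.
    """
    counts = Counter()
    for record in records:
        related = record.get("attributes", {}).get("relatedIdentifiers") or []
        counts.update(item[field] for item in related if item.get(field))
    return counts


# Made-up records, not data from this study.
sample_records = [
    {"attributes": {"relatedIdentifiers": [
        {"relatedIdentifier": "10.1234/example-1", "relationType": "IsPartOf"},
        {"relatedIdentifier": "10.1234/example-2", "relationType": "Cites"},
    ]}},
    {"attributes": {"relatedIdentifiers": [
        {"relatedIdentifier": "10.1234/example-3", "relationType": "IsPartOf"},
    ]}},
    {"attributes": {}},  # a record with no related identifiers
]

relation_counts = relation_type_counts(sample_records)
for relation_type, count in relation_counts.most_common():
    print(f"{relation_type:20} {count}")
```

Grouping the records by repository before counting would give one tally per client, as in the table.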

RelatedIdentifier/ResourceTypeGeneral

Knowing the relations between project resources is important, but the types of the related objects are equally important to users looking for specific types of related objects. These types are indicated by the resourceTypeGeneral element of the relatedIdentifier. The only repository that includes this information is Zenodo and the types are shown below. Note that these occur in ~70% of the project records.

cern.zenodo
relatedItemType   Count   relatedItemType       Count
Text              287     Report                5
JournalArticle    58      Other                 5
Software          37      Book                  4
Dataset           24      OutputManagementPlan  3
PhysicalObject    8       Collection            2
Audiovisual       6       ConferencePaper       1

Table 5. Related Item Types in Zenodo project metadata.

Projects as Hubs

Projects are composed of the resources used for planning, executing, and reporting on the work done during the project. As such, project metadata serves as a hub for connecting these resources. The observations described above indicate that this capability is currently underused. Only two of the largest project repositories in DataCite include related resources and, even in those cases, these connections are only available for a small portion of the projects.

There are a few projects that stand out as connection hubs, all from Zenodo. Table 6 shows the projects with more than twenty connections, and Figure 1 shows the connected types and relations for the project 10.5281/zenodo.6804281.

DOI  Book  Collection  Conference Paper  Dataset  Journal Article  Physical Object  Report  Software  Text
10.5281/zenodo.6451260229811234
10.5281/zenodo.7035016241312
10.5281/zenodo.5824544102114
10.5281/zenodo.68042811212177

Table 6. Projects with more than ten connections.
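One way to find such hubs is to count the related identifiers of each project by type and keep the projects above a threshold. The following is a sketch under assumptions, not the procedure used for Table 6: it assumes the DOI is at attributes.doi, that related identifiers carry an optional resourceTypeGeneral, and that untyped related identifiers also count as connections. The threshold is a parameter, and the function name, DOIs, and records are made up.

```python
from collections import Counter


def connection_hubs(records, min_connections=10):
    """Return {doi: Counter of related types} for well-connected projects.

    Assumptions: the DOI is at attributes.doi, related identifiers are at
    attributes.relatedIdentifiers, and a project qualifies when it has
    more than min_connections related identifiers in total.
    """
    hubs = {}
    for record in records:
        attrs = record["attributes"]
        related = attrs.get("relatedIdentifiers") or []
        by_type = Counter(
            item.get("resourceTypeGeneral") or "Untyped" for item in related
        )
        if sum(by_type.values()) > min_connections:
            hubs[attrs["doi"]] = by_type
    return hubs


def related(resource_type, n):
    """Build n illustrative related identifiers of one type (hypothetical helper)."""
    return [
        {"relationType": "HasPart", "resourceTypeGeneral": resource_type}
        for _ in range(n)
    ]


# Made-up projects, not data from this study.
sample_records = [
    {"attributes": {"doi": "10.1234/example-hub",
                    "relatedIdentifiers": related("Dataset", 8) + related("Text", 4)}},
    {"attributes": {"doi": "10.1234/example-small",
                    "relatedIdentifiers": related("Software", 2)}},
    {"attributes": {"doi": "10.1234/example-isolated"}},
]

hubs = connection_hubs(sample_records, min_connections=10)
for doi, by_type in hubs.items():
    print(doi, dict(by_type))
```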

In some cases, projects are isolated with few interconnections. In others, there are many interconnections (Figure 1).

Figure 1. Several projects with diverse relations and some inter-connections

Project Connectivity

Connectivity is a concept that has been applied to DataCite metadata primarily as a measure of how well the metadata identify potential connections. Metadata with identifiers for people, organizations, and related objects are said to be well connected.

 

The DataCite project metadata were examined for connectivity, and the results, shown in Figure 2, indicate that connectivity varies widely across the project records. The gdcc.odum-library and cos.osf repositories include Resource Author Affiliation Identifiers, the cos.osf and cern.zenodo repositories include funder and award identifiers, and the cern.zenodo repository includes Resource Contact identifier metadata. As shown above, cern.zenodo and tdl.tacc include related identifiers with many relation types. Identifiers included in more than two repositories are not shown here. That includes Resource Author Identifiers, included in four repositories, and Resource Author Affiliations, included in five repositories.

Conclusion

DataCite metadata with the term Project in the resourceType occur in at least ten repositories, with the majority in the Center for Open Science (cos.osf) and Zenodo (cern.zenodo). Samples of these metadata were examined to determine how projects are represented in DataCite. While single repositories show consistent patterns, no clear overall pattern emerged.

Several metadata elements that might be expected in project metadata were not very common in these samples. For example, only Zenodo and the Texas Advanced Computing Center identified project funders and included some funder and award identifiers.

Only two repositories include metadata for resources related to these projects (cern.zenodo and tdl.tacc), and only cern.zenodo provides types for the related resources.

The FAIR Island Project recently developed some pilot project metadata records as part of a data management class. These examples focus on connectivity, with related identifiers for people and organizations, output management plans, and measurement protocols.

In any case, the idea of project metadata in DataCite remains experimental. As is the case with other new resource types, community consensus and guidelines will increase the usability and impact of these metadata.

DataCite, projectMetadata
