Project Metadata in DataCite
Cite this blog as: Habermann, T. (2023). Project Metadata in DataCite. Front Matter. https://doi.org/10.59350/zwzvv-1n627
Introduction
The DataCite Metadata Schema was developed to provide identification and citation metadata for a wide variety of resource types, and it has been used successfully for many of them for nearly two decades. In fact, the controlled vocabulary for the required resourceTypeGeneral element grew significantly in the current version (4.4) of the schema, which added thirteen new types that give more detail to the Text type.
The DataCite schema also allows users to define types that they need but that are not in the current list by using the free-text resourceType element, typically with resourceTypeGeneral = “Other”. This approach makes new types discoverable and eases migration if a new type is later added to the shared vocabulary as it gains broader adoption.
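As a minimal sketch in standard-library Python (the helper name is hypothetical), this is how a record could declare a project under this convention in DataCite XML: the controlled resourceTypeGeneral value goes in the attribute and the free-text type goes in the element content. The namespace URI is the one used by the Schema 4.x kernel; the rest is illustrative, not a complete record.

```python
import xml.etree.ElementTree as ET

# Namespace of the DataCite Metadata Schema 4.x XML kernel.
DATACITE_NS = "http://datacite.org/schema/kernel-4"


def resource_type_element(free_text: str, general: str = "Other") -> ET.Element:
    """Build a DataCite <resourceType> element (hypothetical helper).

    The controlled resourceTypeGeneral value is stored as an attribute and
    the free-text resourceType is stored as the element's text.
    """
    elem = ET.Element(f"{{{DATACITE_NS}}}resourceType")
    elem.set("resourceTypeGeneral", general)
    elem.text = free_text
    return elem


if __name__ == "__main__":
    # Serialize with the DataCite namespace as the default namespace.
    ET.register_namespace("", DATACITE_NS)
    print(ET.tostring(resource_type_element("Project"), encoding="unicode"))
```

If "Project" were later added to the controlled vocabulary, migrating such records would only mean changing the attribute value.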
For example, many people have identified the need for project metadata over many years, and projects have, in fact, been described in many metadata dialects (e.g. Habermann et al., 2018). Given the diversity of types described by DataCite metadata, it is interesting to explore whether users are using DataCite metadata for projects and, if so, how they are using it.
Finding DataCite metadata that include “Project” in the resourceType is straightforward using the DataCite API query:
https://api.datacite.org/dois?query=types.resourceType:*Project*&page[size]=1
The facets returned by this query provide an overview of the repositories that are using this type:
Client | Name | Count |
cos.osf | Open Science Framework | 73,828 |
cern.zenodo | Zenodo | 28,687 |
tdl.tacc | Texas Advanced Computing Center | 621 |
gdcc.odum-library | UNC Libraries | 212 |
umich.library | University of Michigan Library | 193 |
fatj.ngeahg | WL - Publications | 167 |
cdl.cdl | California Digital Library | 93 |
unlv.ds | DigitalScholarship@UNLV | 68 |
tib.hawk | HAWK Hildesheim - Hornemann Institut | 58 |
tib.eurescom | Eurescom GmbH | 48 |
Total | | 103,975 |
Table 1. These ten repositories hold 99.75% of DataCite records with "Project" in the resourceType element.
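The API query above can be scripted. The sketch below, using only Python's standard library, builds the same URL and pulls the repository facet out of the JSON response. It assumes the facet is returned under meta.clients with id, title, and count keys; the helper names are hypothetical, and the network call is kept separate so the parsing can be checked offline.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

API_URL = "https://api.datacite.org/dois"


def project_query_url(page_size: int = 1) -> str:
    """Build the DOI search URL for resourceType values containing "Project"."""
    params = {"query": "types.resourceType:*Project*", "page[size]": page_size}
    # urlencode percent-encodes ':', '*', '[' and ']' in keys and values.
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch(url: str) -> dict:
    """GET a URL and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def client_facets(payload: dict) -> List[Tuple[str, str, int]]:
    """Extract (client id, repository name, count) from meta.clients.

    Assumes each facet entry looks like {"id": ..., "title": ..., "count": ...}.
    """
    facets = payload.get("meta", {}).get("clients", [])
    return [(f["id"], f["title"], f["count"]) for f in facets]
```

For example, client_facets(fetch(project_query_url())) would return (id, name, count) tuples comparable to the rows of Table 1. Requesting a single record per page keeps the response small, since only the facets are needed.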
These numbers show that the Open Science Framework and Zenodo account for most of the records that use the project type (102,515 of the 103,975 in Table 1, nearly 99%). To understand more about how this resource type is used, random samples of 5,000 records were retrieved from each of these two repositories, along with all records from the smaller users.
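One simple way to draw such samples, assuming each repository's project DOIs have already been harvested into a list, is Python's random.sample, which samples without replacement. The helper name and seed are illustrative; the post does not say exactly how its samples were drawn.

```python
import random
from typing import List, Optional


def sample_dois(dois: List[str], k: int = 5000, seed: Optional[int] = None) -> List[str]:
    """Return a simple random sample of k DOIs, or every DOI if there are at most k.

    Small repositories are therefore included in full, while large ones are
    sampled without replacement. A fixed seed makes the sample reproducible.
    """
    if len(dois) <= k:
        return list(dois)
    return random.Random(seed).sample(dois, k)
```

Applied to harvested lists, sample_dois(dois, seed=42) would return 5,000 DOIs for a large repository such as cos.osf, while TACC's 621 records would be returned unchanged.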
Metadata Content
ResourceType
The DataCite resourceType is an optional free-text field that gives more information about the type of a resource than the mandatory, controlled resourceTypeGeneral element. This dataset includes records that have “Project” in their free-text resourceType. Repositories use this free text differently: some use simply “Project”, while others add details about the type of project or resource:
Client | Resource Type |
cos.osf | Project |
cern.zenodo | Project Deliverable, Project milestone |
tdl.tacc | Project/Other, Project/Report, Project/Other/REU, Project/Other/Dataset, Project/Experimental, Project/Other/Check Sheet, Project/Other/Database, Project/Other/Poster, Project/Other/Report, Project/Other/None, Project/Other/Code, Project/Other/Other, Project/Simulation |
gdcc.odum-library | Capstone Project, Project |
umich.library | Project, Master's Project |
fatj.ngeahg | Projectrapport, Project report |
Table 2. Some repositories use simply "Project", others provide more detail.
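Because the values in Table 2 follow different conventions, grouping them is a useful first step before comparing repositories. The Python sketch below is one hypothetical way to do that: it splits TACC-style values on “/” and sorts each value into plain, hierarchical, or otherwise qualified forms. The three category names are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List


def type_path(resource_type: str) -> List[str]:
    """Split a hierarchical value such as "Project/Other/REU" into its parts."""
    return [part.strip() for part in resource_type.split("/") if part.strip()]


def classify(resource_type: str) -> str:
    """Label a free-text resourceType as plain, hierarchical, or qualified."""
    text = resource_type.strip()
    if text.lower() == "project":
        return "plain"          # e.g. "Project"
    if "/" in text:
        return "hierarchical"   # e.g. "Project/Other/Report" (TACC)
    return "qualified"          # e.g. "Capstone Project", "Project Deliverable"


def summarize(values: Iterable[str]) -> Counter:
    """Count how many values fall into each category."""
    return Counter(classify(v) for v in values)
```

For instance, summarize(["Project", "Project/Report", "Master's Project"]) counts one value in each of the three categories.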
Funder Information
Given that research projects are often associated with particular funders, and often with particular awards, one might expect project metadata records to include funder metadata. The data in Table 3 show that this is not the case: funder metadata occur in only 33% of the TACC records, 13% of the Zenodo records, and very few records in other repositories. The TACC projects have many different funders, while the Zenodo funders are overwhelmingly the European Commission (3,656). It is interesting to note that these metadata also include funder identifiers (Crossref Funder IDs) and award numbers, i.e., the identifiers that funders assign to specific awards.
Repository | Project Funder | Funder Identifier | Award Number |
cern.zenodo | 3680 | 3680 | 3680 |
cos.osf | 6 | 6 | 3 |
gdcc.odum-library | 1 | 1 | |
tdl.tacc | 202 | 199 | |
Table 3. Funder metadata in project records.
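Counts like those in Table 3 can be gathered from the fundingReferences list in DataCite JSON records. The sketch below counts records (not individual references) that have at least one funder name, funder identifier, or award number; the post does not state which of the two Table 3 counts, so treat this as one plausible reading. It assumes records shaped like items in the REST API's data array, with the metadata under attributes.

```python
from typing import Dict, Iterable


def funding_counts(records: Iterable[dict]) -> Dict[str, int]:
    """Count records with any funder name, funder identifier, or award number."""
    counts = {"funder": 0, "funderIdentifier": 0, "awardNumber": 0}
    for record in records:
        refs = record.get("attributes", {}).get("fundingReferences") or []
        if any(r.get("funderName") for r in refs):
            counts["funder"] += 1
        if any(r.get("funderIdentifier") for r in refs):
            counts["funderIdentifier"] += 1
        if any(r.get("awardNumber") for r in refs):
            counts["awardNumber"] += 1
    return counts


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole
```

As a check on the arithmetic, percent(202, 621), TACC's funder count over its 621 records from Table 1, is about 32.5, which rounds to the 33% quoted above.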
RelationType
One of the most important functions of DataCite metadata is to identify resources that are related to the resource being described. The relationType, which is required for every related identifier, describes the relationship between the resources and must be selected from a shared vocabulary.
Only two of the repositories with project metadata have related identifiers (Table 4). The three most common relationTypes in Zenodo are essentially Zenodo-specific uses related to communities (IsPartOf) and DOI versioning (IsVersionOf and HasVersion).
cern.zenodo
relationType | Count | relationType | Count |
IsPartOf (community) | 3374 | IsCompiledBy | 13 |
IsVersionOf (DOI Versioning) | 2567 | Continues | 10 |
HasVersion (DOI Versioning) | 2530 | IsPublishedIn | 7 |
Cites | 213 | IsContinuedBy | 5 |
References | 117 | IsDocumentedBy | 5 |
IsIdenticalTo | 65 | IsSourceOf | 4 |
IsSupplementTo | 57 | IsPreviousVersionOf | 3 |
IsSupplementedBy | 42 | IsDescribedBy | 1 |
HasPart | 33 | Obsoletes | 1 |
IsDerivedFrom | 19 | Requires | 1 |
IsNewVersionOf | 16 | Documents | 1 |
IsReferencedBy | 15 | IsRequiredBy | 1 |
IsCitedBy | 13 | | |
tdl.tacc
relationType | Count | relationType | Count |
IsPartOf | 148 | IsNewVersionOf | 6 |
IsSupplementTo | 28 | HasPart | 1 |
IsDocumentedBy | 21 | IsCitedBy | 1 |
References | 14 | | |
Table 4. RelationTypes in Zenodo and TACC repository project metadata. No other repositories included relationTypes.
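A tally like Table 4 is a simple counting exercise over the related identifiers of each record. This Python sketch assumes the REST API JSON shape (attributes.relatedIdentifiers, each entry with a relationType); separating Zenodo's community and versioning relations from the rest, as the table's annotations do, would need an extra rule that is not shown here.

```python
from collections import Counter
from typing import Iterable


def relation_type_counts(records: Iterable[dict]) -> Counter:
    """Tally relationType values across all related identifiers."""
    counts: Counter = Counter()
    for record in records:
        related = record.get("attributes", {}).get("relatedIdentifiers") or []
        # relationType is required by the schema, but flag entries that lack it.
        counts.update(r.get("relationType") or "(missing)" for r in related)
    return counts
```

Calling relation_type_counts(records).most_common() then lists the relation types in descending order of frequency, as in the table.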
RelatedIdentifier/ResourceTypeGeneral
Knowing the relations between project resources is important, but the types of the related objects are equally important to users looking for specific kinds of related objects. These types are indicated by the resourceTypeGeneral attribute of the relatedIdentifier. Zenodo is the only repository that includes this information, and its types are shown in Table 5. Note that these occur in ~70% of the project records.
cern.zenodo
resourceTypeGeneral | Count | resourceTypeGeneral | Count |
Text | 287 | Report | 5 |
JournalArticle | 58 | Other | 5 |
Software | 37 | Book | 4 |
Dataset | 24 | OutputManagementPlan | 3 |
PhysicalObject | 8 | Collection | 2 |
Audiovisual | 6 | ConferencePaper | 1 |
Table 5. Related Item Types in Zenodo project metadata.
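To make the location of this type information concrete, the sketch below builds a DataCite XML relatedIdentifier with Python's ElementTree. The DOI 10.1234/example-omp and the choice of IsDocumentedBy are hypothetical placeholders. Because resourceTypeGeneral is optional on relatedIdentifier, the helper adds it only when a type is supplied, which is also why records can carry relations without types.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DATACITE_NS = "http://datacite.org/schema/kernel-4"


def related_identifier(value: str, id_type: str, relation: str,
                       type_general: Optional[str] = None) -> ET.Element:
    """Build a <relatedIdentifier>, adding resourceTypeGeneral only when known."""
    attrs = {"relatedIdentifierType": id_type, "relationType": relation}
    if type_general is not None:
        attrs["resourceTypeGeneral"] = type_general
    elem = ET.Element(f"{{{DATACITE_NS}}}relatedIdentifier", attrs)
    elem.text = value
    return elem


if __name__ == "__main__":
    ET.register_namespace("", DATACITE_NS)
    # Hypothetical DOI for a project's output management plan.
    omp = related_identifier("10.1234/example-omp", "DOI", "IsDocumentedBy",
                             "OutputManagementPlan")
    print(ET.tostring(omp, encoding="unicode"))
```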
Projects as Hubs
Projects are composed of the resources used for planning, executing, and reporting on the work done during the project. As such, project metadata serves as a hub for connecting these resources. The observations described above indicate that this capability is currently underused. Only two of the largest project repositories in DataCite include related resources and, even in those cases, these connections are only available for a small portion of the projects.
A few projects stand out as connection hubs, all from Zenodo. Table 6 shows the projects with more than twenty connections, and Figure 1 shows the connected types and relations for the project 10.5281/zenodo.6804281.
DOI | Book | Collection | Conference Paper | Dataset | Journal Article | Physical Object | Report | Software | Text |
10.5281/zenodo.6451260 | 2 | 29 | 8 | 1 | 12 | 34 | |||
10.5281/zenodo.7035016 | 2 | 4 | 13 | 12 | |||||
10.5281/zenodo.5824544 | 10 | 2 | 1 | 14 | |||||
10.5281/zenodo.6804281 | 1 | 2 | 1 | 2 | 1 | 7 | 7 |
Table 6. Projects with more than twenty connections.
In some cases, projects are isolated with few interconnections. In others, there are many interconnections (Figure 1).
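Finding hubs like those in Table 6 amounts to counting typed related identifiers per project and keeping the projects above a threshold. A minimal Python sketch, assuming REST API JSON records whose id is the DOI; related identifiers without a resourceTypeGeneral are skipped so that the counts reflect typed connections only, which is an assumption about how Table 6 was built.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def type_profile(records: Iterable[dict]) -> Dict[str, Counter]:
    """Count each record's typed related identifiers by resourceTypeGeneral."""
    profile: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        doi = record.get("id", "")
        related = record.get("attributes", {}).get("relatedIdentifiers") or []
        for r in related:
            type_general = r.get("resourceTypeGeneral")
            if type_general:  # untyped connections are skipped (assumption)
                profile[doi][type_general] += 1
    return dict(profile)


def hubs(profile: Dict[str, Counter], threshold: int = 20) -> List[Tuple[str, int]]:
    """Return (DOI, total connections) for projects above the threshold, largest first."""
    totals = [(doi, sum(counts.values())) for doi, counts in profile.items()]
    return sorted((t for t in totals if t[1] > threshold), key=lambda t: t[1], reverse=True)
```

The per-type Counters are the rows of a table like Table 6, and lowering the threshold shows how quickly the number of hubs grows.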
Project Connectivity
Connectivity is a concept that has been applied to DataCite metadata primarily as a measure of how completely the metadata include identifiers that support potential connections. Metadata with identifiers for people, organizations, and related objects are said to be well connected.
The DataCite project metadata were examined for connectivity, and the results, shown in Figure 2, indicate that connectivity varies widely across the project records. The gdcc.odum-library and cos.osf repositories include Resource Author Affiliation Identifiers, the cos.osf and cern.zenodo repositories include funder and award identifiers, and the cern.zenodo repository includes Resource Contact identifier metadata. As shown above, cern.zenodo and tdl.tacc include related identifiers with many relation types. Identifiers included in more than two repositories are not listed here; these include Resource Author Identifiers, included in four repositories, and Resource Author Affiliations, included in five repositories.
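One way to compute a connectivity profile like the one summarized above is to flag, for each record, which kinds of identifiers are present and then average those flags by repository. The Python sketch below rests on several assumptions: it reads REST API JSON attributes, treats contributors with contributorType ContactPerson as the “Resource Contact”, and expects affiliations as objects with an affiliationIdentifier (as returned when the REST API is queried with affiliation=true), treating plain-string affiliations as unidentified. The flag names are hypothetical.

```python
from typing import Dict, Iterable, List


def _has_name_identifiers(people: List[dict]) -> bool:
    """True if any person has at least one non-empty nameIdentifier."""
    return any(
        ni.get("nameIdentifier")
        for p in people
        for ni in (p.get("nameIdentifiers") or [])
    )


def _has_affiliation_identifiers(people: List[dict]) -> bool:
    """True if any affiliation object carries an affiliationIdentifier."""
    return any(
        isinstance(a, dict) and bool(a.get("affiliationIdentifier"))
        for p in people
        for a in (p.get("affiliation") or [])
    )


def connectivity(record: dict) -> Dict[str, bool]:
    """Flag which kinds of identifiers appear in one DataCite JSON record."""
    attrs = record.get("attributes", {})
    creators = attrs.get("creators") or []
    # Assumption: "Resource Contact" = contributors with contributorType ContactPerson.
    contacts = [
        c for c in (attrs.get("contributors") or [])
        if c.get("contributorType") == "ContactPerson"
    ]
    funding = attrs.get("fundingReferences") or []
    return {
        "authorIdentifier": _has_name_identifiers(creators),
        "authorAffiliationIdentifier": _has_affiliation_identifiers(creators),
        "contactIdentifier": _has_name_identifiers(contacts),
        "funderIdentifier": any(bool(f.get("funderIdentifier")) for f in funding),
        "awardNumber": any(bool(f.get("awardNumber")) for f in funding),
        "relatedIdentifier": bool(attrs.get("relatedIdentifiers")),
    }


def repository_profile(records: Iterable[dict]) -> Dict[str, float]:
    """Fraction of records in which each kind of identifier appears."""
    flags = [connectivity(r) for r in records]
    if not flags:
        return {}
    return {key: sum(f[key] for f in flags) / len(flags) for key in flags[0]}
```

Running repository_profile over each repository's sample gives one row per repository, which is the kind of comparison a connectivity figure summarizes.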
Conclusion
DataCite metadata with the term “Project” in the resourceType occur in at least ten repositories, with the majority in the Open Science Framework (cos.osf) and Zenodo (cern.zenodo). Samples of these metadata were examined to determine how projects are represented in DataCite. While individual repositories show consistent patterns, no clear overall pattern emerged.
Several metadata elements that might be expected in project metadata were not common in these samples. For example, funder metadata appear in more than a handful of records only in Zenodo and the Texas Advanced Computing Center; both include funder identifiers, but only Zenodo includes award numbers in substantial numbers.
Only two repositories include metadata for resources related to these projects (cern.zenodo and tdl.tacc), and only cern.zenodo provides types for the related resources.
The FAIR Island Project recently developed some pilot project metadata records as part of a data management class. These examples focus on connectivity, using identifiers for people and organizations as well as related identifiers for output management plans and measurement protocols.
In any case, the idea of project metadata in DataCite remains experimental. As is the case with other new resource types, community consensus and guidelines will increase the usability and impact of these metadata.