Making the Invisible Visible: Celebrating the Year of Open Science
Ted Habermann, Metadata Game Changers
Jamaica Jones, University of Pittsburgh
Cite this blog as Habermann, T. (2024). Making the Invisible Visible: Celebrating the Year of Open Science. Front Matter. https://doi.org/10.59350/77zs1-hz764
Metadata Game Changers and the INFORMATE Project had the opportunity to present some of our recent work at the culminating conference showcasing the outcomes, coalition-building efforts, and ongoing work stemming from the 2023 Year of Open Science (YOS). Our talk focused on the FAIRness of DataCite metadata in university repositories, the consistency of funder metadata, and comparisons between CHORUS data from the global research infrastructure and several NSF repositories. Some highlights are described below; the slides are available here, and a recording of the talk is also available.
University Metadata FAIRness
DataCite metadata from 387 university repositories was evaluated using the MetaDIG FAIRness recommendations (Habermann, 2019). The results, shown in Figure 1, indicate that, even though the DataCite metadata schema includes many elements that can support all of the FAIR use cases (findability, accessibility, interoperability, and reuse), most universities do not take advantage of these capabilities in their metadata. Instead, they focus on providing the minimal metadata required to get DOIs quickly.
Figure 1. Metadata completeness for FAIR use cases in university repositories at DataCite. The Y-axis shows the number of repositories, and the X-axis shows the completeness level. The completeness is highest for the Text use case because of mandatory DataCite metadata fields and decreases for all other use cases that correspond to the A, I, and R in FAIR.
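For readers who want to explore their own repository, the sketch below shows one way to check how often optional DataCite fields are populated using the DataCite REST API. It is a minimal illustration, not the MetaDIG implementation: the grouping of fields into FAIR use cases is made up for the example, and the client id is a placeholder.

```python
# A minimal sketch (not the MetaDIG checks) of measuring how often a
# DataCite repository populates optional metadata fields.
import requests

# Illustrative field groups; the actual MetaDIG recommendations define
# their own, more detailed checks per FAIR use case.
USE_CASE_FIELDS = {
    "Findable": ["subjects", "descriptions"],
    "Accessible": ["rightsList", "formats", "sizes"],
    "Interoperable": ["relatedIdentifiers", "geoLocations"],
    "Reusable": ["fundingReferences", "contributors"],
}

def repository_completeness(client_id: str, page_size: int = 100) -> dict:
    """Return the fraction of sampled records with each field group populated."""
    url = "https://api.datacite.org/dois"
    params = {"client-id": client_id, "page[size]": page_size}
    records = requests.get(url, params=params, timeout=30).json()["data"]

    counts = {use_case: 0 for use_case in USE_CASE_FIELDS}
    for record in records:
        attrs = record["attributes"]
        for use_case, fields in USE_CASE_FIELDS.items():
            # Count the record only if every field in the group is non-empty.
            if all(attrs.get(field) for field in fields):
                counts[use_case] += 1

    total = len(records) or 1
    return {use_case: counts[use_case] / total for use_case in USE_CASE_FIELDS}

# Example with a hypothetical repository (client) id:
# print(repository_completeness("example.university"))
```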
Consistency of Funder Metadata
Consistent metadata is critical within and across repositories to correctly identify connections between research objects of many kinds. With the goal of characterizing contributions from specific awards, we searched text award metadata in CHORUS retrievals from Crossref to find NSF award numbers. We identified award numbers for 93% of the awards. However, this leaves almost 32,000 free-text award descriptions for which award numbers could not be identified (Figure 2). Almost half of these are errors in the length of the provided award numbers, i.e., they are shorter or longer than the required seven digits. In other cases, award titles are provided instead of identifiers, or the text appears to be random.
Figure 2. CHORUS data includes over 430,000 text descriptions of award numbers. Actionable award numbers could be recognized in 93% of these cases, leaving nearly 32,000 without recognizable award numbers.
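As a rough illustration of what "recognizable" means here, the sketch below uses a simple pattern match to pull seven-digit NSF award numbers out of free-text award strings. The actual INFORMATE processing may apply additional rules, and the example strings are made up.

```python
# A small sketch of recognizing seven-digit NSF award numbers in free text.
import re

# A run of exactly seven digits, not embedded in a longer digit string.
AWARD_NUMBER = re.compile(r"(?<!\d)(\d{7})(?!\d)")

def extract_award_numbers(award_text: str) -> list[str]:
    """Return candidate NSF award numbers found in a free-text award field."""
    return AWARD_NUMBER.findall(award_text)

# Made-up examples of the kinds of strings seen in award metadata:
examples = [
    "NSF Grant No. 1931666",             # clean seven-digit number
    "DMS-193166",                         # too short: no match
    "Division of Ocean Sciences award",   # title instead of an identifier
]
for text in examples:
    print(text, "->", extract_award_numbers(text) or "no award number found")
```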
The Denominator
Understanding how well agency repositories are doing at capturing information on research contributions requires an estimate of the total number of contributions that exist, i.e., the denominator. CHORUS data provides one estimate of this denominator, and comparing CHORUS and PAR indicates that 36% of the articles included in CHORUS are also recorded in PAR (Figure 3). A second estimate of this denominator comes from the NSF Award database; in this case, 6% of the awards in the Award Database are included in PAR.
Figure 3. Comparisons of the NSF Public Access Repository (PAR) with several estimates of the denominator.
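The overlap numbers above boil down to comparing lists of DOIs from two sources. The sketch below shows a minimal version of that comparison; the file names and DOI normalization are assumptions for illustration, not the project's code.

```python
# A minimal sketch of the denominator comparison: what fraction of the
# CHORUS articles also appear in PAR, given one DOI list from each source.
def normalize(doi: str) -> str:
    """Lower-case a DOI and strip a resolver prefix so lists compare cleanly."""
    return doi.strip().lower().removeprefix("https://doi.org/")

def overlap_fraction(chorus_dois: list[str], par_dois: list[str]) -> float:
    """Fraction of CHORUS DOIs that are also present in PAR."""
    chorus = {normalize(d) for d in chorus_dois}
    par = {normalize(d) for d in par_dois}
    return len(chorus & par) / len(chorus) if chorus else 0.0

# Usage with hypothetical input files, one DOI per line:
# with open("chorus_dois.txt") as f1, open("par_dois.txt") as f2:
#     share = overlap_fraction(f1.readlines(), f2.readlines())
#     print(f"{share:.0%} of CHORUS articles are also in PAR")
```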
Conclusion
The talk described several challenges identified while exploring the global research infrastructure using CHORUS data for NSF and other agencies. Despite these challenges, it ended by describing several successes: using existing metadata from institutional repositories to improve DataCite completeness and re-curating affiliations and organizational identifiers into Dryad.
The recording of the talk is available here, and the slides are available here. Please let us know if you have questions or comments below.
References
Habermann, T. (2019). MetaDIG recommendations for FAIR DataCite metadata. Front Matter. https://doi.org/10.59350/n31gm-kg364