Funder Acronyms Are Still Not Enough
Ted Habermann, Metadata Game Changers
Jamaica Jones, University of Pittsburgh
Howard Ratner, Tara Packer, CHORUS
Introduction
Cite this blog as: Habermann, T., Jones, J., Ratner, H., and Packer, T. (2024). Funder Acronyms Are Still Not Enough. Front Matter. https://doi.org/10.59350/cnkm2-18f84
Funder metadata is becoming more important as the global research infrastructure is engaged as a tool for quantifying the impact of funders on research results and as interest in open science increases. The transition from Crossref Funder IDs to ROR IDs for funders (French et al., 2023; Lammey, 2024) is also focusing attention on the funder identifiers used in these metadata.
A recent workshop described the benefits and challenges the community faces as it tries to increase the amount and quality of funder metadata (De Jonge et al., 2023). The authors observed: “Authors often persist in the wrong spelling of their funder and do not choose predefined suggestions, making it very difficult to match input to Funder IDs”. Using acronyms instead of complete funder names is another persistent problem, affecting a variety of organizations in the U.S. and around the world, notably NSF and NIH and likely others (Habermann, 2021, 2022).
Working with funder identifiers as part of the INFORMATE project (Habermann et al., 2023) has brought the realities of these challenges to the fore. In this blog we highlight real-world examples that might be helpful in illustrating the problem.
Crossref Funder Metadata
The U.S. National Science Foundation (Crossref Funder ID: 100000001, ROR: https://ror.org/021nxhr62) emerges as a major player wherever one looks at funder metadata. A recent CHORUS (https://www.chorusaccess.org/) report of Crossref article DOIs that include both the NSF funder ID and the NSF name in their funder metadata covered 85,091 journal articles with metadata for 970,803 awards. The acronym “NSF” occurred 792,452 times in the funder names associated with these awards, and it occurred by itself 167,767 times (Figure 1).
A ROR search for “NSF” illuminates the challenge of associating an identifier (a ROR ID in this case) with these “NSF” acronyms. Nine potential choices, all identified from the acronym, share the same score (0.9), so the algorithm selects no best match. Even if the country is included in the search, three equally scored choices remain.
| Matching Type | Score | Chosen | Organization | Country |
|---------------|-------|--------|--------------|---------|
| ACRONYM | 0.90 | False | Nick Simons Foundation | United States |
| ACRONYM | 0.90 | False | National Sleep Foundation | United States |
| ACRONYM | 0.90 | False | National Science Foundation of Sri Lanka | Sri Lanka |
| ACRONYM | 0.90 | False | Norwegian Nurses Organisation | Norway |
| ACRONYM | 0.90 | False | The Neurosciences Foundation | United Kingdom |
| ACRONYM | 0.90 | False | National Science Foundation | United States |
| ACRONYM | 0.90 | False | Norsk Sosiologforening | Norway |
| ACRONYM | 0.90 | False | Bulgarian Science Fund | Bulgaria |

Table 1. Organizations identified by ror.org as potential matches for the acronym "NSF". Note that all have the same score and none is chosen by the algorithm. These results were provided by RORRetriever (Habermann, 2022a).
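For readers who want to reproduce this search, here is a minimal sketch against ROR's public affiliation-matching endpoint. The field names follow the v1 API as we understand it; this is an illustration of the underlying query, not RORRetriever itself.

```python
import requests

# Query ROR's affiliation-matching endpoint for the string "NSF".
resp = requests.get(
    "https://api.ror.org/organizations",
    params={"affiliation": "NSF"},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["items"]:
    org = item["organization"]
    print(
        item["matching_type"],   # e.g. ACRONYM
        item["score"],           # e.g. 0.9
        item["chosen"],          # True only for an unambiguous best match
        org["id"],               # ROR ID
        org["name"],
        org.get("country", {}).get("country_name", ""),
        sep="\t",
    )
```

The `chosen` flag is the field that stays `False` for every candidate here, which is exactly the ambiguity Table 1 shows.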
Six of these organizations have Crossref Funder IDs, and Table 2 shows how often they occur in the CHORUS data for NSF. Together these organizations account for over 2,200 cases. Of course, it is impossible to know whether these funder metadata are correct without knowing more details about the source.
| Organization | Crossref Funder ID | Count |
|--------------|--------------------|-------|
| Norwegian Nurses Organisation | 501100004190 | 31 |
| Bulgarian Science Fund | 501100003336 | 57 |
| National Sleep Foundation | 100003187 | 1,233 |
| The Neurosciences Foundation | 501100020414 | 26 |
| Nick Simons Foundation | 100016620 | 3 |
| National Science Foundation of Sri Lanka | 501100008982 | 906 |

Table 2. Occurrences of the Crossref Funder IDs of the other "NSF" organizations in the CHORUS data for NSF.
The funder metadata in the CHORUS report is retrieved from Crossref (Habermann, 2023), which collects metadata received from publishers as well as others. De Jonge et al. (2023) explored challenges in the metadata workflow and identified collecting funder metadata and extracting it from free text as a common source of funder metadata errors. This reflects the inherent complexity of funding sources and the diversity of ways those sources are described in free text. The following acknowledgement fragment illustrates typical challenges:
“S. E. O., L. D. T., S. T. G., and V. T. (at sea) were supported by the Southern Ocean Carbon and Climate Observations and Modeling Project under NSFPLR-1425989; V. T. and I. C. received significant support from NSFOCE-1357072. S. T. G. and I. C. were also supported by NSFOCE-1658001. S. A. J. is supported by the UK Natural Environment Research Council, including the ORCHESTRA grant (NE/N018095/1).”
First, the funding differs by author; authors are represented by initials, and the funding information is spread throughout the text. Second, funder names appear as acronyms (“NSF”) in several cases and are spelled out in another. Finally, funder acronyms are concatenated with award numbers (“NSFOCE-1658001”). This diversity is not unusual, as acknowledgements like these have been written for human readers for many years.
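As a toy illustration of the extraction problem, a pattern tuned to the concatenated form in this fragment recovers the NSF awards but, by design, says nothing about the spelled-out funder. The pattern is our illustrative assumption, not a general solution:

```python
import re

ack = (
    "S. E. O., L. D. T., S. T. G., and V. T. (at sea) were supported by the "
    "Southern Ocean Carbon and Climate Observations and Modeling Project under "
    "NSFPLR-1425989; V. T. and I. C. received significant support from "
    "NSFOCE-1357072. S. T. G. and I. C. were also supported by NSFOCE-1658001. "
    "S. A. J. is supported by the UK Natural Environment Research Council, "
    "including the ORCHESTRA grant (NE/N018095/1)."
)

# Concatenated NSF acronym + program abbreviation + award number,
# e.g. "NSFOCE-1357072" -> program "OCE", award "1357072".
pattern = re.compile(r"NSF([A-Z]{2,4})-(\d{7})")
print(pattern.findall(ack))
# [('PLR', '1425989'), ('OCE', '1357072'), ('OCE', '1658001')]
# Note what this simple pattern misses: the spelled-out funder
# ("UK Natural Environment Research Council") and its award NE/N018095/1.
```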
Of course, these metadata pass through many steps and many systems between the researcher and the global research infrastructure. Someone along the way may have mistakenly picked the wrong funder ID and/or funder name from a list of funder names in a manuscript tracking system. If authors were given the opportunity to check the funder ID and name when checking their article proofs, they may well have spotted the error and fixed it. However, this would require the publisher’s system to show the author information that was not part of the original manuscript submission.
DataCite Commons Funder Metadata
DataCite Commons makes it possible to search a different swath of the global research infrastructure for funders using Crossref Funder IDs and ROR IDs. For example, the URL https://commons.datacite.org/ror.org/010xaa060 provides summary information on over 10,000 Text, Journal Article, Dataset, and Collection resources that include the ID for the National Science Foundation of Sri Lanka (https://ror.org/010xaa060); see Figure 1.
Most of these (~7,616) are text resources and over 200 are datasets. The datasets are retrieved directly from DataCite, as opposed to being linked to articles retrieved from Crossref in the CHORUS case. Similar data can be retrieved for the National Sleep Foundation.
The related works from DataCite Commons can be retrieved using the DataCite Commons GraphQL API (see Fenner, 2021 for guidance on pagination using this API). The query shown below retrieves the DOI, type, registration agency, publisher, publication year, and funder metadata for each related work.
```graphql
{
  organization(id: "https://ror.org/010xaa060") {
    id
    name
    works(first: 10) {
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
      nodes {
        doi
        type
        registrationAgency
        publisher
        publicationYear
        fundingReferences {
          funderName
          funderIdentifier
          awardNumber
        }
      }
    }
  }
}
```
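Retrieving all of the related works requires cursor pagination, as described by Fenner (2021). Here is a minimal sketch against the standard DataCite GraphQL endpoint (https://api.datacite.org/graphql), with the query above parameterized by cursor:

```python
import requests

GRAPHQL_URL = "https://api.datacite.org/graphql"

QUERY = """
query ($cursor: String) {
  organization(id: "https://ror.org/010xaa060") {
    works(first: 100, after: $cursor) {
      totalCount
      pageInfo { endCursor hasNextPage }
      nodes {
        doi
        type
        registrationAgency
        publisher
        publicationYear
        fundingReferences { funderName funderIdentifier awardNumber }
      }
    }
  }
}
"""

def fetch_all_works():
    """Page through all related works, following endCursor until done."""
    cursor, works = None, []
    while True:
        resp = requests.post(
            GRAPHQL_URL,
            json={"query": QUERY, "variables": {"cursor": cursor}},
            timeout=60,
        )
        resp.raise_for_status()
        page = resp.json()["data"]["organization"]["works"]
        works.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return works
        cursor = page["pageInfo"]["endCursor"]
```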
The National Science Foundation of Sri Lanka data retrieved from DataCite Commons include DataCite funder metadata for 753 awards; the funder identifier for the National Science Foundation of Sri Lanka (https://doi.org/10.13039/501100008982) is listed for 435 of them. In every case the funder name is “National Science Foundation”.
Combining award metadata from DataCite and Crossref yields 11,573 occurrences of this funder ID associated with 136 different funder name strings, most of them variations on “National Science Foundation” and some identifying national science foundations in countries other than the United States and Sri Lanka. The most common funder names are shown in Table 3. It is clear that most of these funder IDs are incorrect; in fact, only 744 of the 11,573 (6%) funder names are the correct name for this identifier.
| Funder ID and Name | Count |
|--------------------|-------|
| https://doi.org/10.13039/501100008982 (Total) | 11,573 |
| National Science Foundation | 9,974 |
| National Science Foundation of Sri Lanka | 735 |
| NSF | 518 |
| Swiss National Science Foundation | 43 |
| MoSTR \| National Science Foundation | 42 |
| U.S. National Science Foundation | 34 |

Table 3. The most common funder names associated with the Crossref Funder ID for the National Science Foundation of Sri Lanka in the combined DataCite and Crossref award metadata.
Award number information is more limited in these metadata, but many of the award numbers are consistent with known patterns for U.S. National Science Foundation award numbers (seven-digit numbers, or NSF abbreviations combined with seven-digit numbers).
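As a rough screen, award-number strings can be checked against these shapes. The exact patterns are our assumption based on the examples above, not an official NSF format:

```python
import re

# Heuristic shapes for U.S. NSF award numbers (assumed, not official):
# a bare seven-digit number, or an NSF-prefixed program abbreviation
# joined to one, e.g. "NSFOCE-1357072".
BARE = re.compile(r"^\d{7}$")
PREFIXED = re.compile(r"^NSF[- ]?[A-Z]{2,4}[- ]?\d{7}$")

def looks_like_us_nsf_award(award: str) -> bool:
    award = award.strip()
    return bool(BARE.match(award) or PREFIXED.match(award))

for a in ["1425989", "NSFOCE-1357072", "NE/N018095/1", "175558"]:
    print(a, looks_like_us_nsf_award(a))
# 1425989 True, NSFOCE-1357072 True, NE/N018095/1 False, 175558 False
```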
The picture is very similar for the National Sleep Foundation, although in this case all 882 of the related resources are from Crossref. These include 897 awards with the National Sleep Foundation funder ID (https://doi.org/10.13039/100003187); 206 (23%) of these have the funder name “National Sleep Foundation” and 672 have the funder name “National Science Foundation”.
Conclusion
The global research literature is filled with free-text names of researchers, organizations, and funders that make it difficult to unambiguously recognize these entities and correctly connect them to their contributions. Persistent identifiers (PIDs) have been developed over the last several decades to help address these problems, and implementation of these identifiers in many publication systems is underway. Tracking these implementations, understanding problems, and fixing them is critical as these efforts move forward.
Identifiers for funders and awards form the foundation for understanding funder contributions to the global research landscape, so it is important to identify and trace the sources of problems that might occur as the required automation is developed and spreads across organizations and processing systems. The U.S. National Science Foundation funds many awards across multiple scientific disciplines and is ubiquitously referred to by the acronym “NSF”. The large number of occurrences of this acronym makes it possible to use it to identify problems that may occur only a small percentage of the time.
Using metadata from Crossref included in CHORUS reports, we searched for identifiers of multiple organizations that share the acronym “NSF” and identified over 2,000 references to organizations other than the U.S. National Science Foundation within the correctly identified NSF-related records. In many cases, we identified metadata where funder names were given as “National Science Foundation” even though the identifiers were for other organizations. We identified similar funder ambiguities and apparent identifier errors in metadata from DataCite Commons for datasets and text resources.
The overall number of errors is generally small (<2%) relative to the total number of resources, but it points to a problem that may occur more broadly without being recognized, and the impact on particular repositories may be significant. Including funder metadata as part of an article’s metadata is crucial for open-science workflows. As open metadata found in places like CHORUS, Crossref, ROR, and DataCite are used to characterize funder impacts in open research, automated validation aimed at identifying errors like those described here is a critical part of the value chain.
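As a minimal sketch of what such validation could look like, assuming the public Crossref funders endpoint and a deliberately naive string comparison, a validator could flag records whose funder name does not match the registered name for the asserted funder ID:

```python
import requests

def registered_funder_name(funder_id: str) -> str:
    """Look up the registered name for a Crossref Funder ID,
    e.g. "501100008982" -> "National Science Foundation of Sri Lanka"."""
    resp = requests.get(
        f"https://api.crossref.org/funders/{funder_id}", timeout=30
    )
    resp.raise_for_status()
    return resp.json()["message"]["name"]

def name_matches(recorded: str, registered: str) -> bool:
    # Naive normalization; a production validator would also need to
    # consider alternate names, translations, and acronyms.
    norm = lambda s: " ".join(s.lower().split())
    return norm(recorded) == norm(registered)

# Example: the common error described above.
registered = registered_funder_name("501100008982")
print(name_matches("National Science Foundation", registered))  # False -> flag
```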
Acknowledgements
This work is part of the INFORMATE Project, a partnership between Metadata Game Changers and CHORUS. This work was funded by the U.S. National Science Foundation (https://ror.org/021nxhr62), award 2334426.
Data Availability
The data retrieved from DataCite Commons for this work are available at https://doi.org/10.5281/zenodo.11116775. CHORUS reports are available for several funding agencies at https://dashboard.chorusaccess.org/.
References
DataCite Commons, Data for National Science Foundation of Sri Lanka, https://commons.datacite.org/ror.org/010xaa060, retrieved May 4, 2024.
DataCite Commons, Data for National Sleep Foundation, https://commons.datacite.org/ror.org/00zc1hf95, retrieved May 4, 2024.
DataCite Commons, DataCite GraphQL API Guide, https://support.datacite.org/docs/datacite-graphql-api-guide, retrieved May 4, 2024.
De Jonge, H., Kramer, K., Michaud, F., and Hendricks, G., 2023, Open funding metadata through Crossref; a workshop to discuss challenges and improving workflows, https://www.crossref.org/blog/open-funding-metadata-community-workshop-report/.
Fenner, M., 2021, Pagination with cursor in GraphQL API, https://pidforum.org/t/pagination-with-cursor-in-graphql-api/1572, retrieved May 4, 2024.
French, A., Hendricks, G., Lammey, R, Michaud, F., and Gould, M., 2023, Open Funder Registry to transition into Research Organization Registry (ROR), https://www.crossref.org/blog/open-funder-registry-to-transition-into-research-organization-registry-ror/.
Habermann, T., 2021, Acronyms are Definitely Not Enough, https://doi.org/10.59350/93v82-yr723.
Habermann, T., 2022, Funder Metadata: Identifiers and Award Numbers, https://doi.org/10.59350/xrqzb-re120.
Habermann, T., 2022a, Need help searching for RORs? Try RORRetriever!, https://doi.org/10.59350/4gxfz-4kb47.
Habermann, T., 2023, CHORUS Data Journey, https://doi.org/10.59350/ksgzn-a6w37.
Habermann, T., Jones, J., Packer, T., and Ratner, H., 2023, INFORMATE: Metadata Game Changers and CHORUS Collaborate to Make the Invisible Visible, https://doi.org/10.59350/yqkat-59f79.
Lammey, R., 2024, RORing ahead: using ROR in place of the Open Funder Registry, https://www.crossref.org/blog/roring-ahead-using-ror-in-place-of-the-open-funder-registry/.