Funder Metadata: Identifiers and Award Numbers

Cite this blog as Habermann, T. (2022). Funder Metadata: Identifiers and Award Numbers. Front Matter. https://doi.org/10.59350/xrqzb-re120

Research activities depend on monetary and other support from a wide variety of state and federal agencies, universities, foundations, and individuals. Tracking information about these funders has historically been a manual process, e.g. acknowledgements written into research outputs, and aggregating information about the research outputs supported by a particular funder has been a challenge without voluntary or coerced activity on the part of researchers, e.g. listing outputs in final reports.

This challenge can be addressed by collecting funder metadata when research outputs are submitted to journals or repositories. This approach improves prospects for future work, but finding funding metadata for past work depends on analyzing existing textual descriptions of funders. This analysis needs to find two items in these descriptions: 1) funder names that can be used to find identifiers and 2) award numbers.

This blog describes such a search through 854 funder descriptions from 507 datasets in the Data Repository for the University of Minnesota (DRUM). Persistent identifiers (RORs) were identified for forty-three funders that occurred 638 times, covering 43% of the affiliations. In addition, just over 200 award numbers were identified.

Affiliations

The affiliations in this set are likely representative of many collections of funder metadata. They range from simple funder names (National Science Foundation) to simple combinations of funder names and award numbers (National Science Foundation (NSF-CBET-1454259)) to long paragraphs describing complex funding from multiple sources through multiple awards to multiple authors: “Funding support for collection and analysis of syndromic surveillance data comes from the National Institute of Health (R01 AI058715, R01 AI120810 and R00 HD057992), National Science Foundation (LTREB-1052693), Arcus Foundation, USFWS Great Ape Conservation Fund, Morris Animal Foundation (D10ZO-902), University of Minnesota Consortium on Law and Values in Health, Environment, and the Life Sciences, University of Minnesota Doctoral Dissertation Fellowship and Lincoln Park Zoo.”

As mentioned above, the first challenge is to find funder names in these data; this is the first step towards finding persistent identifiers for these funders. These two steps were combined using the ROR affiliation search API, which finds organizations (in this case funders) with RORs in simple textual descriptions (a minimal sketch of such a query is included below). The current crosswalk between RORs and Crossref Funder IDs includes over 12,000 items, so identifying RORs also identifies FundRef IDs. Several common characteristics of funder descriptions cause problems with this approach. First, many descriptions include funder acronyms instead of funder names, e.g. NSF instead of National Science Foundation. As described in Acronyms Are Definitely Not Enough, this leads to a significant number of incorrect RORs. For example, the acronym ‘NSF’ yields RORs for six organizations:

| Organization Name | Country | ROR |
|---|---|---|
| National Sleep Foundation | United States | https://ror.org/00zc1hf95 |
| Norwegian Nurses Organisation | Norway | https://ror.org/01822d048 |
| The Neurosciences Foundation | United Kingdom | https://ror.org/01fy9e204 |
| National Science Foundation | United States | https://ror.org/021nxhr62 |
| Norsk Sosiologforening | Norway | https://ror.org/03t3x0x69 |
| Norsk Sveiseteknisk Forbund | Norway | https://ror.org/03yr3hm52 |

Table 1. Organizations with the acronym NSF.

NSF is not alone in this category; ‘NIH’ also abbreviates six organization names.

A second acronym problem occurs for NSF because the standard form of an NSF award number includes the acronym for the NSF research area, e.g. AGS-01234567 is an award from the Division of Atmospheric and Geospace Sciences and ENG 0967163 is an award from the Directorate for Engineering. As one might expect, many of these acronyms also represent various organizations from all over the world, e.g. AGS abbreviates nine organization names in four countries and ENG abbreviates one organization in France. Of course, these acronym problems are not showstoppers, but they increase the amount of manual checking of the ROR results that must be done to separate valid RORs from algorithm noise.
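
To make the lookup concrete, the sketch below shows how a funder description can be sent to the ROR affiliation search endpoint (https://api.ror.org/organizations?affiliation=...). This is a minimal illustration, not the script used for the DRUM analysis: it assumes the public v1 endpoint and the response fields documented for it (items, score, chosen, organization), and the helper function name is mine.

```python
import requests

ROR_AFFILIATION_API = "https://api.ror.org/organizations"  # v1 affiliation search endpoint


def match_affiliation(description):
    """Return candidate ROR matches for a free-text funder description."""
    response = requests.get(ROR_AFFILIATION_API, params={"affiliation": description})
    response.raise_for_status()
    candidates = []
    for item in response.json().get("items", []):
        org = item["organization"]
        candidates.append({
            "name": org["name"],
            "country": org["country"]["country_name"],
            "ror": org["id"],
            "score": item["score"],
            "chosen": item["chosen"],  # True only when the service is confident in a single match
        })
    return candidates


# A full funder name usually yields one confident ('chosen') match ...
print(match_affiliation("National Science Foundation (NSF-CBET-1454259)"))

# ... while a bare acronym returns several candidates (compare Table 1) that must be checked by hand.
print(match_affiliation("NSF"))
```

However the candidates are filtered, results like those in Table 1 show why manual validation remains part of the workflow.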

Forty-three funders were identified and validated from the DRUM data. The most common funders, the University of Minnesota and the National Science Foundation, occurred in over 200 descriptions, and seven more funders occurred ten or more times. Table 2 shows the nineteen funder descriptions associated with the United States Department of Agriculture.

| Funder descriptions identified as United States Department of Agriculture (https://ror.org/01na82s61) | Award Numbers |
|---|---|
| USDA-NIFA-OREI | |
| USDA NIFA MIN-98-G02 | MIN-98-G02 |
| USDA 2011-67019-30200 | 2011-67019-30200 |
| USDA-AFRI grant **1005423** | |
| USDA NIFA 2011-68002-30029 | 2011-68002-30029 |
| USDA-NIFA Grant #2018-67011-28075 | 2018-67011-28075 |
| USDA-AFRI Grant nos. 2016-67015-24911 | 2016-67015-24911 |
| USDA Forest Service award 14-JV-11242303-128 | 14-JV-11242303-128 |
| USDA Forest Service Pacific Southwest Research Station | |
| USDA-AFRI Grant nos. 2016-67015-24911 and 2015-68004-23131 | 2016-67015-24911 |
| USDA-AFRI Grant nos. 2016-67015-24911 and 2015-68004-23131 | 2015-68004-23131 |
| USDA National Needs Fellowship (USDA NIFA 2011-38420-20068) | 2011-38420-20068 |
| University of Minnesota USDA APHIS Cooperative Agreement 15-1900-1493-CA | 15-1900-1493-CA |
| USDA-NIFA Specialty Crop Research Initiative Award No. **2017- 51181-26829** | 2017- |
| USDA-NIFA Specialty Crop Research Initiative Award No. **2017- 51181-26829** | 51181-26829 |
| USDA NIFA National Needs Fellowship (Appropriation No. 5430-21000-006-00D) | 5430-21000-006-00D |
| US Department of Agriculture Biotechnology Risk Assessment Research Grants Program (BRAG) (USDA BRAG 2015-06504 to PLM) | 2015-06504 |
| United States Department of Agriculture (USDA), National Institute for Food and Agriculture (NIFA), Biotechnology Risk Assessment (BRAG), Grant #2015-06504 | 2015-06504 |
| This work was supported by a grant from the Emerging Infectious Diseases signature research program, College of Veterinary Medicine, University of Minnesota and by a grant from the USDA/AFRI #2007-35212-18046. | 2007-35212-18046 |

Table 2. Funder descriptions for the United States Department of Agriculture.

Award Numbers

Once funders are identified, it is straightforward in many simple cases to identify award numbers using regular expressions. Award numbers were identified in 229 cases. Table 2 shows award numbers extracted from the USDA funder descriptions. The award numbers are correct in most cases, but small inconsistencies remain. For example, the lack of a letter in the award number 1005423 and the space in the award number 2017- 51181-26829 (shown in bold in Table 2) result in a missed award number and the incorrect identification of two award numbers, respectively. Unfortunately, as in a game of Whack-A-Mole, correcting one problem often causes others, hence the need for manual validation. In funder descriptions that include multiple organizations, with or without award numbers, like the paragraph example given above, the problem is more difficult.
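
For illustration, the sketch below applies a few simplified regular expressions to descriptions from Table 2. These patterns are only examples of the kind of expression involved, not the ones used in the analysis, and as the second call shows they stumble on the same stray space discussed above.

```python
import re

# Illustrative patterns only (not the expressions used in the analysis):
AWARD_PATTERNS = [
    re.compile(r"\b\d{4}-\d{5}-\d{5}\b"),           # USDA NIFA style, e.g. 2016-67015-24911
    re.compile(r"\b[A-Z]{2,4}[- ]?\d{7,8}\b"),      # NSF style, e.g. AGS-01234567 or ENG 0967163
    re.compile(r"\b\d{2}-[A-Z]{2}-\d{8}-\d{3}\b"),  # Forest Service style, e.g. 14-JV-11242303-128
]


def extract_award_numbers(description):
    """Return award-number-like strings found in a funder description."""
    found = []
    for pattern in AWARD_PATTERNS:
        found.extend(pattern.findall(description))
    return found


print(extract_award_numbers("USDA-AFRI Grant nos. 2016-67015-24911 and 2015-68004-23131"))
# ['2016-67015-24911', '2015-68004-23131']

print(extract_award_numbers("USDA-NIFA Specialty Crop Research Initiative Award No. 2017- 51181-26829"))
# [] -- the stray space after '2017-' breaks the pattern, so this award number is missed
```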

Conclusion

Improving the completeness and machine-readability of funder metadata in the global research infrastructure, i.e. DataCite and Crossref, is a critical step along the path of using that infrastructure to identify and characterize research results supported by funders all over the world. A set of 854 funder descriptions from the DRUM repository was processed into 1482 affiliation strings. ROR identifiers were found for 638 of those affiliations and 229 award numbers were extracted.

These results suggest that useful funding metadata can be extracted from funder descriptions and that this information, along with some human validation, can increase the completeness and machine-readability of funder metadata in DataCite.