ROR in a Database

Cite this blog as Habermann, T. (2022). ROR in a Database. Front Matter. https://doi.org/10.59350/129q1-ckn64

In March 2022, the Research Organization Registry (ROR) released its first registry update that was independent of the organizational identifiers created and maintained by Digital Science in GRID (the Global Research Identifier Database), a key milestone for ROR. That release, and the current one released last week, are available in JavaScript Object Notation (JSON).

The JSON representation is well suited for quick import into Python and other programming languages, but some applications, and potentially some users, can benefit from accessing the data with SQL. ROR2DB, a tool for translating the JSON into a database format, is now available, along with a SQLite version of the ROR data created with it.
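To make the JSON-to-database idea concrete, here is a minimal sketch in Python using only the standard-library json and sqlite3 modules. It is not the ROR2DB code. The two-table schema (a ror table with id and name, and an acronyms table with ror_id and acronym) is an assumption inferred from the queries later in this post, and the two records are made-up examples that assume the ROR v1 record layout (id, name, acronyms).

```python
import json
import sqlite3

# Two hypothetical records shaped like ROR v1 JSON; real records have many more fields.
SAMPLE_JSON = """
[
  {"id": "https://ror.org/0000000a1", "name": "Example Science Foundation",
   "acronyms": ["ESF"]},
  {"id": "https://ror.org/0000000b2", "name": "Example Sleep Foundation",
   "acronyms": ["ESF", "XSF"]}
]
"""


def load_ror(records, conn):
    """Create an assumed two-table schema and insert one row per record and per acronym."""
    conn.execute("CREATE TABLE IF NOT EXISTS ror (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acronyms "
        "(ror_id TEXT REFERENCES ror(id), acronym TEXT)"
    )
    for rec in records:
        conn.execute("INSERT INTO ror (id, name) VALUES (?, ?)", (rec["id"], rec["name"]))
        # An organization can have zero or more acronyms, so they get their own table.
        conn.executemany(
            "INSERT INTO acronyms (ror_id, acronym) VALUES (?, ?)",
            [(rec["id"], acronym) for acronym in rec.get("acronyms", [])],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_ror(json.loads(SAMPLE_JSON), conn)
print(conn.execute("SELECT COUNT(*) FROM acronyms").fetchone()[0])  # prints 3
```

Putting acronyms in their own table, one row per organization and acronym, is what makes the grouping and joining queries below possible. With a real data dump you would pass json.load() of the release file and a file-backed connection such as sqlite3.connect("ror.db") (a hypothetical filename).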

Examples

I use ROR data and services to search large collections of affiliation strings for organization names and then to search those names for RORs. I use the database version of the ROR data to explore characteristics that might affect my results, usually by making it harder to find the correct RORs in a big pile of affiliations. Along the way, I run into characteristics that researchers should be aware of when they provide affiliation strings to journals and repositories, because those strings may eventually be searched for RORs. These characteristics are here to stay, so I discuss them briefly here. Check out the ROR documentation for other details.

Everyone knows that acronyms should be avoided, or spelled out when first used, in any scientific writing. The same advice applies to affiliation strings. The most recent ROR release includes over 29,000 acronyms, and 5,254 of them occur more than once. Having these acronyms in the data can be super helpful for affiliation searches, but their ambiguity needs to be kept in mind. The most ambiguous acronyms can be selected from the database with this query:

select acronym, count(*) as cnt
from acronyms
group by acronym
having count(*) >= 25
order by cnt desc;

Acronym   Count     Acronym   Count
CCC       39        MCC       28
AU        31        CRC       26
SCC       30        MSU       25
HCC       30        CMC       25
MU        29        BU        25

Table 1. The most ambiguous acronyms and their counts.
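The summary numbers in the acronym paragraph and the Table 1 query can both be run from Python. The sketch below uses a tiny made-up acronyms table (placeholder IDs, not real RORs) and assumes that "acronyms" means rows in that table and "used more than once" means distinct acronyms appearing on more than one row; the original counts may have been computed differently. The threshold is a parameter so you can explore below 25.

```python
import sqlite3

# Hypothetical acronym rows (ror_id, acronym); IDs are placeholders, not real RORs.
ROWS = [
    ("org1", "NSF"), ("org2", "NSF"), ("org3", "NSF"),
    ("org4", "MSU"), ("org5", "MSU"),
    ("org6", "ESF"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acronyms (ror_id TEXT, acronym TEXT)")
conn.executemany("INSERT INTO acronyms VALUES (?, ?)", ROWS)

# Total acronym rows across all organizations.
total = conn.execute("SELECT COUNT(*) FROM acronyms").fetchone()[0]

# Number of distinct acronyms that appear on more than one row.
shared = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT acronym FROM acronyms GROUP BY acronym HAVING COUNT(*) > 1) AS dup"
).fetchone()[0]


def most_ambiguous(conn, min_count):
    """Return (acronym, count) pairs used at least min_count times, most frequent first."""
    return conn.execute(
        "SELECT acronym, COUNT(*) AS cnt FROM acronyms "
        "GROUP BY acronym HAVING COUNT(*) >= ? ORDER BY cnt DESC, acronym",
        (min_count,),
    ).fetchall()


print(total, shared)            # prints 6 2
print(most_ambiguous(conn, 2))  # prints [('NSF', 3), ('MSU', 2)]
```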

Even if your organization is frequently referred to by an acronym, spelling out its name in an affiliation string helps ensure that it is correctly identified. For example, one of the most common acronyms in my work is ‘NSF’. The eight organizations in ROR that use ‘NSF’ as an acronym can be retrieved from the database with this query:

select acronym, name
from acronyms
inner join ror
    on acronyms.ror_id = ror.id
where acronym = 'NSF';

Norwegian Nurses Organisation     Norsk Sveiseteknisk Forbund
Bulgarian Science Fund            National Science Foundation of Sri Lanka
National Sleep Foundation         The Neurosciences Foundation
Norsk Sosiologforenin             National Science Foundation

Table 2. Eight organizations with the acronym 'NSF'.
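The same join can be wrapped in a small parameterized lookup, which is handy when you want to check many acronyms from affiliation strings. This is a sketch against the assumed ror/acronyms schema used in the query above. The IDs are placeholders, and only three of the Table 2 names are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ror (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE acronyms (ror_id TEXT, acronym TEXT);
-- Placeholder IDs; the names are three of the 'NSF' organizations from Table 2.
INSERT INTO ror VALUES ('id1', 'National Science Foundation'),
                       ('id2', 'Bulgarian Science Fund'),
                       ('id3', 'National Sleep Foundation');
INSERT INTO acronyms VALUES ('id1', 'NSF'), ('id2', 'NSF'), ('id3', 'NSF');
""")


def organizations_for(conn, acronym):
    """Return the names of all organizations that list the given acronym, sorted."""
    rows = conn.execute(
        "SELECT ror.name FROM acronyms "
        "INNER JOIN ror ON acronyms.ror_id = ror.id "
        "WHERE acronyms.acronym = ? ORDER BY ror.name",
        (acronym,),
    )
    return [name for (name,) in rows]


print(organizations_for(conn, "NSF"))
# prints ['Bulgarian Science Fund', 'National Science Foundation', 'National Sleep Foundation']
```

Any acronym that returns more than one name cannot identify an organization on its own, which is the practical argument for spelling names out.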

Acronyms that are the same as geographic abbreviations, such as U.S. state or country abbreviations, can be a particularly subtle challenge! These abbreviations are often included in the addresses that are part of affiliation strings extracted from journal articles or other sources, so they can cause false matches when a large number of affiliation strings is searched for RORs. For example, results like “University of South Alabama”, “United States Army”, “University of San Agustin”, “University of South Asia” or “University Surgical Associates” can show up when “USA” appears in an address embedded in an affiliation string. Keep an eye out for this when searching affiliation strings!
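One possible way to reduce these false matches, sketched in Python: look up each token of an affiliation string in an acronym list, and set aside matches that are also common geographic abbreviations. The ACRONYMS dictionary, the GEO_ABBREVIATIONS set, and the sample affiliation string are all made up for illustration; this is not a description of ROR's own matching service.

```python
import re

# Hypothetical subset of ROR acronyms mapped to organization names (illustration only).
ACRONYMS = {
    "USA": ["University of South Alabama", "United States Army"],
    "MSU": ["Michigan State University"],
}

# Assumed list of geographic abbreviations that commonly appear in addresses.
GEO_ABBREVIATIONS = {"USA", "US", "UK", "CA", "MA", "AL"}


def acronym_hits(affiliation):
    """Split acronym matches into likely organizations and likely address parts."""
    tokens = re.findall(r"[A-Za-z]+", affiliation)
    likely_org, likely_address = [], []
    for token in tokens:
        if token in ACRONYMS:
            # Acronyms that double as place abbreviations are set aside for review.
            target = likely_address if token in GEO_ABBREVIATIONS else likely_org
            target.append(token)
    return likely_org, likely_address


print(acronym_hits("Dept. of Physics, MSU, East Lansing, MI 48824, USA"))
# prints (['MSU'], ['USA'])
```

The trade-off is that an affiliation that really means the University of South Alabama by writing “USA” would also be set aside, which is one more reason to spell organization names out.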

Ambiguity goes beyond acronyms to organization names when organizations from all over the world are considered. The current version of the ROR data includes 741 names that are shared by more than one organization and appear almost 2,000 times in total! Several of these ambiguous names are mentioned in the ROR documentation along with strategies for mitigating their impact on searches, and a file that lists all of them is available in the ROR2DB repository. If you use these names in your affiliations or searches, provide other metadata, such as the city or country, along with the organization name so that the correct ROR can be selected.
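Shared names can be found with the same group-by pattern used for acronyms, and extra metadata can then narrow a lookup. The sketch below assumes a ror table with a flattened country column (in the ROR JSON the country is a nested object, so this column is an assumption) and uses made-up organizations with placeholder IDs. It is not necessarily how the ambiguous-names file in the ROR2DB repository was produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema: one row per organization with a country column.
CREATE TABLE ror (id TEXT PRIMARY KEY, name TEXT, country TEXT);
-- Placeholder IDs and made-up organizations.
INSERT INTO ror VALUES
  ('id1', 'Example University', 'United States'),
  ('id2', 'Example University', 'Germany'),
  ('id3', 'Example University', 'Japan'),
  ('id4', 'Unique Institute', 'Canada');
""")

# Names shared by more than one organization, with how often each occurs.
ambiguous = conn.execute(
    "SELECT name, COUNT(*) AS cnt FROM ror GROUP BY name HAVING COUNT(*) > 1"
).fetchall()

# Number of ambiguous names and the total number of records that carry them.
totals = conn.execute(
    "SELECT COUNT(*), SUM(cnt) FROM "
    "(SELECT COUNT(*) AS cnt FROM ror GROUP BY name HAVING COUNT(*) > 1) AS dup"
).fetchone()


def lookup(conn, name, country=None):
    """Return ROR ids for a name, narrowed by country when one is given."""
    sql, params = "SELECT id FROM ror WHERE name = ?", [name]
    if country is not None:
        sql += " AND country = ?"
        params.append(country)
    return [row[0] for row in conn.execute(sql + " ORDER BY id", params)]


print(ambiguous)                                      # prints [('Example University', 3)]
print(totals)                                         # prints (1, 3)
print(lookup(conn, "Example University", "Germany"))  # prints ['id2']
```

On the real data, the totals query plays the role of the "741 names, almost 2,000 times" figures, and the country filter shows why extra metadata in an affiliation string helps select the correct ROR.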

Fortunately, Metadata Game Changers is still the only organization in ROR with ‘Metadata’ in its name!

Conclusion

The ROR data are critical for unambiguously identifying research organizations and for connecting them across the global research infrastructure. I hope this database representation of the data is useful for you. Please let me know how you use it or if you have suggestions for improvement.