How Many RORs Do We Need?
Cite this blog as Habermann, T. (2019). How Many RORs Do We Need? Front Matter. https://doi.org/10.59350/4tbaw-m9382
The scholarly communications and open science communities are getting excited about the benefits of metadata that includes persistent identifiers for organizations. Now that an open registry of organizational identifiers exists, the community is facing two adoption challenges: 1) evolving metadata dialects to include identifiers and 2) implementing them in existing and future metadata. Several dialects already include this capability and others are implementing it and building these identifiers into searches. In other words, great progress on Step 1. How can we move forward on Step 2?
I took a look at a sample of DataCite metadata and discovered that many DataCite repositories manage metadata from a small number of organizations and, therefore, only need to know a small number of organizational identifiers (RORs are the focus here) to populate their entire collections. These repositories are supported by DataCite members. An obvious next question is: Are there DataCite members that only need to know a small number of RORs?
My sample of over 95,000 DataCite records shows that twenty-three DataCite members have five or fewer affiliations in their metadata included in the sample. The answer appears to be: yes, some DataCite members only need to know a small number of RORs. More good news.
The Affiliations
The DataCite members with five or fewer affiliations are shown in Table 1. Combined, they have almost fifty unique affiliation strings in their existing metadata. Even this small sample demonstrates the kinds of challenges associated with free-text affiliation strings. There are small punctuation differences (“American University, Center for Social Media” vs. “American University Center for Social Media”), acronyms (BGI) and abbreviations (Penn vs. Pennsylvania, Caltech vs. California Institute of Technology), content left over from templates (Mysteeriorganisaatio), and multiple names for the same organization (“Technological University Dublin” vs. “Technological University for Dublin”). These and other similar challenges will keep us busy until persistent identifiers for organizations are universal!
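The punctuation differences described above can be collapsed before any matching is attempted. The following is a minimal Python sketch of that idea, not the procedure used for this analysis; the function names `normalize_affiliation` and `group_variants` are hypothetical, and the example strings are taken from Table 1.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_affiliation(text: str) -> str:
    """Return a comparison key that ignores case, punctuation, and extra spaces."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Replace every character that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(text.split())


def group_variants(affiliations):
    """Group affiliation strings that share the same normalized key."""
    groups = defaultdict(list)
    for affiliation in affiliations:
        groups[normalize_affiliation(affiliation)].append(affiliation)
    return dict(groups)


# Affiliation strings copied from Table 1.
strings = [
    "American University, Center for Social Media",
    "American University Center for Social Media",
    "University of Colorado Boulder",
    "University of Colorado - Boulder",
    "Penn State University",
    "Pennsylvania State University",
]

for key, variants in group_variants(strings).items():
    print(key, "<-", variants)
```

Normalization of this kind merges the comma and hyphen variants, but it leaves abbreviations (Penn vs. Pennsylvania), acronyms, and alternate names as separate groups. Those still need a lookup table, a search service, or a person.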
Table 1. DataCite members with five or fewer affiliations in the sample metadata and those affiliations.
Member | Affiliation |
Aridhia Informatics Ltd. (id = aridhia) | EPAD |
| Aridhia Informatics |
American University Library (id = au) | Center for Media and Social Impact |
| American University, Center for Social Media |
| American University Center for Social Media |
| Center for Social Media |
| American University |
The Commonwealth Fund (id = cmwf) | Dartmouth College |
| University of Oxford |
| Commonwealth Fund |
China National GeneBank (id = cngb) | BGI |
Network for Computational Modeling in the Social and Ecological Sciences (CoMSES Net) (id = comses) | Arizona State University |
| National Research Institute of Science and Technology for Environment and Agriculture |
| Unité Mixte Internationnale de Modélisation Mathématique et Informatiques des Systèmes Complèxes |
CSC IT Center for Science (id = csc) | Aalto University |
| Mysteeriorganisaatio |
University of Colorado Boulder Libraries (id = cub) | University of Colorado Boulder |
CyArk (id = cyark) | CyArk |
Cyberleninka (id = cyberl) | International Society for the Comparative Study of Civilizations (ISCSC), Board of Directors Member; Pro-Rector, International University for the Societal Development; Washington, the USA |
Publications Office of the European Union (id = europ) | mEDRA |
Global Biodiversity Information Facility (id = gbif) | The University Museum, The University of Tokyo |
Globus (id = globus) | University of Illinois at Urbana Champaign |
| Lawrence Berkeley National Laboratory |
University of Illinois Urbana-Champaign Library (id = illinois) | University of Illinois at Urbana-Champaign |
Indiana University (id = iu) | Indiana University School of Medicine |
| Indiana University Bloomington |
National Institute of Standards and Technology (id = nist) | National Institute of Standards and Technology |
OCLC, Inc. (id = oclc) | OCLC Research |
Pennsylvania State University (id = psu) | Penn State University |
| Pennsylvania State University |
| University of Illinois at Urbana Champaign |
| University of Adelaide |
| Smithsonian Institution |
Research Workspace (id = rewo) | Audubon Alaska |
TIND (id = tind) | California Institute of Technology |
| Caltech |
| Caltech Library |
| The Aerospace Corporation |
University College Dublin (id = ucd) | Technological University Dublin |
| Technological University for Dublin |
| University of Gothenburg |
| National University of Ireland, Galway |
| United Language Group (United States) |
University of Miami Libraries (id = uml) | University of Miami |
UNAVCO (id = unavco) | UNAVCO, Inc. |
| Pennsylvania State University |
| University of Colorado - Boulder |
| United States Geological Survey |
| University of Washington - Seattle |
University of Utah (id = uutah) | Journal of Early Hearing Detection and Intervention |
| Texas A&M University |
Finding the RORs
This small sample presents a microcosm of the work of finding identifiers for organizations in order to augment existing metadata. This challenge has been addressed in many contexts, and experiences from the American Physical Society, NASA, Dryad, and many others all indicate that it is a tricky business. The challenges identified in this small sample become more difficult at scales where automation is required.
I have been experimenting with two strategies I developed with Dryad and the affiliation search recently introduced by ROR. These are referred to as Dryad, String, and Affiliation strategies. Results of searching for RORs with all three strategies are discussed below.
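For readers who want to try the Affiliation strategy themselves, here is a rough Python sketch of a query against ROR's affiliation search. It rests on assumptions that should be checked against the current ROR API documentation: that the endpoint is `https://api.ror.org/organizations` with an `affiliation` query parameter, and that the JSON response holds an `items` list whose entries carry a `chosen` flag and an `organization` object with an `id`. The function names are hypothetical, and this is not the code used to produce the results below.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint; confirm the path and version in the ROR API documentation.
ROR_API = "https://api.ror.org/organizations"


def affiliation_url(affiliation: str) -> str:
    """Build the query URL for one free-text affiliation string."""
    return ROR_API + "?" + urllib.parse.urlencode({"affiliation": affiliation})


def chosen_ror(response: dict) -> Optional[str]:
    """Return the ROR of the item flagged as chosen, or None if nothing is flagged.

    Assumes each item looks like {"chosen": bool, "organization": {"id": ...}}.
    """
    for item in response.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


def search_affiliation(affiliation: str) -> Optional[str]:
    """Query the service over the network and return the chosen ROR, if any."""
    with urllib.request.urlopen(affiliation_url(affiliation), timeout=30) as reply:
        return chosen_ror(json.load(reply))
```

Under these assumptions, a call such as `search_affiliation("Caltech")` returns either one ROR or `None`. That makes the result easy to compare with the output of the other two strategies.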
Affiliations with One ROR
The most common result, twenty-three of forty-nine cases, was for the same ROR to be identified by all three strategies. These are indicated with DSA in the Strat column of Table 2 and are mostly affiliation strings that 1) match the ROR name for the organization, 2) identify only one organization, and 3) do not include much extraneous information. All are good practices for writing affiliation names.
The next most common result is one ROR identified by the String and Affiliation strategies (SA in Table 2). These are also clean affiliation strings with names that match ROR names or aliases (Caltech). The Dryad strategy did not identify these affiliations because it uses an affiliation lookup table built from past experience. When that table is updated with the SA affiliations and RORs, these will become DSA matches.
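The lookup-table idea behind the Dryad strategy, and the effect of updating the table, can be illustrated with a few lines of Python. This is a sketch only: the structure of the real Dryad table is not described here, so an exact-match dictionary keyed on the case-folded affiliation string is assumed, and the names `lookup_ror` and `add_matches` are hypothetical. The strings and RORs come from Table 2.

```python
from typing import Dict, Optional


def lookup_ror(affiliation: str, table: Dict[str, str]) -> Optional[str]:
    """Exact-match lookup on the trimmed, case-folded affiliation string."""
    return table.get(affiliation.strip().casefold())


def add_matches(table: Dict[str, str], matches: Dict[str, str]) -> None:
    """Add affiliation-to-ROR pairs confirmed by other strategies to the table."""
    for affiliation, ror in matches.items():
        table[affiliation.strip().casefold()] = ror


# A table that already knows the full name (a DSA row in Table 2).
table: Dict[str, str] = {}
add_matches(table, {"California Institute of Technology": "https://ror.org/05dxps055"})

print(lookup_ror("Caltech", table))  # not in the table yet

# After the SA match for "Caltech" is added, the lookup succeeds.
add_matches(table, {"Caltech": "https://ror.org/05dxps055"})
print(lookup_ror("Caltech", table))
```

A table like this only ever finds strings it has seen before, which is why it benefits from matches confirmed by the other strategies.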
The final group consists of RORs identified by just one strategy (D or A). These reflect differences in the details of the strategies that inevitably show up when searching with multiple strategies. The cause could be the curation involved in the Dryad lookup table, or details of the matching and ranking algorithms in the ROR affiliation search. In any case, these provide interesting learning opportunities. The only suspicious pick is for the affiliation “Journal of Early Hearing Detection and Intervention”, which is the name of a journal rather than an organization.
Table 2. Affiliations with one ROR. *Strat lists the strategies that identified the ROR: D = Dryad, S = String, A = Affiliation.
Affiliation | Strat* | ROR | Organization Name | Country |
Aalto University | DSA | https://ror.org/020hwjq30 | Aalto University | Finland |
American University | DSA | https://ror.org/052w4zt36 | American University | United States |
American University, Center for Social Media | DSA | https://ror.org/052w4zt36 | American University | United States |
Arizona State University | DSA | https://ror.org/03efmqc40 | Arizona State University | United States |
California Institute of Technology | DSA | https://ror.org/05dxps055 | California Institute of Technology | United States |
Dartmouth College | DSA | https://ror.org/049s0rh22 | Dartmouth College | United States |
Indiana University Bloomington | DSA | https://ror.org/02k40bc56 | Indiana University Bloomington | United States |
Lawrence Berkeley National Laboratory | DSA | https://ror.org/02jbv0t02 | Lawrence Berkeley National Laboratory | United States |
National Institute of Standards and Technology | DSA | https://ror.org/05xpvk416 | National Institute of Standards and Technology | United States |
National Research Institute of Science and Technology for Environment and Agriculture | DSA | https://ror.org/01wep6g48 | National Research Institute of Science and Technology for Environment and Agriculture | France |
Pennsylvania State University | DSA | https://ror.org/04p491231 | Pennsylvania State University | United States |
Smithsonian Institution | DSA | https://ror.org/01pp8nd67 | Smithsonian Institution | United States |
Texas A&M University | DSA | https://ror.org/01f5ytq51 | Texas A&M University | United States |
The University Museum, The University of Tokyo | DSA | https://ror.org/057zh3y96 | University of Tokyo | Japan |
UNAVCO, Inc. | DSA | https://ror.org/02n9tn974 | UNAVCO | United States |
United States Geological Survey | DSA | https://ror.org/035a68863 | United States Geological Survey | United States |
University of Adelaide | DSA | https://ror.org/00892tw58 | University of Adelaide | Australia |
University of Colorado Boulder | DSA | https://ror.org/02ttsq026 | University of Colorado Boulder | United States |
University of Gothenburg | DSA | https://ror.org/01tm6cn81 | University of Gothenburg | Sweden |
University of Illinois at Urbana Champaign | DSA | https://ror.org/047426m28 | University of Illinois at Urbana Champaign | United States |
University of Miami | DSA | https://ror.org/02dgjyy92 | University of Miami | United States |
University of Oxford | DSA | https://ror.org/052gg0110 | University of Oxford | United Kingdom |
University of Washington - Seattle | DSA | https://ror.org/00cvxb145 | University of Washington | United States |
University of Illinois at Urbana-Champaign | DA | https://ror.org/047426m28 | University of Illinois at Urbana Champaign | United States |
Caltech | SA | https://ror.org/05dxps055 | California Institute of Technology | United States |
Commonwealth Fund | SA | https://ror.org/049kzbj92 | Commonwealth Fund | United States |
Technological University Dublin | SA | https://ror.org/04t0qbt32 | Technological University Dublin | Ireland |
The Aerospace Corporation | SA | https://ror.org/01ar9e455 | The Aerospace Corporation | United States |
United Language Group (United States) | SA | https://ror.org/02zf60h18 | United Language Group (United States) | United States |
Unité Mixte Internationnale de Modélisation Mathématique et Informatiques des Systèmes Complèxes | SA | https://ror.org/053kxkj53 | Unité Mixte Internationnale de Modélisation Mathématique et Informatiques des Systèmes Complèxes | France |
American University Center for Social Media | D | https://ror.org/052w4zt36 | American University | United States |
Journal of Early Hearing Detection and Intervention | A | https://ror.org/057xcq908 | Inerventions (Sweden) | Sweden |
OCLC Research | A | https://ror.org/02nv42w72 | Online Computer Library Center | United States |
Technological University for Dublin | A | https://ror.org/04t0qbt32 | Technological University Dublin | Ireland |
University of Colorado - Boulder | A | https://ror.org/02ttsq026 | University of Colorado Boulder | United States |
Affiliations with Two RORs
Two RORs were found for three affiliations that are listed in Table 3. Cases like these typically need to be solved with additional information from the complete metadata, the metadata creators, or other sources. The affiliations here illustrate some common factors that can cause uncertainty: acronyms, organizations within organizations, or multiple campuses.
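The strategy codes used in the tables, and the flagging of affiliations that need a closer look, can be derived mechanically once each strategy has returned its answer. The Python sketch below assumes each strategy returns at most one ROR (or nothing) per affiliation string; the function names `combine` and `needs_review` are hypothetical. The example values are taken from Tables 2 and 3.

```python
from collections import defaultdict
from typing import Dict, Optional


def combine(results: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each ROR to the letters of the strategies (D, S, A) that found it."""
    by_ror = defaultdict(str)
    for letter in "DSA":  # fixed order, so codes read DSA, SA, DA, ...
        ror = results.get(letter)
        if ror:
            by_ror[ror] += letter
    return dict(by_ror)


def needs_review(results: Dict[str, Optional[str]]) -> bool:
    """True when the strategies disagree, i.e. more than one distinct ROR was found."""
    return len(combine(results)) > 1


# "Caltech" in Table 2: String and Affiliation agree, Dryad finds nothing.
caltech = {"D": None, "S": "https://ror.org/05dxps055", "A": "https://ror.org/05dxps055"}

# "National University of Ireland, Galway" in Table 3: two different RORs.
galway = {"D": "https://ror.org/00shsf120", "S": "https://ror.org/03bea9k73", "A": "https://ror.org/00shsf120"}

print(combine(caltech), needs_review(caltech))
print(combine(galway), needs_review(galway))
```

Disagreement between strategies is only a signal. Deciding which ROR is right still takes the additional information described above.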
Table 3. Affiliations with two RORs.
Affiliation | Strat* | ROR | Organization Name | Country |
BGI | S | https://ror.org/045pn2j94 | Beijing Genomics Institute | China |
| A | https://ror.org/02act3e13 | BGI (United States) | United States |
Indiana University School of Medicine | D | https://ror.org/02k40bc56 | Indiana University Bloomington | United States |
| A | https://ror.org/05ht4p406 | Indiana University School of Medicine - Lafayette | United States |
National University of Ireland, Galway | DA | https://ror.org/00shsf120 | National University of Ireland | Ireland |
| S | https://ror.org/03bea9k73 | National University of Ireland, Galway | Ireland |
Affiliations with No RORs
Finally, eleven affiliations, shown in Table 4, could not be resolved to RORs. Some of these are organizations that do not yet have RORs, and some are affiliation strings that cannot currently be resolved without context or human intervention.
An interesting example of how context and interaction with metadata creators can help is provided by two affiliations in Table 4: “Center for Media and Social Impact” and “Center for Social Media”. Table 1 shows that these are two of the five affiliation strings associated with the American University Library. A Google search for “American University Center for Social Media” points to the Center for Media & Social Impact at American University, so it appears that all five of the American University Library affiliations shown in Table 1 are actually the same organization, but none of them uses the current name from the organization website. The correct ROR in this case (American University - https://ror.org/052w4zt36) was identified for three of the five affiliations (Table 2).
Table 4. Affiliations with no RORs.
Organizations with No RORs | |
Aridhia Informatics | EPAD |
Audubon Alaska | International Society for the Comparative Study of Civilizations (ISCSC), Board of Directors Member; Pro-Rector, International University for the Societal Development; Washington, the USA |
Caltech Library | mEDRA |
Center for Media and Social Impact | Mysteeriorganisaatio |
Center for Social Media | Penn State University |
CyArk | |
Conclusions
The initial question posed here was: are there DataCite members that only need to know a few RORs to populate their entire DataCite metadata collection with organizational identifiers? I looked at twenty-three DataCite members with five or fewer affiliations in their sample metadata. Thirteen of the twenty-three (57%) were a complete success: RORs were identified for all of the affiliation strings in the sample metadata. Another five were above 50% and, finally, five had no RORs identified.
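The per-member bookkeeping behind these numbers is simple enough to sketch in Python. The example below covers only three members from Table 1, with `resolved` listing those of their affiliation strings that appear in Table 2. The function names and the "complete / partial / none" labels are hypothetical, not the exact categories used in the analysis.

```python
def coverage(affiliations, resolved):
    """Fraction of a member's affiliation strings for which a ROR was found."""
    found = sum(1 for affiliation in affiliations if affiliation in resolved)
    return found / len(affiliations)


def classify(fraction):
    """Label a member by how much of its metadata could be given RORs."""
    if fraction == 1.0:
        return "complete"
    if fraction == 0.0:
        return "none"
    return "partial"


# Three members and their affiliation strings, copied from Table 1.
members = {
    "cmwf": ["Dartmouth College", "University of Oxford", "Commonwealth Fund"],
    "tind": ["California Institute of Technology", "Caltech", "Caltech Library", "The Aerospace Corporation"],
    "cyark": ["CyArk"],
}

# Affiliation strings from those members that have a ROR in Table 2.
resolved = {
    "Dartmouth College",
    "University of Oxford",
    "Commonwealth Fund",
    "California Institute of Technology",
    "Caltech",
    "The Aerospace Corporation",
}

for member, affiliations in members.items():
    fraction = coverage(affiliations, resolved)
    print(f"{member}: {fraction:.0%} ({classify(fraction)})")

# Share of members with every affiliation resolved: thirteen of twenty-three.
print(f"{13 / 23:.0%}")
```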
Even this small sample of affiliations (49) demonstrated some common challenges along the path to ROR adoption. First, we found a case where a single organization was represented five different ways in one metadata collection. This problem is pervasive in systems built on free-text entry of any information into any data collection. It is one of the central problems that the introduction of identifiers of any kind is trying to address.
Second, we found that affiliations written as unaccompanied acronyms are difficult to resolve unambiguously. It is well known that acronyms need to be spelled out in scientific papers, and the same is true when entering affiliations into metadata. Remember that you are writing the affiliation so that the world knows where you work: avoid acronyms, colloquialisms, and jargon!
Finally, several strategies being used to find identifiers from affiliation strings agreed in most of the simple cases. At the same time, for more complex cases, each strategy does well in some and poorly in others. There are always cases where humans, internet searches, and interactions with metadata creators and managers will be needed to get the right answer.