ROR Identifiers That Have Disappeared

Cite this blog as Habermann, T. (2022). ROR Identifiers That Have Disappeared. Front Matter. https://doi.org/10.59350/7gb9z-x1767

Introduction

One of the key characteristics of Persistent Identifiers (PIDs) is permanence, the ability to persist through time. Technology evolution is an ever-present challenge to permanence, with rapid change at essentially every level: back-end storage, data and API representations, indexing and searching, web technologies, and user expectations. In addition, the organizations that create and manage identifiers evolve, as do their priorities and resources, creating more potential for challenges. PID infrastructure providers must navigate these changing landscapes and guarantee persistence of the PIDs they create and manage.

The history of ROR is not independent of these pressures, particularly because, in its early years, ROR identifiers were created, curated, and controlled as part of the Global Research Identifier Database (GRID) by Digital Science, which supported the ROR concept but was driven by its own set of needs for organizational identifiers.

The GRID database was created in 2015 and started including RORs in October 2019. That database formed the basis for RORs over nine releases through September 2021. Table 1 and Figure 1 show the number of RORs included in each GRID release during this time.

An interesting and unexpected aspect of these data is the decrease in the number of RORs in three of the releases (December 2019, March 2020, and October 2020). Of course, if identifiers are permanent, the number of identifiers cannot decrease over time, so this observation needs to be understood and considered as RORs are utilized and as the ROR organization takes over curation of these identifiers.

Date       Number of RORs    Change
10/2019            96,590         0
12/2019            96,385      -205
3/2020             96,225      -160
6/2020             97,431     1,206
10/2020            97,316      -115
12/2020            98,575     1,259
3/2021             99,509       934
6/2021            100,439       930
9/2021            102,392     1,953

Table 1. Number of RORs / Release
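
The Change column can be recomputed from the release totals. The following is a minimal sketch in Python, with the counts transcribed by hand from Table 1; the names RELEASE_COUNTS and release_changes are illustrative and are not part of the original analysis.

```python
# Release counts transcribed from Table 1: (release date, number of RORs).
RELEASE_COUNTS = [
    ("10/2019", 96590),
    ("12/2019", 96385),
    ("3/2020", 96225),
    ("6/2020", 97431),
    ("10/2020", 97316),
    ("12/2020", 98575),
    ("3/2021", 99509),
    ("6/2021", 100439),
    ("9/2021", 102392),
]


def release_changes(counts):
    """Return (date, change) pairs, each change relative to the previous release."""
    changes = []
    previous = None
    for date, count in counts:
        # The first release has nothing to compare against, so its change is 0.
        changes.append((date, 0 if previous is None else count - previous))
        previous = count
    return changes


changes = release_changes(RELEASE_COUNTS)
# Releases whose total is lower than that of the release before them.
decreases = [(date, delta) for date, delta in changes if delta < 0]
print(decreases)
```

Three releases show a net decrease, 480 RORs in total. A net change is only a lower bound on the number of identifiers that disappeared, because RORs added in the same release offset the ones that were removed.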

Figure 1. Number of GRID RORs vs. Time

Finding Missing RORs

Fortunately, the GRID data for each release are available, so they can be examined to identify the RORs that went missing in each release. These data were downloaded, and RORs that existed in one release but not in the next were identified using a query like:

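-- GRIDVersions[first] and GRIDVersions[second] stand for two consecutive releases.
-- Return RORs present in the first release that have no match in the second.
SELECT g.name, first.external_id
FROM GRIDVersions[first]['dbase'].external_ids first
JOIN GRIDVersions[first]['dbase'].grid g ON g.ID = first.grid_id
LEFT OUTER JOIN GRIDVersions[second]['dbase'].external_ids second
  ON first.external_id = second.external_id
WHERE second.external_id IS NULL AND first.external_id_type = 'ROR'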

In total, 896 RORs that existed in one release and not the next were identified. Most of these disappeared just once, i.e., they were present in some number of releases and then disappeared, but some had more complex histories, appearing and disappearing several times. Thirteen of the RORs have reappeared and are in the current database.
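
The release-to-release comparison and the resulting histories can also be sketched with set differences. This is an illustration in Python rather than the query actually used; the function name, the release labels, and the identifiers in the toy data are all made up for the example.

```python
def find_disappearances(releases):
    """Map each identifier to the releases in which it went missing.

    `releases` is a chronologically ordered list of (label, set_of_ids).
    An identifier goes missing in a release when it was present in the
    immediately preceding release but is absent from this one.
    """
    missing = {}
    for (_, earlier_ids), (label, later_ids) in zip(releases, releases[1:]):
        for ror_id in sorted(earlier_ids - later_ids):
            missing.setdefault(ror_id, []).append(label)
    return missing


# Toy data: hypothetical identifiers and labels, not real GRID content.
toy_releases = [
    ("release-1", {"id-a", "id-b", "id-c", "id-d"}),
    ("release-2", {"id-a", "id-c", "id-d"}),  # id-b disappears
    ("release-3", {"id-a", "id-b", "id-c"}),  # id-b returns, id-d disappears
    ("release-4", {"id-a", "id-c", "id-d"}),  # id-b disappears again, id-d returns
]

missing = find_disappearances(toy_releases)
latest_ids = toy_releases[-1][1]

# Identifiers that disappeared more than once.
repeated = sorted(i for i, labels in missing.items() if len(labels) > 1)
# Identifiers that disappeared at some point but are in the latest release.
reappeared = sorted(i for i in missing if i in latest_ids)
print(missing, repeated, reappeared)
```

In the toy data, id-b has a complex history (it disappears twice) and id-d is the analogue of the thirteen RORs that came back.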

A list of the missing RORs along with the release during which they disappeared is available. These RORs are currently not resolvable, i.e., their URLs return a blank page or an error message like {"errors":["ROR ID 'https://ror.org/02yqjxa29' does not exist"]}.
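
A program that encounters such a response might recognize it roughly as follows. This Python sketch assumes only the body shape shown in the example above (a JSON object with an "errors" list of strings); the function names are hypothetical, the "does not exist" wording may change, and no request is made to any ROR service here.

```python
import json


def ror_errors(body):
    """Return the error strings found in a response body.

    A blank page, or a body that is not a JSON object with an
    "errors" list, yields an empty list.
    """
    if not body or not body.strip():
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    errors = payload.get("errors", [])
    if not isinstance(errors, list):
        return []
    return [e for e in errors if isinstance(e, str)]


def is_reported_missing(body):
    """True if the body carries a 'does not exist' error (assumed wording)."""
    return any("does not exist" in e for e in ror_errors(body))


# The error body quoted in the text above.
sample = '{"errors":["ROR ID \'https://ror.org/02yqjxa29\' does not exist"]}'
print(is_reported_missing(sample))
```

A blank page carries no error text at all, so a body-only check like this cannot tell a missing ROR apart from other failures in that case.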

Real-World Impact

The ROR dataset now includes over 102,000 organizations (see Table 1), so what is the impact of a few missing RORs?

Dryad was identifying RORs and adding them to metadata during the period of the version changes shown in Table 1. These missing RORs remain in those metadata records even though they have since disappeared from the ROR database and no longer resolve. For example, 72 Dryad metadata records include the ROR https://ror.org/03wevmz92 for Harvard Medical School, which was recognized and added to the metadata before March 2021. Although this ROR was discontinued by GRID in the 2021-03-25 release, it continues to exist in the metadata even though it can no longer be resolved. Of course, it is unknown how many other unresolvable RORs exist in metadata from Dryad or other repositories.

A Way Forward?

As the ROR community moves forward providing governance and curation of the ROR data, we need to come to consensus on a way forward for these missing RORs. It is important to keep in mind that this is really a broken-link problem and to avoid conflating it with other, more complex, problems. Once the known information for these RORs is added back into the database and they resolve, then they can be included in broader on-going work like providing status information for organizations with RORs.

Note

During the ROR community meeting on Aug. 16, 2022, the ROR Registry announced the addition of a specific error message for these missing RORs to the user interface and the API response. This is a helpful step forward, as people and machines that try to resolve these RORs now get a meaningful response.

Error message from ror.org for missing RORs