Power of PIDS: Measuring Connectivity

Cite this blog as Habermann, T. (2021). Power of PIDS: Measuring Connectivity. Front Matter. https://doi.org/10.59350/59qrg-1ps18

In honor of PIDapalooza 2021 and 25 hours of Persistent Identifier discussion happening today, we thought a post celebrating the power of PIDs for increasing connectivity and how to measure that connectivity is in order. Let’s party on 🥳

Discovery

Before Connectivity (BC): For many years we have thought about discovery in terms of web portals with text search boxes, maps, timelines, facets, or some other kind of interface.

After Connectivity (AC): Many research objects now have identifiers of various kinds and these identifiers facilitate an entirely new discovery paradigm: finding interesting things because they are connected to things you are interested in.

Identifiers are the lifeblood of these connections.

Mandatory Fields

Before Connectivity (BC): Mandatory metadata fields support the well-worn discovery and indexing paths: titles, author names, publishers, publication dates, spatial and temporal extents… Identifiers are generally optional.

After Connectivity (AC): Identifiers for resources being described and all other items that can have identifiers (related resources, people, organizations, journals, funders, instruments, …) are Mandatory.

Measuring Connectivity

Whether research objects get discovered depends on their connectivity, i.e., the state or extent of being connected or interconnected.

All research objects (and other things in the research world) have identifiers and connections.

Items with high connectivity get discovered.

How can connectivity be measured?

Connectivity can be measured for any item that can have identifiers. In every case it is the number of identifiers present divided by the number of identifiers possible. If no identifiers are present, connectivity = 0. If all possible identifiers are present, connectivity = 1.

The first example is for a resource that has two authors. In this case the identifiers are ORCIDs and connectivity can be 0, 0.5, or 1, depending on whether neither, one, or both authors have an ORCID.

Of course the same simple calculation can be applied to the organizations the authors are affiliated with. In this case the identifiers could be GRIDs, RORs, ISNIs, or others and, again, connectivity can be 0, 0.5, or 1.
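
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this post: the function name `connectivity`, the use of `None` for a missing identifier, and the placeholder identifier strings are all assumptions. How to treat an item with no possible identifiers is not covered in the text, so the sketch simply raises an error in that case.

```python
from typing import Optional, Sequence


def connectivity(identifiers: Sequence[Optional[str]]) -> float:
    """Return identifiers present divided by identifiers possible.

    Each element is one "slot" (one author, one organization, ...).
    A slot holds an identifier string, or None/"" when it is missing.
    """
    if not identifiers:
        # Assumption: connectivity is undefined when nothing could be identified.
        raise ValueError("no identifier slots, connectivity is undefined")
    present = sum(1 for value in identifiers if value)
    return present / len(identifiers)


# Two authors; the strings are placeholders, not real ORCIDs.
print(connectivity([None, None]))              # 0.0
print(connectivity(["orcid-a", None]))         # 0.5
print(connectivity(["orcid-a", "orcid-b"]))    # 1.0
```

The same function applies unchanged to organization identifiers: pass one slot per affiliation instead of one per author.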

Collections of research objects (journals, repositories) have many people and organizations (and other things). In these cases, connectivity is a distribution of numbers between 0 and 1, or connectivity can be aggregated over the entire collection and be a single number.

So, connectivity can be measured for journals or repositories and tracked over time.
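
The two views of a collection, a distribution and a single aggregated number, might look like this in Python. This is a sketch under stated assumptions: the record layout (a list of author ORCID slots per record), the function names, and the toy data are hypothetical, and pooling all slots is only one reasonable way to aggregate.

```python
# Hypothetical collection: one list of author ORCID slots per record,
# with None where an ORCID is missing. Strings are placeholders.
records = [
    ["orcid-a", "orcid-b"],      # both authors identified
    ["orcid-c", None],           # one of two identified
    [None, None, None],          # none of three identified
]


def record_connectivity(slots):
    """Identifiers present divided by identifiers possible for one record."""
    return sum(1 for s in slots if s) / len(slots)


def distribution(collection):
    """One connectivity value per record, each between 0 and 1."""
    return [record_connectivity(slots) for slots in collection]


def aggregate(collection):
    """A single number: all identifiers present over all identifiers possible."""
    present = sum(1 for slots in collection for s in slots if s)
    possible = sum(len(slots) for slots in collection)
    return present / possible


print(distribution(records))  # [1.0, 0.5, 0.0]
print(aggregate(records))     # 3 of 7 slots filled, about 0.43
```

Note that the pooled aggregate (3/7) is not the same as the mean of the per-record values (0.5), because records with more authors carry more weight. Whichever definition is chosen should be kept fixed when tracking a collection over time.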

Before Connectivity (BC): Connectivity for most research objects, journals, and repositories is close to zero.

After Connectivity (AC): As identifiers are defined and populated in metadata, connectivity increases. Connectivity for most research objects and collections approaches one.

Real-World

Connectivity was calculated for ORCIDs and affiliations (a weak proxy for organization identifiers) in ~1500 journals in a sample of ~5000 metadata records from a large repository.

Most of these journals (~1350) have connectivity = 0, indicating no identifiers or affiliations in the metadata for any articles from those journals.

125 journals have affiliation connectivity = 1, indicating affiliations exist for all authors. If identifiers can be found for these affiliations, there is potential for organization connectivity = 1.
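
A per-journal tally like the one described above could be produced along these lines. This is a minimal sketch, not the code used for the analysis: the record layout (journal name plus one affiliation slot per author), the function names `journal_connectivity` and `summarize`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def journal_connectivity(records):
    """Map each journal to its aggregated connectivity over all author slots."""
    present = defaultdict(int)
    possible = defaultdict(int)
    for journal, slots in records:
        present[journal] += sum(1 for s in slots if s)
        possible[journal] += len(slots)
    # Journals with no slots at all are skipped (connectivity undefined).
    return {j: present[j] / possible[j] for j in possible if possible[j]}


def summarize(scores):
    """Count journals at connectivity 0, strictly between 0 and 1, and at 1."""
    values = list(scores.values())
    return {
        "zero": sum(1 for v in values if v == 0),
        "partial": sum(1 for v in values if 0 < v < 1),
        "one": sum(1 for v in values if v == 1),
    }


# Toy data only: (journal, affiliation slot per author), None where missing.
toy = [
    ("Journal A", ["Univ X", "Univ Y"]),
    ("Journal A", ["Univ Z"]),
    ("Journal B", [None, None]),
    ("Journal C", ["Univ X", None]),
]

scores = journal_connectivity(toy)
print(summarize(scores))  # {'zero': 1, 'partial': 1, 'one': 1}
```

Running the same tally on successive snapshots of a repository's metadata gives counts that can be compared over time.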

Making Improvement Visible

As the number of identifiers in metadata increases through time, the pillars at zero on the left side of the plot will migrate to the right, eventually becoming pillars on the right side of the plot. Measuring connectivity through this process is a great metric for visualizing this evolution and making improvement visible. Measuring metadata is something we do at Metadata Game Changers. If you have metadata and can benefit from measurements, please contact us.