Can Communities Improve Metadata?

Cite this blog as Habermann, T. (2021). Can Communities Improve Metadata? Front Matter. https://doi.org/10.59350/vtt2s-jss23

With Chris Beltz, S. Jeanette Clark, Peter Slaughter and Matt Jones from the Arctic Data Center (https://ror.org/055hrh286) and NCEAS (https://ror.org/0146z4r19).

I have written several blog posts about measuring metadata, with the goals of identifying good examples and quantifying improvements in metadata over time. Most of these have focused on repository collections and the role repositories play in improving metadata. Of course, improving metadata is really a partnership among the repositories, data providers, and journals that create metadata, and all of these players can make significant contributions. Can those contributions be measured?

Many of the metadata measurement tools and approaches I have been using were developed during the NSF-funded MetaDIG Project, and the NSF Arctic Data Center has been using these techniques since 2016. Results of this ongoing work were presented recently in a session titled Progress Towards Realizing Data Sharing for the Arctic Region and Beyond at Arctic Science Summit Week, and they shed some light on community metadata improvements.

Figure 1 shows how metadata metrics related to the FAIR use cases have changed since the Arctic Data Center began applying the metrics in its data ingest process. Metric values are shown at initial submission (dots) and at final publication (solid lines).

Figure 1. Metadata Metrics for the FAIR Principles in Arctic Data Center metadata on submission (dots) and publication (solid) as a function of time. The brackets show improvements made by the community (red) and the repository (black).

Metrics for all of the use cases show improvements over time, which is great news, but the metrics for interoperability and reusability are particularly interesting. The brackets on the right side of the frames show the amount of metadata improvement achieved during the curation process (black) and prior to submission (red). Note that these two amounts of improvement are similar. In other words, over the five years these metrics have been applied in an ongoing partnership between the repository and the data providers, the metadata for submitted datasets have improved considerably before the datasets are submitted; that is, the community is improving metadata prior to submission.
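The attribution described above can be sketched as a simple calculation: the community's contribution is the rise in submission-time scores over the years, while the repository's contribution is the gain between submission and publication. The scores and field names below are entirely hypothetical and illustrative; this is not the MetaDIG implementation.

```python
# Illustrative sketch only -- hypothetical FAIR metric scores (0-1) for
# datasets submitted in two different years, scored at submission and
# again at publication. Not actual Arctic Data Center values.
from statistics import mean

scores = {
    2016: {"submission": [0.40, 0.45, 0.50], "publication": [0.60, 0.65, 0.70]},
    2021: {"submission": [0.65, 0.70, 0.75], "publication": [0.80, 0.85, 0.90]},
}

def yearly_means(year):
    """Mean metric score at submission and at publication for one year."""
    y = scores[year]
    return mean(y["submission"]), mean(y["publication"])

sub_2016, pub_2016 = yearly_means(2016)
sub_2021, pub_2021 = yearly_means(2021)

# Community gain (red bracket): how much submission-time metadata
# improved over the years, i.e. before the repository touches it.
community = sub_2021 - sub_2016

# Repository gain (black bracket): improvement achieved during
# curation in the most recent year.
repository = pub_2021 - sub_2021

print(f"community (pre-submission) gain: {community:.2f}")   # 0.25
print(f"repository (curation) gain: {repository:.2f}")       # 0.15
```

With these made-up numbers the two gains are of comparable size, which is the pattern the brackets in Figure 1 show for the real metrics.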

As usual, there are details that need to be examined and elucidated, but the initial conclusion is certainly heartening: metadata metrics and improvement efforts in repositories can flow upstream and affect communities! The next step is applying the metrics across the entire DataONE network of repositories to find other good examples of metadata that support all of the FAIR use cases.