DataCite Bright Spots – Repositories, Consortia, and Improvements
/Ted Habermann and Erin Robinson, Metadata Game Changers
In May 2025 we presented results of measurements of DataCite metadata completeness for four FAIR Use Cases: Text, Identifiers, Connections, and Contacts. We identified repositories that are succeeding at creating FAIR metadata despite facing challenges like those faced by the whole community. These bright spots serve as examples that the entire community can learn from to develop effective processes for creating more complete metadata. It is our pleasure to identify these repositories and recognize their accomplishments.
Details of the FAIR Use Cases and the analysis methods are described here. We repeated the overall analysis at the beginning of 2026 to identify emerging bright spots. We also extend the bright spot idea to include consortia with more complete metadata across their members and repositories that have improved their metadata during the last six months. Together these highlight good examples and interesting aspects of the continuing DataCite story.
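The exact scoring rules are given in the methods description referenced above. As a rough illustration only, the sketch below assumes that a completeness score is the share of mapped elements that are populated in a record, averaged over a repository's records, with a "Total" computed over all mapped elements. The `USE_CASES` mapping and its short element lists are hypothetical placeholders, not the actual mapping of 60+ DataCite elements.

```python
from typing import Mapping, Sequence

# Hypothetical mapping of FAIR use cases to DataCite property names.
# The real mapping covers 60+ elements; these few are placeholders.
USE_CASES: dict[str, list[str]] = {
    "Text": ["titles", "descriptions", "subjects"],
    "Identifiers": ["identifier", "alternateIdentifiers", "fundingReferences"],
    "Connections": ["relatedIdentifiers", "relatedItems"],
    "Contacts": ["creators", "contributors"],
}


def record_completeness(record: Mapping[str, object], elements: Sequence[str]) -> float:
    """Percent of the listed elements that are present and non-empty in one record."""
    present = sum(1 for name in elements if record.get(name))
    return 100.0 * present / len(elements)


def repository_completeness(records: Sequence[Mapping[str, object]]) -> dict[str, float]:
    """Average per-record completeness for each use case, plus 'Total' over all elements."""
    scores: dict[str, float] = {}
    for use_case, elements in USE_CASES.items():
        scores[use_case] = sum(record_completeness(r, elements) for r in records) / len(records)
    all_elements = [e for elements in USE_CASES.values() for e in elements]
    scores["Total"] = sum(record_completeness(r, all_elements) for r in records) / len(records)
    return scores
```

Averaging per record (rather than counting elements that appear anywhere in a repository) rewards repositories that fill elements consistently across all of their records, which is the behavior the bright spot discussion emphasizes.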
Overall Bright Spots
Figure 1 shows the combined completeness across all four FAIR use cases for 3414 active DataCite repositories. The DataCite metadata schema includes six mandatory elements that essentially all records contain. Together these elements give a FAIR completeness of 12%, which appears as a clear floor for these repositories. Only 2% of the DataCite repositories included here are at this absolute minimum level of completeness. The overall average completeness is 24 ± 8%, indicated by the dashed line.
Figure 1. Total metadata completeness for all FAIR use cases and 3414 DataCite repositories. Repositories with more than 10,000 records are shown on the right x-axis. The mandatory fields yield an overall completeness of 12%, the clear minimum for most of these sets. The mean completeness is 24%, shown by the dashed line, and two standard deviations above the mean is 40%, shown by the light box. Several repositories that are outstanding for their size are identified.
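The summary statistics quoted for Figure 1 (mean, standard deviation, and share of repositories at the 12% floor) can be sketched as follows. This is a minimal sketch assuming a plain list of total completeness values, one per repository; the use of the population standard deviation and the 0.5-point tolerance for "at the floor" are our assumptions, not details from the original analysis.

```python
import statistics
from typing import Sequence

MANDATORY_FLOOR = 12.0  # completeness (%) from the six mandatory elements alone


def summarize(totals: Sequence[float], floor: float = MANDATORY_FLOOR,
              tol: float = 0.5) -> dict[str, float]:
    """Mean, standard deviation, and share of repositories at the mandatory-only floor.

    `tol` is an assumed tolerance (percentage points) for counting a
    repository as sitting at the floor.
    """
    mean = statistics.mean(totals)
    sd = statistics.pstdev(totals)  # population SD; the original may use the sample SD
    at_floor = sum(1 for t in totals if abs(t - floor) <= tol) / len(totals)
    return {"mean": mean, "sd": sd, "share_at_floor": at_floor}
```

With 3414 repositories the difference between population and sample standard deviation is negligible, so either choice should reproduce a value close to the 8% quoted in the text.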
The identification of Bright Spots is, of course, arbitrary. The light box in Figure 1 indicates repositories with completeness over 40%, i.e., more than two standard deviations above the mean. This group includes 124 repositories with outstanding metadata completeness. There is a gap in the data close to 49% (roughly three standard deviations above the mean), and repositories with completeness above this level are listed in Table 1. However the threshold is drawn, these repositories are doing amazing work and are clear leaders in the community. The University of Bath remains the repository with the most complete overall metadata at 61%, while the Japanese National Institute for Fusion Science continues to lead the very large repositories (sampled here to 10,000 records) with completeness of 54%.
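The thresholds above follow directly from the mean and standard deviation: 24 + 2 × 8 = 40% and 24 + 3 × 8 = 48%, just below the gap near 49% used for Table 1. A minimal sketch of the selection, assuming total completeness is available as a dictionary keyed by repository id (the function names are hypothetical):

```python
from typing import Mapping


def threshold(mean: float, sd: float, k: float) -> float:
    """Completeness level k standard deviations above the mean."""
    return mean + k * sd


def bright_spots(totals: Mapping[str, float], cutoff: float) -> list[str]:
    """Repository ids whose total completeness is at or above `cutoff`, best first."""
    selected = [rid for rid, total in totals.items() if total >= cutoff]
    return sorted(selected, key=lambda rid: totals[rid], reverse=True)


# With the figures quoted in the text (mean 24%, SD 8%):
print(threshold(24, 8, 2))  # 40
print(threshold(24, 8, 3))  # 48
```

Because the cutoff is a parameter, the same function can produce the 40% light-box group or the ≥ 49% Table 1 group.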
It is amazing that five of the thirteen repositories in Table 1 maintain outstanding metadata completeness even though they have more than 10,000 records. This accomplishment reflects strong commitments to developing repository processes that support complete metadata and implementing those processes consistently at scale. Great work!
Four repositories joined the list with completeness ≥ 49% during the last six months; they are marked in bold in Table 1. Three of these share the same provider: the National Research Council of Italy (rpak). These repositories currently have fewer than 100 records, but that is exactly the right time to start: it is difficult to maintain complete metadata as a repository grows unless complete metadata is in place early. Keep up the good work!
In addition to these repositories above 49%, several repositories identified in Figure 1 stand out as very complete for their size range. One is a university repository (University of Leeds, bl.leeds), one is a funder (the Natural Environment Research Council, bl.nerc), and one is a Federally Funded Research and Development Center (the NSF National Center for Atmospheric Research, ucar.eol). Bright spots exist in many research sectors!
“Consortia” Bright Spots
A DataCite consortium is a group of like-minded organizations that have come together to collectively participate in DataCite’s community and governance activities and to use DataCite’s DOI services. Some consortia provide mini-communities focused on a region or domain that can work together to improve metadata understanding and processes among members. Bright spots occur in any group, so it is interesting to explore metadata completeness in these consortia.
Table 2 shows forty-six DataCite consortia with ten or more members. These consortia include between 10 and 420 repositories with average repository sizes between 44 and 4739 records.
Figure 2 shows the minimum, average, and maximum total completeness for these consortia. The shaded box shows completeness above 40%, the two standard deviation level from the overall bright spot discussion above. Twenty of these consortia include repositories with completeness over 40%, i.e., overall bright spots in Figure 1. These bright spots can serve as good examples and sources of lessons learned for other repositories within their consortia that are working to improve their metadata. Recognizing them and taking advantage of their experience is also an important way to encourage their great work!
Figure 2. Minimum, average, and maximum total completeness for DataCite Consortia with ten or more members. The light box shows completeness > 40%, the overall bright spot level shown in Figure 1.
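The per-consortium minimum, average, and maximum plotted in Figure 2 amount to a simple group-by. The sketch below assumes rows of (consortium id, repository id, total completeness); how repositories are assigned to consortia is an assumption here, since the published results list provider ids rather than consortium membership. Function names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Iterable


def consortium_summary(rows: Iterable[tuple[str, str, float]],
                       min_members: int = 10) -> dict[str, tuple[float, float, float]]:
    """(min, mean, max) total completeness per consortium with >= min_members repositories.

    Each row is (consortium_id, repository_id, total_completeness).
    """
    by_consortium: dict[str, list[float]] = defaultdict(list)
    for consortium, _repo, total in rows:
        by_consortium[consortium].append(total)
    return {
        c: (min(t), statistics.mean(t), max(t))
        for c, t in by_consortium.items()
        if len(t) >= min_members
    }


def with_bright_spots(summary: dict[str, tuple[float, float, float]],
                      cutoff: float = 40.0) -> list[str]:
    """Consortia whose best repository exceeds the overall bright spot level."""
    return sorted(c for c, (_lo, _avg, hi) in summary.items() if hi > cutoff)
```

Counting `with_bright_spots(summary)` over the forty-six consortia would correspond to the "twenty consortia" statement in the text, provided the membership data match.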
Improvement Bright Spots
While one goal of this effort is recognizing repositories that are creating and sharing more complete metadata overall, an equally important goal is recognizing repositories that are improving their metadata on the way to becoming bright spots. We now have two completeness measures separated by six months that can be used to identify these improvement bright spots. Table 3 shows the overall completeness scores for the repositories with ten percent or more improvement over this time. These repositories demonstrate that metadata improvement can happen!
The group is led by two repositories that are part of the National Geophysical Facility (NGF), a new, unified U.S. research infrastructure managed by the EarthScope Consortium for the National Science Foundation (NSF). The NGF took on a major re-curation effort that vaulted these repositories from the minimum possible completeness, i.e., close to 12%, up to ~35%, well above average. The Purdue University Research Repository and the Research Data Repository of the Federal University of Goiás in Brazil also improved their metadata from close to average completeness to bright spot status. Great work!
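Identifying improvement bright spots means comparing the two snapshots repository by repository. A minimal sketch, assuming each snapshot is a dictionary of total completeness keyed by repository id and interpreting "ten percent or more improvement" as a gain of at least ten percentage points (our assumption; a relative gain would need a different test). The function name is hypothetical.

```python
from typing import Mapping


def improvements(before: Mapping[str, float], after: Mapping[str, float],
                 min_gain: float = 10.0) -> list[tuple[str, float, float, float]]:
    """Repositories whose total completeness rose by at least `min_gain` points.

    Only repositories present in both snapshots are compared. Returns
    (repository_id, before, after, gain) tuples, largest gain first.
    """
    rows = []
    for rid in before.keys() & after.keys():
        gain = after[rid] - before[rid]
        if gain >= min_gain:
            rows.append((rid, before[rid], after[rid], gain))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Repositories that appear only in the newer snapshot are skipped, since they have no earlier score to improve on.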
Conclusion
The DataCite community has grown to over 1700 members worldwide with over 3400 repositories and 100+ million metadata records, an amazing resource for the global research community. The metadata schema has also grown over the years with many new resource types as well as elements for supporting use cases that go way beyond the original identification and citation targets.
In addition to counting members and metadata records, we are interested in recognizing community members that do amazing work creating and improving DataCite metadata in the trenches. We mapped over 60 DataCite metadata elements to four FAIR Use Cases and measured completeness of those elements in over 3000 DataCite repositories with the goal of recognizing repositories with outstanding metadata completeness, termed bright spots.
Bright spots occur in any group at any scale. We identified individual repositories and consortia of repositories with outstanding metadata completeness. We also identified repositories that have made significant improvements to their metadata over the last six months.
All these Bright Spot repositories have faced obstacles that we all experience and understand, and they demonstrate that those obstacles can be overcome. Good luck and keep up the great work. Congratulations to all!
Your Metadata
The “Bright Spot Club” is not exclusive; any repository can join! An interactive notebook is available for exploring the completeness and content of your repository on your own. Contact Metadata Game Changers if you have questions, or if a presentation or workshop on DataCite metadata use cases and completeness would be helpful for your consortium or repository.
Data Availability
The completeness results for all 3414 repositories are available in Zenodo. They include repository names and ids, the number of records, the completeness for the four use cases and the total, and the provider id.
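A minimal sketch for reading the published results into Python, assuming a CSV file with one row per repository. The column names used here (`repository_id`, `name`, `records`, the four use case names, `Total`, `provider_id`) are hypothetical; check the header of the Zenodo file and adjust before use.

```python
import csv
import io
from typing import TextIO

# Hypothetical column names; check the header of the Zenodo file before use.
SCORE_COLUMNS = ["Text", "Identifiers", "Connections", "Contacts", "Total"]


def load_results(handle: TextIO) -> list[dict[str, object]]:
    """Parse one row per repository, converting counts and scores to numbers."""
    rows: list[dict[str, object]] = []
    for raw in csv.DictReader(handle):
        row: dict[str, object] = dict(raw)
        row["records"] = int(raw["records"])
        for col in SCORE_COLUMNS:
            row[col] = float(raw[col])
        rows.append(row)
    return rows


# Illustrative in-memory sample with made-up values (not real results).
sample = io.StringIO(
    "repository_id,name,records,Text,Identifiers,Connections,Contacts,Total,provider_id\n"
    "example.repo,Example Repository,250,40,30,10,20,25,example\n"
)
results = load_results(sample)
```

For the real file, replace the `StringIO` sample with `open(path, newline="", encoding="utf-8")`, as recommended for the `csv` module.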
