DataCite Bright Spots – Repositories, Consortia, and Improvements

Ted Habermann and Erin Robinson, Metadata Game Changers

In May 2025 we presented measurements of DataCite metadata completeness for four FAIR Use Cases: Text, Identifiers, Connections, and Contacts. We identified repositories that are succeeding at creating FAIR metadata despite facing challenges like those faced by the whole community. These bright spots serve as examples that the entire community can learn from to develop effective processes for creating more complete metadata. It is our pleasure to identify these repositories and recognize their accomplishments.

Details of the FAIR Use Cases and the analysis methods are described here. We repeated the overall analysis at the beginning of 2026 to identify emerging bright spots. We also extend the bright spot idea in two directions: consortia with more complete metadata across their members, and repositories that have improved their metadata during the last six months. Along the way we highlight good examples and interesting aspects of the continuing DataCite story.

Overall Bright Spots

Figure 1 shows the completeness, combined across all four FAIR use cases, of 3414 active DataCite repositories. The DataCite metadata schema includes six mandatory elements that essentially all records contain. Together these elements give a FAIR completeness of 12%, which shows up as a clear minimum for these repositories. Only 2% of the DataCite repositories included here are at this absolute minimum level of completeness. The overall average completeness is 24 ± 8%, indicated by the dashed line.

Figure 1. Total metadata completeness for all FAIR use cases and 3414 DataCite Repositories. Repositories with > 10,000 records are on the right x-axis. The mandatory fields yield an overall completeness of 12%, the clear minimum for most of these sets. The mean completeness is 24%, shown by the dashed line, and two standard deviations above the mean is 40%, shown by the light box. Several repositories outstanding for their size are identified.
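To make the idea of element-level completeness concrete, here is a minimal sketch. It is not the analysis code behind Figure 1: the element list is a hypothetical four-element subset, the sample records are invented, and the unweighted average is an assumption. The actual mapping of elements to use cases and any weighting are covered in the methods description referenced above.

```python
from typing import Any

# Hypothetical subset of element names, for illustration only.
# The analysis in this article maps a much larger set of elements to four use cases.
ELEMENTS = ["titles", "creators", "descriptions", "fundingReferences"]


def is_present(record: dict[str, Any], element: str) -> bool:
    """Return True when the element exists in the record and is not empty."""
    value = record.get(element)
    return value not in (None, "", [], {})


def element_completeness(records: list[dict[str, Any]], elements: list[str]) -> dict[str, float]:
    """Fraction of records that contain each element."""
    if not records:
        raise ValueError("at least one record is required")
    return {
        element: sum(is_present(record, element) for record in records) / len(records)
        for element in elements
    }


def overall_completeness(records: list[dict[str, Any]], elements: list[str]) -> float:
    """Unweighted mean of the per-element fractions (an assumption of this sketch)."""
    per_element = element_completeness(records, elements)
    return sum(per_element.values()) / len(per_element)


# Invented sample records: both have titles and creators, one has a description,
# and neither has funding information.
sample_records = [
    {"titles": [{"title": "A"}], "creators": [{"name": "X"}], "descriptions": [], "fundingReferences": []},
    {"titles": [{"title": "B"}], "creators": [{"name": "Y"}], "descriptions": [{"description": "text"}]},
]

print(element_completeness(sample_records, ELEMENTS))
print(f"{overall_completeness(sample_records, ELEMENTS):.1%}")
```

In this sketch an element that is present but empty counts as missing, so a record gets no credit for an empty list of descriptions.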

The identification of Bright Spots is, of course, arbitrary. The light box in Figure 1 indicates repositories with completeness over 40%, i.e., more than two standard deviations above the mean. This group includes 124 repositories with outstanding metadata completeness. There is a small gap in the data close to 49% (three standard deviations above the mean), and repositories with completeness above this level are listed in Table 1. However the line is drawn, these repositories are doing amazing work and are clear leaders in the community. The University of Bath remains the repository with the most complete overall metadata at 61%, while the Japanese National Institute for Fusion Science continues to lead the very large repositories (sampled here to 10,000 records) with completeness of 54%.
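The cutoffs above follow directly from the mean and standard deviation. A small sketch, using the rounded figures quoted in the text (mean 24%, standard deviation 8%), gives 40% for two standard deviations and 48% for three, close to the 49% gap used for Table 1. The function names, the sample repository names and values, and the choice of the population standard deviation are all assumptions of this sketch.

```python
from statistics import mean, pstdev


def sigma_threshold(mean_pct: float, sd_pct: float, n_sigma: float) -> float:
    """Completeness level that lies n_sigma standard deviations above the mean."""
    return mean_pct + n_sigma * sd_pct


def select_bright_spots(completeness: dict[str, float], n_sigma: float = 2.0) -> list[str]:
    """Return repositories whose completeness exceeds mean + n_sigma * sd.

    Uses the population standard deviation; that choice is an assumption here.
    """
    values = list(completeness.values())
    cutoff = sigma_threshold(mean(values), pstdev(values), n_sigma)
    return sorted(repo for repo, pct in completeness.items() if pct > cutoff)


# Rounded figures quoted in the text.
print(sigma_threshold(24, 8, 2))  # 40
print(sigma_threshold(24, 8, 3))  # 48

# Hypothetical sample: nine repositories at 20% and one at 60% (mean 24, sd 12).
sample = {f"repo_{letter}": 20 for letter in "abcdefghi"}
sample["repo_j"] = 60
print(select_bright_spots(sample))  # ['repo_j']
```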

It is amazing that five of the thirteen repositories in Table 1 maintain outstanding metadata completeness even though they have more than 10,000 records. This accomplishment reflects strong commitments to developing repository processes that support complete metadata and implementing those processes consistently at scale. Great work!

Four repositories joined the list with completeness of 49% or more during the last six months. They are marked by bold text in Table 1. Three of these share the same provider: the National Research Council of Italy (rpak). These repositories currently have fewer than 100 records, but it is difficult to maintain complete metadata as a repository grows unless you start with complete metadata early. Keep up the good work!

In addition to these repositories over 49%, several repositories identified in Figure 1 stand out as very complete in their size range. One is a university repository (University of Leeds, bl.leeds), one is a funder (the Natural Environment Research Council, bl.nerc), and one is a Federally Funded Research and Development Center (the NSF National Center for Atmospheric Research, ucar.eol). Bright spots exist in many research sectors!

Table 1. Repositories with total completeness of 49% or more.

| Repository Name | ID | # Records | FAIR Text | FAIR Identifiers | FAIR Connections | FAIR Contacts | Total |
|---|---|---|---|---|---|---|---|
| University of Bath | bl.bath | 824 | 78% | 79% | 40% | 37% | 61% |
| National Institute for Fusion Science | rpht.nifs | 30,000,000 | 87% | 83% | 28% | 10% | 56% |
| EarthEnv | ktsw.aezvvv | 1 | 80% | 67% | 33% | 30% | 54% |
| BioMemory | rpak.tjjamp | 34 | 80% | 78% | 21% | 31% | 54% |
| RISIS_CNR: Research Infrastructure for Science and Innovation Policy Studies | rpak.hpgbbj | 2 | 80% | 72% | 25% | 30% | 53% |
| StrainInfo (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures) | zypi.vnulkh | 460,822 | 80% | 83% | 22% | 10% | 52% |
| Grand Accélérateur National d’Ions Lourds | inist.ganil | 26 | 72% | 60% | 35% | 38% | 52% |
| Archaeological Map of the Czech Republic | arch.avanjj | 203,625 | 80% | 73% | 28% | 10% | 51% |
| The Chandra Data Archive | si.cda | 59,114 | 87% | 39% | 42% | 30% | 50% |
| Archaeological Map of the Czech Republic | arch.igsn | 17,759 | 82% | 67% | 28% | 10% | 50% |
| ITINERIS HUB: Italian Integrated Environmental Research Infrastructures S | rpak.xfluxo | 10 | 87% | 64% | 17% | 22% | 49% |
| Build Environment Data_Experiments | alhd.bed | 51 | 86% | 68% | 20% | 10% | 49% |
| IT-IOOS: Italian Integrated Ocean Observing System | rpak.rrhuff | 10 | 78% | 55% | 22% | 40% | 49% |

“Consortia” Bright Spots

A DataCite consortium is a group of like-minded organizations that have come together to collectively participate in DataCite’s community and governance activities and to use DataCite’s DOI services. Some consortia form mini-communities focused on a region or domain, whose members can work together to improve metadata understanding and processes. Bright spots occur in any group, so it is interesting to explore metadata completeness in these consortia.

Table 2 shows forty-six DataCite consortia with ten or more members. These consortia include between 10 and 420 repositories with average repository sizes between 44 and 4739 records.

Table 2. DataCite consortia with ten or more members.

| Consortium Name (ID) | # Members | Avg # Records | Min | Max | Avg |
|---|---|---|---|---|---|
| TIB DOI Konsortium (tibco) | 420 | 1630 | 10% | 48% | 24% |
| OICC Press (qcqq) | 222 | 47 | 18% | 45% | 36% |
| British Library (blco) | 186 | 1590 | 6% | 61% | 24% |
| French National Centre for Scientific Research (inistco) | 181 | 1474 | 13% | 52% | 25% |
| DataCite Repository Store (txvt) | 134 | 1034 | 3% | 46% | 20% |
| FCT\|FCCN (fctco) | 102 | 708 | 13% | 46% | 23% |
| Australian Research Data Commons (ardc) | 100 | 1487 | 11% | 46% | 25% |
| Figshare consortium (fsco) | 93 | 2046 | 21% | 37% | 29% |
| DataCite Canada Consortium (dcan) | 92 | 1535 | 11% | 41% | 23% |
| ETH Zurich (ethzco) | 90 | 2802 | 12% | 47% | 20% |
| Leibniz Institute for the Social Sciences (daraco) | 85 | 821 | 13% | 43% | 24% |
| Conference of Italian University Rectors (CRUI2011) (cruico) | 65 | 2707 | 12% | 48% | 22% |
| Poznan Supercomputing and Networking Center (psncco) | 65 | 858 | 12% | 34% | 22% |
| TU Wien (tuwienco) | 62 | 1265 | 3% | 38% | 24% |
| Hochschulbibliothekszentrum NRW - Consortium Lead (hbzco) | 56 | 1259 | 13% | 43% | 24% |
| MTA Könyvtára (mtakikco) | 53 | 204 | 10% | 36% | 16% |
| ZB MED – Information Centre for Life Science (zbmedco) | 52 | 747 | 12% | 52% | 23% |
| TU Delft Library (delftco) | 52 | 1448 | 11% | 36% | 24% |
| eScire (escireco) | 42 | 430 | 19% | 39% | 25% |
| Svensk nationell datatjänst (sndco) | 41 | 817 | 12% | 47% | 29% |
| Sikt – Norwegian Agency for Shared Services in Education and Research (unitco) | 41 | 322 | 12% | 35% | 21% |
| The Global Dataverse Community Consortium (GDCC) (gdccco) | 37 | 2934 | 10% | 45% | 23% |
| University of Göttingen (subgoeco) | 37 | 1386 | 13% | 34% | 21% |
| Peter the Great Saint-Petersburg Polytechnic University (spbpuco) | 29 | 1964 | 9% | 32% | 20% |
| National Library of Technology (cevl) | 28 | 751 | 10% | 51% | 32% |
| Lyrasis (lyrasis) | 27 | 1995 | 11% | 39% | 27% |
| HEAL-LINK Consortium (hlco) | 23 | 4656 | 14% | 33% | 22% |
| Danish e-Infrastructure Cooperation (DeIC) (deicco) | 23 | 2389 | 11% | 31% | 21% |
| University of Chile (uchile) | 21 | 1037 | 11% | 42% | 23% |
| University of Tartu (estdoico) | 21 | 1147 | 13% | 39% | 22% |
| The National Institutes of Health (NIH) (oaem) | 21 | 957 | 12% | 34% | 21% |
| Virtual Library of Virginia (vivaco) | 18 | 3551 | 13% | 33% | 20% |
| Office of Scientific and Technical Information - Consortium (ostico) | 14 | 3451 | 16% | 34% | 23% |
| Maynooth University (irlco) | 14 | 3616 | 15% | 32% | 23% |
| TIND (tindco) | 13 | 1924 | 17% | 40% | 22% |
| Sages sp. z o. o. (rlfe) | 13 | 126 | 28% | 38% | 33% |
| National Library of New Zealand (doinzco) | 13 | 3201 | 12% | 30% | 20% |
| Conselho Nacional de Desenvolvimento Científico e Tecnológico (cnpq) | 12 | 968 | 15% | 38% | 21% |
| National Aeronautics and Space Administration (nasaco) | 11 | 1089 | 12% | 43% | 25% |
| The Research Activity Identifier (RAiD) (yjtm) | 11 | 44 | 22% | 33% | 26% |
| National University Center for Scientific and Technical Documentation (nsql) | 11 | 4739 | 17% | 28% | 22% |
| California Digital Library (cdlco) | 11 | 1611 | 12% | 21% | 16% |
| University of Vienna (viennaco) | 10 | 1066 | 11% | 43% | 26% |
| Japan Link Center Consortium (jalcco) | 10 | 632 | 16% | 32% | 25% |
| FORS - Swiss Centre of Expertise in the Social Sciences (Consortium) (teyq) | 10 | 312 | 14% | 32% | 19% |
| CSC - IT Center for Science (Finland) (cscco) | 10 | 1276 | 11% | 29% | 21% |

Figure 2 shows the minimum, average, and maximum total completeness for these consortia. The shaded box shows completeness above 40%, the two standard deviation level from the overall bright spot discussion above. Twenty of these consortia include repositories with completeness over 40%, i.e., overall bright spots in Figure 1. These bright spots can serve as good examples and sources of lessons learned for other repositories in the same consortium that are working to improve their metadata. Recognizing them and taking advantage of their experience is also an important way to encourage their great work!

Figure 2. Minimum, average, and maximum total completeness for DataCite Consortia with ten or more members. The light box shows completeness > 40%, the overall bright spot level shown in Figure 1.
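The per-consortium minimum, average, and maximum can be computed by grouping repository scores by consortium. The sketch below shows one way to do that. The input layout (consortium ID, repository ID, total completeness in percent), the consortium and repository names, and the sample values are all hypothetical, and counting repositories as members is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean


def summarize_consortia(rows, min_members=10, bright_spot_pct=40):
    """Group repository completeness by consortium and summarize each group.

    rows: iterable of (consortium_id, repository_id, total_completeness_pct).
    Only consortia with at least min_members repositories are kept.
    """
    groups = defaultdict(list)
    for consortium_id, _repository_id, pct in rows:
        groups[consortium_id].append(pct)

    summary = {}
    for consortium_id, values in groups.items():
        if len(values) < min_members:
            continue
        summary[consortium_id] = {
            "members": len(values),
            "min": min(values),
            "max": max(values),
            "ave": mean(values),
            # True when at least one repository is above the bright spot level.
            "has_bright_spot": max(values) > bright_spot_pct,
        }
    return summary


# Hypothetical data: one consortium with ten repositories, one with two.
example_scores = [12, 15, 18, 20, 22, 24, 26, 30, 35, 48]
rows = [("exampleco", f"exampleco.r{i}", pct) for i, pct in enumerate(example_scores)]
rows += [("smallco", "smallco.r0", 50), ("smallco", "smallco.r1", 20)]

for consortium_id, stats in summarize_consortia(rows).items():
    print(consortium_id, stats)
```

The smaller consortium is dropped by the size filter even though one of its repositories is above 40%, which mirrors the ten-member cutoff used for Table 2.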

Improvement Bright Spots

While one goal of this effort is recognizing repositories that are creating and sharing more complete metadata overall, an equally important goal is to recognize repositories that are improving their metadata on the way to becoming bright spots. We now have two completeness measures separated by six months that can be used to identify these improvement bright spots. Table 3 shows the overall completeness scores for the repositories that improved by ten percentage points or more over this time. These repositories demonstrate that metadata improvement can happen!

Table 3. Repositories whose total completeness improved by ten percentage points or more between May 2025 and January 2026.

| Repository Name | ID | May 2025 | January 2026 | Change |
|---|---|---|---|---|
| NSF Seismological Facility for the Advancement of Geoscience (SAGE) | iris.iris | 10% | 36% | 26% |
| NSF Geodetic Facility for the Advancement of Geoscience (GAGE) | unavco.unavco | 13% | 34% | 20% |
| Purdue University Research Repository | purdue.purduelib | 24% | 41% | 17% |
| Repositório de Dados de Pesquisa da UFG | ufgbr.dados | 21% | 38% | 17% |
| DANS Data Stations | dans.archive | 16% | 31% | 15% |
| NumeRev | inist.mshsud | 13% | 28% | 15% |
| UNIVE | frnw.fuefjy | 18% | 32% | 14% |
| NCCR MARVEL | ethz.marvel | 17% | 31% | 14% |
| AUSSDA - The Austrian Social Science Data Archive | gesis.aussda | 17% | 30% | 13% |
| DataverseNL | dans.dataversenl | 20% | 33% | 13% |
| Cellular Senescence Network (SenNet) | psc.sennet | 15% | 28% | 13% |
| DOE SLAC National Accelerator Laboratory Repository | doe.slac | 21% | 34% | 13% |
| FID move | wuua.vozxlg | 23% | 36% | 12% |
| GeoSphere Austria Datahub | gba.datahub | 19% | 31% | 12% |
| Arcadia Science | anbe.sfiyeb | 24% | 35% | 11% |
| Akademia Wychowania Fizycznego w Krakowie | awf.kr | 21% | 32% | 11% |
| GESIS Pretest Database | vzek.pretest | 23% | 33% | 10% |
| Wisconsin Geological and Natural History Survey | wisc.wgnhs | 18% | 28% | 10% |
| Computo | sfds.computo | 25% | 35% | 10% |
| SESAR_Smithsonian | xaqp.sxmooh | 15% | 25% | 10% |
| Réseau sismologique et géodésique français | inist.resif | 32% | 41% | 10% |

The group is led by two repositories that are part of The National Geophysical Facility (NGF), a new, unified U.S. research infrastructure managed by the EarthScope Consortium for the National Science Foundation (NSF). This group took on a major re-curation effort that vaulted them from the minimum possible completeness, i.e., close to 12%, up to ~35%, well above average. The Purdue University Research Repository and the Research Data Repository of the Federal University of Goiás in Brazil also improved their metadata from close to average completeness to bright spot status. Great work!
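Identifying improvement bright spots comes down to comparing two snapshots of total completeness and keeping the repositories whose gain meets a cutoff. The sketch below is one way to express that. The first three before and after values are taken from Table 3; the repositories named example.flat and example.new are hypothetical and only show what gets filtered out.

```python
def improvement_bright_spots(before, after, min_gain=10):
    """Return (repository_id, gain) pairs with gain >= min_gain, largest gain first.

    before, after: dicts mapping repository ID to total completeness in percent.
    Repositories missing from either snapshot are skipped.
    """
    gains = {repo: after[repo] - before[repo] for repo in before.keys() & after.keys()}
    selected = [(repo, gain) for repo, gain in gains.items() if gain >= min_gain]
    return sorted(selected, key=lambda item: (-item[1], item[0]))


# First snapshot; example.flat is a hypothetical repository with little change.
may_2025 = {"iris.iris": 10, "ufgbr.dados": 21, "dans.archive": 16, "example.flat": 22}

# Second snapshot; example.new is hypothetical and has no earlier measurement.
jan_2026 = {"iris.iris": 36, "ufgbr.dados": 38, "dans.archive": 31, "example.flat": 24, "example.new": 45}

for repo, gain in improvement_bright_spots(may_2025, jan_2026):
    print(f"{repo}: +{gain} points")
```

Differences computed from the rounded percentages in Table 3 can be off by a point from the Change column (13% to 34% is listed as 20%, for example), which would be consistent with the changes having been computed before rounding.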

Conclusion

The DataCite community has grown to over 1700 members worldwide with over 3400 repositories and 100+ million metadata records, an amazing resource for the global research community. The metadata schema has also grown over the years, adding many new resource types as well as elements that support use cases well beyond the original identification and citation targets.

In addition to counting members and metadata records, we are interested in recognizing community members that do amazing work creating and improving DataCite metadata in the trenches. We mapped over 60 DataCite metadata elements to four FAIR Use Cases and measured completeness of those elements in over 3000 DataCite repositories with the goal of recognizing repositories with outstanding metadata completeness, termed bright spots.

Bright spots occur in any group at any scale. We identified individual repositories and consortia of repositories with outstanding metadata completeness. We also identified repositories that have made significant improvements to their metadata over the last six months.

All these Bright Spot repositories have overcome obstacles that we all experience and understand, and they demonstrate that those obstacles can be overcome. Good luck and keep up the great work. Congratulations to all!

Your Metadata

The “Bright Spot Club” is not exclusive; any repository can join! An interactive notebook is available for exploring the completeness and content of your own repository. Contact Metadata Game Changers if you have questions or if a presentation or workshop on DataCite metadata use cases and completeness would be helpful for your consortium or repository.

Data Availability

The completeness results for all 3414 repositories are available in Zenodo. They include repository names and IDs, the number of records, the completeness for the four use cases and the total, and the provider ID.