Metadata Game Changers
Blog

Exploring metadata, communities, and new ideas.

May 24, 2022

How Many When?

May 24, 2022/ Ted Habermann

Cite this blog as Habermann, T. (2022). How Many When? Front Matter. https://doi.org/10.59350/tbfth-4nv87

In a recent blog post I introduced a tool designed to help measure and understand DataCite metadata usage and showed how it could be used to answer a question about usage of the DataCite resource types introduced in version 4.4 of the schema. Of course, it can also be used to retrieve counts of all resource types, old and new:

retrieveDataCiteFacets --resources -fl registered --htmlout --csvout --facetdata

which means: retrieve the registered facet for all resources and output html, csv, and facetdata (a new option explained below).

A few interesting rows of the HTML output from this command are shown here:

DataCite Facet Summary

Resources: Audiovisual Book BookChapter Collection ComputationalNotebook ConferencePaper ConferenceProceeding DataPaper Dataset Dissertation Event Image InteractiveResource Journal JournalArticle Model OutputManagementPlan PeerReview PhysicalObject Preprint Report Service Software Sound Standard Text Workflow Other
Facet list: registered
Id | DateTime | NumberOfRecords | registered_number | registered_max | registered_common | registered_total | registered_HI | registered
Dataset | 20220524_10 | 12040029 | 10 | 2675384 | 2021 | 11356953 | 24% | 2022 (853124), 2021 (2675384), 2020 (1586853), 2019 (1108041), 2018 (1822208), 2017 (786218), 2016 (838618), 2015 (929426), 2014 (654680), 2013 (102401)
Image | 20220524_10 | 3143275 | 10 | 761034 | 2019 | 2889591 | 26% | 2022 (146843), 2021 (750629), 2020 (355961), 2019 (761034), 2018 (131244), 2017 (195096), 2016 (336212), 2015 (42940), 2014 (62030), 2013 (107602)
PhysicalObject | 20220524_10 | 1393737 | 10 | 564958 | 2017 | 1393714 | 41% | 2022 (36375), 2021 (103791), 2020 (271716), 2019 (91488), 2018 (321428), 2017 (564958), 2016 (2137), 2015 (1668), 2014 (79), 2013 (74)
Preprint | 20220524_10 | 932308 | 5 | 930922 | 2022 | 932297 | 100% | 2022 (930922), 2021 (1305), 2020 (40), 2019 (17), 2018 (13)
Text | 20220524_10 | 9461307 | 10 | 1878326 | 2022 | 8977826 | 21% | 2022 (1878326), 2021 (1514277), 2020 (1208866), 2019 (1169677), 2018 (882155), 2017 (771246), 2016 (477747), 2015 (409619), 2014 (462754), 2013 (203159)

Report created 20220524_10 by retrieveDataCiteFacets from Metadata Game Changers

This output, which is easier to view in a separate browser window, provides a human-readable summary of the registered facet for all resources, particularly if you know what you are looking for. The bulk of the real data are in the last column (registered) and are not very machine-readable. To help with this problem I added the --facetdata argument. If this argument is on the command line, retrieveDataCiteFacets writes the data from the facet column (registered in this case) to a comma-separated value (CSV) file with the targets, the facet, and the timestamp in the name (e.g. DataCite_resources_registered__20220524_10.csv). This file can be read, analyzed, and plotted by your analysis tool of choice.
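
If only the summary output is at hand, the registered column can still be parsed. The following is a minimal Python sketch, assuming the "year (count)" pairs are formatted exactly as in the Dataset row above; parse_facet_summary is a hypothetical helper for illustration, not part of retrieveDataCiteFacets:

```python
import re


def parse_facet_summary(summary):
    """Turn '2022 (853124), 2021 (2675384)' into {2022: 853124, 2021: 2675384}."""
    pairs = re.findall(r"(\d{4})\s*\((\d+)\)", summary)
    return {int(year): int(count) for year, count in pairs}


# Value copied from the registered column of the Dataset row above.
dataset_registered = (
    "2022 (853124), 2021 (2675384), 2020 (1586853), 2019 (1108041), "
    "2018 (1822208), 2017 (786218), 2016 (838618), 2015 (929426), "
    "2014 (654680), 2013 (102401)"
)

counts = parse_facet_summary(dataset_registered)
total = sum(counts.values())              # sum over the listed years
peak_year = max(counts, key=counts.get)   # year with the most registrations
print(total, peak_year, counts[peak_year])
```

For the Dataset row this reproduces the registered_total (11356953), registered_common (2021), and registered_max (2675384) columns.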

I read the facet data into Excel to answer two obvious questions. First: how many of each resource type are registered in DataCite? The data in Figure 1 show that Datasets remain the most common resource type in DataCite (40%) but that Text is not far behind (32%).

Figure 1. Counts of DataCite resource types.

How fast is Text catching up? Figure 2 shows that Text (orange) is catching up quickly. In fact, at this point in 2022, the number of Text resources is larger than the number of Datasets (dark blue) for the first time since 2013 (if you call 2019 a tie). This is particularly interesting because the introduction of many new types last year would be expected to dilute the number of Text resources. Note the meteoric rise of Preprints (light blue), which are also slightly ahead of Datasets this year. In the last blog I showed that this increase is dominated by the arXiv.common repository, which recently decided to take advantage of the global research infrastructure (DataCite) for its metadata. We all benefit from that choice!

Figure 2. Number of resources / year in DataCite (maximum value of 2.7 million for Datasets (blue) during 2021 is cut off to show more detail).
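
The year-by-year comparison behind Figure 2 can be sketched in Python. The counts are copied from the registered column of the HTML output above for Dataset, Text, and Preprint (2018 to 2022 only); the variable names and the choice to look at a Text/Dataset ratio are illustrative assumptions, not how the figure was actually produced:

```python
# Counts per registration year, copied from the HTML output above.
registered = {
    "Dataset": {2022: 853124, 2021: 2675384, 2020: 1586853, 2019: 1108041, 2018: 1822208},
    "Text": {2022: 1878326, 2021: 1514277, 2020: 1208866, 2019: 1169677, 2018: 882155},
    "Preprint": {2022: 930922, 2021: 1305, 2020: 40, 2019: 17, 2018: 13},
}


def ratio_by_year(numerator, denominator):
    """Return {year: numerator / denominator} for years present in both series."""
    years = sorted(set(numerator) & set(denominator))
    return {year: numerator[year] / denominator[year] for year in years}


text_vs_dataset = ratio_by_year(registered["Text"], registered["Dataset"])
for year, ratio in text_vs_dataset.items():
    print(year, f"{ratio:.2f}")

# Resource type (among the three above) with the most registrations each year.
leaders = {
    year: max(registered, key=lambda rtype: registered[rtype][year])
    for year in sorted(registered["Dataset"])
}
print(leaders)
```

A ratio above 1 means more Text than Dataset registrations in that year; for the partial year 2022 the ratio is roughly 2.2.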

There is clearly a lot to be learned about how repositories use DataCite using the --facetdata option. If --htmlout is also on the command line, a simple HTML file with the facet data will be output as well. This might be useful for sharing:

DataCite Facet Summary

Resources: Audiovisual Book BookChapter Collection ComputationalNotebook ConferencePaper ConferenceProceeding DataPaper Dataset Dissertation Event Image InteractiveResource Journal JournalArticle Model OutputManagementPlan PeerReview PhysicalObject Preprint Report Service Software Sound Standard Text Workflow Other
Facet list: registered
id 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011
Audiovisual 34377 46232 38130 33613 23062 24969 13553 2496 1682 3815 0 0
Book 4430 6298 618 457 1098 904 623 370 9 22 0 0
BookChapter 2784 7461 64 33 28 100 200 97 0 0 289 5
Collection 46951 104188 90369 76641 65313 76578 71138 43469 178153 6133 0 0
ComputationalNotebook 3 6 0 1 0 0 0 0 0 0 0 0
ConferencePaper 14071 10176 170 69 20 36 8 7 0 69 0 0
ConferenceProceeding 131 144 289 11 10 36 89 28 6 5 0 0
DataPaper 575 2185 7776 1677 142 28 5 6 1 0 0 0
Dataset 853124 2675384 1586853 1108041 1822208 786218 838618 929426 654680 102401 0 0
Dissertation 31880 14607 16200 11 10 16 4 2 5 2 0 0
Event 610 716 706 2518 514 504 587 5846 10 3 0 0
Image 146843 750629 355961 761034 131244 195096 336212 42940 62030 107602 0 0
InteractiveResource 1583 5441 9122 4268 4207 2581 4544 180 30 21 0 0
Journal 116 293 8 1 0 0 0 0 0 0 0 0
JournalArticle 105421 66630 1005 575 231 1949 6404 14 580 0 2 0
Model 288 1518 180 152 1875 77 57 97 16 1379 0 0
OutputManagementPlan 249 524 135 63 0 0 0 0 0 0 0 0
PeerReview 21 270 0 0 0 0 0 0 0 0 0 0
PhysicalObject 36375 103791 271716 91488 321428 564958 2137 1668 79 74 0 0
Preprint 930922 1305 40 17 13 0 0 0 0 0 0 0
Report 6175 10073 4494 5101 5133 6576 8353 5125 3926 4679 0 0
Service 42 42 90 108 19 12 13 1 1 0 0 0
Software 37914 80238 57556 39099 29243 27906 11230 5011 2763 1354 0 0
Sound 761 1877 13586 21991 560 1153 187 35 10 65 0 0
Standard 1337 578 6 213 3 7 2 0 0 0 0 0
Text 1878326 1514277 1208866 1169677 882155 771246 477747 409619 462754 203159 0 0
Workflow 350 291 488 500 479 443 331 541 10 5 0 0
Other 130706 94770 73052 82330 84808 200525 22500 138670 685 436 0 0

Report created 20220524_10 by retrieveDataCiteFacets from Metadata Game Changers
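
Reading the facet data CSV into Python might look like the sketch below. It assumes the CSV has the same layout as the table above (an id column followed by one column per year), which is not confirmed here; the inline sample stands in for the file, and load_facet_data is a hypothetical helper:

```python
import csv
import io

# Stand-in for the CSV file; three rows copied from the table above.
# Assumed layout: an "id" column followed by one column per year.
sample_csv = """id,2022,2021,2020,2019,2018
Dataset,853124,2675384,1586853,1108041,1822208
Preprint,930922,1305,40,17,13
Text,1878326,1514277,1208866,1169677,882155
"""


def load_facet_data(handle):
    """Read facet data into {resource_type: {year: count}}."""
    data = {}
    for row in csv.DictReader(handle):
        rtype = row.pop("id")
        data[rtype] = {int(year): int(count) for year, count in row.items()}
    return data


facets = load_facet_data(io.StringIO(sample_csv))

# Total registrations per year across the loaded resource types.
year_totals = {}
for counts in facets.values():
    for year, count in counts.items():
        year_totals[year] = year_totals.get(year, 0) + count

print(facets["Preprint"][2022])
print(year_totals[2022])
```

To read a real file, the io.StringIO object can be replaced with open(path, newline=""), where path points at the CSV written by the --facetdata option.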

I hope you find this new option useful and look forward to seeing the questions you can answer using DataCiteFacets!

DataCite Metadata
DataCite, resourceTypes
