How Many When?
Cite this blog as Habermann, T. (2022). How Many When? Front Matter. https://doi.org/10.59350/tbfth-4nv87
In a recent blog I introduced a tool designed to help measure and understand DataCite metadata usage and showed how it could be used to answer a question about usage of DataCite Resource Types introduced in version 4.4 of the schema. Of course it can also be used to retrieve counts of all resource types, old and new:
retrieveDataCiteFacets --resources -fl registered --htmlout --csvout --facetdata
which means: retrieve the registered facet for all resources and output HTML, CSV, and facet data (a new option explained below).
A few interesting rows of the HTML output from this command are shown here:
DataCite Facet Summary
Resources: Audiovisual Book BookChapter Collection ComputationalNotebook ConferencePaper ConferenceProceeding DataPaper Dataset Dissertation Event Image InteractiveResource Journal JournalArticle Model OutputManagementPlan PeerReview PhysicalObject Preprint Report Service Software Sound Standard Text Workflow Other
Facet list: registered
Id | DateTime | NumberOfRecords | registered_number | registered_max | registered_common | registered_total | registered_HI | registered |
---|---|---|---|---|---|---|---|---|
Dataset | 20220524_10 | 12040029 | 10 | 2675384 | 2021 | 11356953 | 24% | 2022 (853124), 2021 (2675384), 2020 (1586853), 2019 (1108041), 2018 (1822208), 2017 (786218), 2016 (838618), 2015 (929426), 2014 (654680), 2013 (102401) |
Image | 20220524_10 | 3143275 | 10 | 761034 | 2019 | 2889591 | 26% | 2022 (146843), 2021 (750629), 2020 (355961), 2019 (761034), 2018 (131244), 2017 (195096), 2016 (336212), 2015 (42940), 2014 (62030), 2013 (107602) |
PhysicalObject | 20220524_10 | 1393737 | 10 | 564958 | 2017 | 1393714 | 41% | 2022 (36375), 2021 (103791), 2020 (271716), 2019 (91488), 2018 (321428), 2017 (564958), 2016 (2137), 2015 (1668), 2014 (79), 2013 (74) |
Preprint | 20220524_10 | 932308 | 5 | 930922 | 2022 | 932297 | 100% | 2022 (930922), 2021 (1305), 2020 (40), 2019 (17), 2018 (13) |
Text | 20220524_10 | 9461307 | 10 | 1878326 | 2022 | 8977826 | 21% | 2022 (1878326), 2021 (1514277), 2020 (1208866), 2019 (1169677), 2018 (882155), 2017 (771246), 2016 (477747), 2015 (409619), 2014 (462754), 2013 (203159) |
Report created 20220524_10 by retrieveDataCiteFacets from Metadata Game Changers
This output, which is easier to view in a separate browser window, hopefully provides a human-readable summary of the registered facet for all resources, particularly if you know what you are looking for. The bulk of the real data are in the last column (registered), and they are not very machine-readable. To help with this problem I added the --facetdata argument. If this argument is on the command line, retrieveDataCiteFacets writes the data from the facet column (registered in this case) to a comma-separated value (CSV) file with the targets, the facet, and the timestamp in the name (e.g. DataCite_resources_registered__20220524_10.html). This file can be read, analyzed, and plotted with your analysis tool of choice.
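To show why the registered column is awkward for machines, and how the summary columns relate to it, here is a minimal Python sketch. It pulls the "year (count)" pairs out of one cell with a regular expression and then recomputes the summary columns. The function names are hypothetical, and the formulas are inferred by checking the rows above rather than taken from retrieveDataCiteFacets: registered_total matches the sum of the listed yearly counts, registered_common is the year with the largest count, and registered_HI matches registered_max divided by registered_total. For the Dataset row, for example, the ten listed counts sum to 11,356,953, which is less than the 12,040,029 in NumberOfRecords.

```python
import re

# Matches one "year (count)" pair in the registered column of the HTML report.
FACET_PAIR = re.compile(r"(\d{4})\s*\((\d+)\)")


def parse_facet_cell(cell: str) -> dict[str, int]:
    """Turn '2022 (930922), 2021 (1305)' into {'2022': 930922, '2021': 1305}."""
    return {year: int(count) for year, count in FACET_PAIR.findall(cell)}


def summarize_facet(counts: dict[str, int]) -> dict[str, object]:
    """Recompute the summary columns; formulas inferred from the sample rows."""
    total = sum(counts.values())              # registered_total
    common = max(counts, key=counts.get)      # registered_common: busiest year
    largest = counts[common]                  # registered_max
    return {
        "number": len(counts),                # registered_number: years listed
        "max": largest,
        "common": common,
        "total": total,
        "HI": f"{round(100 * largest / total)}%",  # registered_HI (assumed max / total)
    }


preprint = parse_facet_cell(
    "2022 (930922), 2021 (1305), 2020 (40), 2019 (17), 2018 (13)"
)
print(preprint)
# → {'2022': 930922, '2021': 1305, '2020': 40, '2019': 17, '2018': 13}
print(summarize_facet(preprint))
# → {'number': 5, 'max': 930922, 'common': '2022', 'total': 932297, 'HI': '100%'}
```

The recomputed values match the Preprint row of the report, which is what makes the inferred formulas plausible; with --facetdata the parsing step should no longer be needed.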
I read the facet data into Excel to answer two obvious questions. First, how many of each resource type are registered in DataCite? The data in Figure 1 show that Datasets remain the most common resource type in DataCite (40%), but Text is not far behind (32%).
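Excel is one option; as a minimal sketch, the same kind of tally can be done in Python with the standard csv module. The code assumes the --facetdata file has an id column followed by one column per year, like the facet data table later in this post; the real file layout may differ, and type_shares is a hypothetical name. The embedded sample holds only three rows and three years, so its percentages describe that subset, not Figure 1.

```python
import csv
import io

# Assumed --facetdata layout: an "id" column, then one column per year, like the
# facet data table later in this post. Only a small sample is embedded here.
SAMPLE = """id,2022,2021,2020
Dataset,853124,2675384,1586853
Preprint,930922,1305,40
Text,1878326,1514277,1208866
"""


def type_shares(csv_text: str) -> dict[str, float]:
    """Return each resource type's fraction of all counts in the file."""
    totals: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rtype = row.pop("id")                               # resource type name
        totals[rtype] = sum(int(v) for v in row.values())   # sum over year columns
    grand_total = sum(totals.values())
    return {rtype: n / grand_total for rtype, n in totals.items()}


for rtype, share in type_shares(SAMPLE).items():
    print(f"{rtype}: {share:.0%}")
```

Pointing the function at the full facet data file would give shares over every resource type and every listed year.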
Second, how fast is Text catching up? Figure 2 shows that Text (orange) is catching up quickly. In fact, the number of Text resources registered so far in 2022 (1,878,326) is already more than twice the number of Datasets (dark blue, 853,124), the widest margin of any year in the facet data; in 2019 Text was only slightly ahead (1,169,677 vs. 1,108,041), close enough to call a tie. This is particularly interesting because the introduction of many new resource types last year would be expected to dilute the number of Text resources. Note the meteoric rise of Preprints (light blue), which are also slightly ahead of Datasets this year (930,922 vs. 853,124). In the last blog I showed that this increase is dominated by the arXiv.common repository, which recently decided to take advantage of the global research infrastructure (DataCite) for its metadata. We all benefit from that choice!
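The comparison in Figure 2 can also be checked numerically. This sketch copies the Dataset and Text rows (2013 to 2022) from the facet data table later in this post and computes, for each year, how many Text records were registered per Dataset record; a ratio above 1 means Text was ahead that year. The function name text_per_dataset is illustrative only.

```python
# Yearly counts copied from the Dataset and Text rows of the facet data table.
YEARS = ["2022", "2021", "2020", "2019", "2018", "2017", "2016", "2015", "2014", "2013"]
DATASET = [853124, 2675384, 1586853, 1108041, 1822208, 786218, 838618, 929426, 654680, 102401]
TEXT = [1878326, 1514277, 1208866, 1169677, 882155, 771246, 477747, 409619, 462754, 203159]


def text_per_dataset(years, datasets, texts):
    """Map each year to the number of Text records registered per Dataset record."""
    return {year: t / d for year, d, t in zip(years, datasets, texts)}


ratios = text_per_dataset(YEARS, DATASET, TEXT)
for year, ratio in ratios.items():
    flag = "  (Text ahead)" if ratio > 1 else ""
    print(f"{year}: {ratio:.2f}{flag}")
```

With these counts, 2022 has the highest ratio (about 2.2); Text is also ahead in 2019 and 2013.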
There is clearly a lot to be learned about how repositories use DataCite from the --facetdata output. If --htmlout is also on the command line, a simple HTML file with the facet data is written as well. This might be useful for sharing:
DataCite Facet Summary
Resources: Audiovisual Book BookChapter Collection ComputationalNotebook ConferencePaper ConferenceProceeding DataPaper Dataset Dissertation Event Image InteractiveResource Journal JournalArticle Model OutputManagementPlan PeerReview PhysicalObject Preprint Report Service Software Sound Standard Text Workflow Other
Facet list: registered
id | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Audiovisual | 34377 | 46232 | 38130 | 33613 | 23062 | 24969 | 13553 | 2496 | 1682 | 3815 | 0 | 0 |
Book | 4430 | 6298 | 618 | 457 | 1098 | 904 | 623 | 370 | 9 | 22 | 0 | 0 |
BookChapter | 2784 | 7461 | 64 | 33 | 28 | 100 | 200 | 97 | 0 | 0 | 289 | 5 |
Collection | 46951 | 104188 | 90369 | 76641 | 65313 | 76578 | 71138 | 43469 | 178153 | 6133 | 0 | 0 |
ComputationalNotebook | 3 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ConferencePaper | 14071 | 10176 | 170 | 69 | 20 | 36 | 8 | 7 | 0 | 69 | 0 | 0 |
ConferenceProceeding | 131 | 144 | 289 | 11 | 10 | 36 | 89 | 28 | 6 | 5 | 0 | 0 |
DataPaper | 575 | 2185 | 7776 | 1677 | 142 | 28 | 5 | 6 | 1 | 0 | 0 | 0 |
Dataset | 853124 | 2675384 | 1586853 | 1108041 | 1822208 | 786218 | 838618 | 929426 | 654680 | 102401 | 0 | 0 |
Dissertation | 31880 | 14607 | 16200 | 11 | 10 | 16 | 4 | 2 | 5 | 2 | 0 | 0 |
Event | 610 | 716 | 706 | 2518 | 514 | 504 | 587 | 5846 | 10 | 3 | 0 | 0 |
Image | 146843 | 750629 | 355961 | 761034 | 131244 | 195096 | 336212 | 42940 | 62030 | 107602 | 0 | 0 |
InteractiveResource | 1583 | 5441 | 9122 | 4268 | 4207 | 2581 | 4544 | 180 | 30 | 21 | 0 | 0 |
Journal | 116 | 293 | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
JournalArticle | 105421 | 66630 | 1005 | 575 | 231 | 1949 | 6404 | 14 | 580 | 0 | 2 | 0 |
Model | 288 | 1518 | 180 | 152 | 1875 | 77 | 57 | 97 | 16 | 1379 | 0 | 0 |
OutputManagementPlan | 249 | 524 | 135 | 63 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
PeerReview | 21 | 270 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
PhysicalObject | 36375 | 103791 | 271716 | 91488 | 321428 | 564958 | 2137 | 1668 | 79 | 74 | 0 | 0 |
Preprint | 930922 | 1305 | 40 | 17 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Report | 6175 | 10073 | 4494 | 5101 | 5133 | 6576 | 8353 | 5125 | 3926 | 4679 | 0 | 0 |
Service | 42 | 42 | 90 | 108 | 19 | 12 | 13 | 1 | 1 | 0 | 0 | 0 |
Software | 37914 | 80238 | 57556 | 39099 | 29243 | 27906 | 11230 | 5011 | 2763 | 1354 | 0 | 0 |
Sound | 761 | 1877 | 13586 | 21991 | 560 | 1153 | 187 | 35 | 10 | 65 | 0 | 0 |
Standard | 1337 | 578 | 6 | 213 | 3 | 7 | 2 | 0 | 0 | 0 | 0 | 0 |
Text | 1878326 | 1514277 | 1208866 | 1169677 | 882155 | 771246 | 477747 | 409619 | 462754 | 203159 | 0 | 0 |
Workflow | 350 | 291 | 488 | 500 | 479 | 443 | 331 | 541 | 10 | 5 | 0 | 0 |
Other | 130706 | 94770 | 73052 | 82330 | 84808 | 200525 | 22500 | 138670 | 685 | 436 | 0 | 0 |
Report created 20220524_10 by retrieveDataCiteFacets from Metadata Game Changers
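The table above is a pivot of the per-type facets: one row per resource type, one column per year. A minimal Python sketch of that pivot follows. The function name is hypothetical, and filling missing years with 0 is an assumption about how the table was built, suggested by rows like BookChapter, which has counts for 2012 and 2011 but 0 for 2014 and 2013. Read that way, a 0 means the year did not appear in that type's facet, not necessarily that nothing was registered.

```python
import csv
import io


def facets_to_wide_csv(facets: dict[str, dict[str, int]]) -> str:
    """Pivot {resource_type: {year: count}} into CSV text with one column per year.

    Years missing from a resource type's facet are written as 0 (an assumption
    about how the table above was filled, not documented tool behavior).
    """
    # Union of all years seen in any facet, newest first.
    years = sorted({y for counts in facets.values() for y in counts}, reverse=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", *years])
    for rtype in sorted(facets):
        writer.writerow([rtype, *(facets[rtype].get(y, 0) for y in years)])
    return out.getvalue()


facets = {
    "PeerReview": {"2022": 21, "2021": 270},
    "Journal": {"2022": 116, "2021": 293, "2020": 8, "2019": 1},
}
print(facets_to_wide_csv(facets), end="")
# id,2022,2021,2020,2019
# Journal,116,293,8,1
# PeerReview,21,270,0,0
```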
I hope you find this new option useful and look forward to seeing the questions you can answer using DataCiteFacets!