FAIR Island Experiments with Connecting Project Resources in DataCite
Cite this blog as Robinson, E. and Habermann, T. (2023). FAIR Island Experiments with Connecting Project Resources in DataCite. Front Matter. https://doi.org/10.59350/3vatb-fjr76
Erin Robinson and Ted Habermann, Metadata Game Changers
Unique and permanent identifiers (PIDs) for many types of research objects are a growing part of the global research landscape and play a major role enabling connections between many kinds of research objects. Repositories of PID metadata are maintained by several members of the global research infrastructure. Most of the large PID infrastructure providers focus on identifiers for particular types of objects. For example, ORCIDs identify people, RORs identify research organizations, and Crossref DOIs identify papers, book chapters, and related items.
DataCite is different. While it began with the mission of creating DOIs for datasets, since 2009 DataCite has expanded to incorporate a wide variety of resource types including text, physical objects, protocols, images, preprints, and many others (Habermann, 2022). DOIs for instruments and samples have recently been included in DataCite, indicating that this trend towards increased diversity is continuing.
The FAIR Island Project has recently been exploring how the global research infrastructure can be used to connect all kinds of research resources related to scientifically significant places (Thomer, 2022) back to the place that enabled the work (i.e. the field station). To make these connections, we need identifiers for the field stations, DOIs for all research resources, and a metadata record with an identifier to connect everything together. In this blog we describe an experiment using DataCite metadata to make connections to metadata for projects done at the field stations. Using DataCite metadata to identify and connect resources for projects is not new: since 2014, over 100,000 DataCite records, primarily from the Open Science Framework and Zenodo, have included “Project” in the free-text resourceType field.
Using this approach, we can leverage DataCite capabilities to 1) connect project resources (people, data, organizations) with related identifiers, 2) update the metadata records as additional relationships are created and 3) use DataCite Commons to visualize the growing list of resources related to the field station.
Pilot Projects with the ISP Class
In March 2023, we had the opportunity to teach a Data and Communication class as part of the University of California, Berkeley Island Sustainability Program. Through this program, the students did field work, made observations and collected data related to four on-going projects at two field stations: the Gump South Pacific Research Station on Moorea and the Tetiaroa Ecostation on Tetiaroa in French Polynesia.
One of the goals of the class was to simulate the activities that professional research teams need to do before, during and after field work. Thus, we tackled data management planning, collaborative protocol development, and data sharing through open data repositories.
This class provided a real-world opportunity to put the experimental ideas for project metadata into practice and create connections between research artifacts related to the student projects at the field stations.
Identifying People and Their Affiliations
Our first activity with the ISP class was explaining the benefits of unambiguous identification of researchers and identifying all the students with ORCIDs. Each project that the students participated in also includes professional researchers that provided context and guidance, serving as mentors for the students. These people also have ORCIDs and were connected to the projects as creators and mentors.
The students took to their ORCIDs with enthusiasm. In addition to the ORCIDs being linked to the project metadata, the students linked their ORCIDs to the additional artifacts they created, so their ORCID profiles show a rich list of works. Now, several weeks later, the student ORCID profiles have expanded to include these works along with multiple employment and distinction records.
The project records can be seen on the Tetiaroa Society and Gump South Pacific Research Station DataCite Commons pages with appropriate filters (year = 2023, resource type = other). The Creators tab on the project page lists all of the students and mentors on each project with their affiliations when we could find appropriate identifiers (Figure 1).
As mentioned above, all students have ORCIDs, and, in addition, they all have affiliations with RORs. It is these ROR affiliations that result in the Project records being displayed in the DataCite Commons on their organizational affiliation page. In Figure 1 it is University of California, Berkeley.
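As a rough illustration of what such a creator entry might look like, here is a minimal Python sketch. The field names follow the DataCite JSON representation as we understand it; the helper name make_creator, the student name, and the ROR ID are hypothetical placeholders, and the ORCID iD is an example iD commonly used in ORCID documentation, not a real student's.

```python
import json

# Placeholder identifiers for illustration only (assumptions, not real records).
EXAMPLE_ORCID = "https://orcid.org/0000-0002-1825-0097"
PLACEHOLDER_ROR = "https://ror.org/00placeholder"


def make_creator(given_name, family_name, orcid, affiliation_name, ror_id):
    """Build one creator entry with an ORCID iD and a ROR-identified affiliation."""
    return {
        "name": f"{family_name}, {given_name}",
        "nameType": "Personal",
        "givenName": given_name,
        "familyName": family_name,
        # The ORCID iD identifies the person.
        "nameIdentifiers": [
            {
                "nameIdentifier": orcid,
                "nameIdentifierScheme": "ORCID",
                "schemeUri": "https://orcid.org",
            }
        ],
        # The ROR ID identifies the organization the person is affiliated with.
        "affiliation": [
            {
                "name": affiliation_name,
                "affiliationIdentifier": ror_id,
                "affiliationIdentifierScheme": "ROR",
            }
        ],
    }


creator = make_creator(
    "Ada", "Student", EXAMPLE_ORCID,
    "University of California, Berkeley", PLACEHOLDER_ROR,
)
print(json.dumps(creator, indent=2))
```

It is the affiliation identifier, rather than the affiliation name, that lets a record be matched to an organization page.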
Identifying Datasets
The students spent time cleaning and documenting the data they collected and then focused on facilitating sharing of the data and related items. They submitted the datasets to Zenodo and received DOIs to identify them. These dataset landing pages (examples below) indicate that all authors have ORCIDs that are connected to the dataset.
Identifying Output Management Plans
Output management plans are an important element of good practice in projects that include collecting or creating datasets and sharing and preserving them. Ideally, these plans are written during the funding stage of the project in collaboration with prospective funders and repositories. In this case, they were created after the collection of the data as part of the learning process.
The California Digital Library DMPTool was used to guide the students through the process of creating the Output Management Plans (examples listed below). These examples show brief descriptions of the plans and include links to the complete plans and DOIs for connecting the plans to datasets and other related outputs.
Identifying Protocols
Having spent time in the field collecting data in sometimes challenging situations, the students understood the importance of well-described protocols as guides for the data collection process. They also learned that unexpected situations emerge, even with well-laid plans. Having also spent time cleaning their data and preparing it for sharing, the students identified important tips and warnings that would have helped them improve the quality of their data and metadata early in the life cycle. This learning was reflected in the creation of protocols for data collection.
The students used protocols.io to collaborate on detailed descriptions of what they did, written to help others making similar observations in the future (examples listed below). Protocols.io also provided DOIs that could be used to connect the protocols to other research objects.
Connecting the Field Stations
These projects were done at two field stations, both of which made many important in-kind contributions to the work. Finding ways to acknowledge in-kind contributions by field stations and other organizations is an important goal of the FAIR Island Project.
Building on prior work from FAIR Island, both the Gump South Pacific Research Station and the Tetiaroa Ecostation have RORs. We have also established the practice of including these identifiers in DataCite metadata records with the contributor type “Sponsor”. This allows for connecting the field stations as contributors to the projects (Figure 2).
The Contributors tab in the DataCite Commons pages for these projects lists the field station where the work was done as a contributor with the type “Sponsor”, indicating in-kind contributions. These connections are shown in Figure 3.
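A contributor entry of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names make_sponsor and add_sponsor are hypothetical, the ROR ID is a placeholder rather than the station's real identifier, and the field names follow the DataCite JSON representation as we understand it.

```python
# Placeholder ROR ID for illustration only (an assumption, not the real one).
PLACEHOLDER_STATION_ROR = "https://ror.org/00placeholder"


def make_sponsor(station_name, ror_id):
    """Build a contributor entry for a field station with type 'Sponsor'."""
    return {
        "name": station_name,
        "nameType": "Organizational",
        "contributorType": "Sponsor",
        "nameIdentifiers": [
            {
                "nameIdentifier": ror_id,
                "nameIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


def add_sponsor(record, station_name, ror_id):
    """Add the station as a Sponsor unless it is already listed; return True if added."""
    contributors = record.setdefault("contributors", [])
    for contributor in contributors:
        ids = [n.get("nameIdentifier") for n in contributor.get("nameIdentifiers", [])]
        if contributor.get("contributorType") == "Sponsor" and ror_id in ids:
            return False  # already acknowledged, nothing to do
    contributors.append(make_sponsor(station_name, ror_id))
    return True


project = {"titles": [{"title": "Example project record"}]}
add_sponsor(project, "Tetiaroa Ecostation", PLACEHOLDER_STATION_ROR)
add_sponsor(project, "Tetiaroa Ecostation", PLACEHOLDER_STATION_ROR)  # no duplicate
print(len(project["contributors"]))
```

Checking for an existing entry before appending keeps the record clean when the same station is acknowledged more than once during updates.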
Updating Project Connections and Access
All of the research objects described above were identified using open PIDs and submitted to appropriate repositories (DataCite, DMPHub, and Protocols.io). The DataCite metadata records created for each of the projects serve as containers for collecting these connections for each of the four projects:
Tetiaroa atoll (Motu Onetahi) ant survey January 2023 - ISP Class
Ahuroa Motu Vegetation Sampling After Rat Eradication January 2023 - ISP Class
Species Diversity on Dead or Alive Coral Heads in Moorea, French Polynesia January 2023 - ISP Class
As these resources were created and identified, we connected them through related identifiers using California Digital Library EZID for metadata updates. Figure 4 shows the projects along with all current connections.
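The update step can be pictured as adding entries to a record's list of related identifiers as new resources appear. The sketch below is a local, in-memory illustration only: it does not call EZID or any DataCite service, the helper name add_related_identifier is hypothetical, and the two DOIs are those of the Output Management Plan and protocol examples from this project.

```python
def add_related_identifier(record, doi, relation_type, resource_type_general):
    """Append a related DOI to a metadata record; return False if already present."""
    related = record.setdefault("relatedIdentifiers", [])
    key = (doi.lower(), relation_type)
    for entry in related:
        # DOIs are case-insensitive, so compare in lower case.
        if (entry["relatedIdentifier"].lower(), entry["relationType"]) == key:
            return False
    related.append(
        {
            "relationType": relation_type,
            "relatedIdentifier": doi,
            "resourceTypeGeneral": resource_type_general,
            "relatedIdentifierType": "DOI",
        }
    )
    return True


project_record = {}  # stands in for one project's metadata record
add_related_identifier(
    project_record, "10.48321/d15342", "IsDocumentedBy", "OutputManagementPlan"
)
add_related_identifier(
    project_record, "10.17504/protocols.io.ewov1oky2lr2/v1", "IsDocumentedBy", "Workflow"
)
# Adding the same connection again leaves the record unchanged.
add_related_identifier(
    project_record, "10.48321/D15342", "IsDocumentedBy", "OutputManagementPlan"
)
print(len(project_record["relatedIdentifiers"]))
```

In practice the updated record would then be submitted through whichever metadata update service is in use.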
The related identifier connections described above are not yet visible in the DataCite Commons, but they are included in the DataCite metadata and can be seen by viewing the metadata in JSON or XML. In the JSON, each entry includes the DOI, the relationType, and the resourceTypeGeneral of the related research object. For example, this JSON shows the related identifiers for an Output Management Plan and a protocol (described with the Workflow resourceTypeGeneral).
"relatedIdentifiers": [
  {
    "relationType": "IsDocumentedBy",
    "relatedIdentifier": "10.48321/d15342",
    "resourceTypeGeneral": "OutputManagementPlan",
    "relatedIdentifierType": "DOI"
  },
  {
    "relationType": "IsDocumentedBy",
    "relatedIdentifier": "10.17504/protocols.io.ewov1oky2lr2/v1",
    "resourceTypeGeneral": "Workflow",
    "relatedIdentifierType": "DOI"
  }
]
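Until these connections are displayed in the DataCite Commons, they can be read directly from the JSON. The following Python sketch groups the related DOIs by resourceTypeGeneral and turns them into resolvable links. The function name related_dois_by_type is hypothetical, and the snippet assumes the relatedIdentifiers array above has been wrapped in an enclosing JSON object.

```python
import json
from collections import defaultdict

# The related identifiers from the example above, wrapped in an object.
metadata_json = """
{
  "relatedIdentifiers": [
    {"relationType": "IsDocumentedBy",
     "relatedIdentifier": "10.48321/d15342",
     "resourceTypeGeneral": "OutputManagementPlan",
     "relatedIdentifierType": "DOI"},
    {"relationType": "IsDocumentedBy",
     "relatedIdentifier": "10.17504/protocols.io.ewov1oky2lr2/v1",
     "resourceTypeGeneral": "Workflow",
     "relatedIdentifierType": "DOI"}
  ]
}
"""


def related_dois_by_type(metadata):
    """Group related DOIs (as https://doi.org/ links) by resourceTypeGeneral."""
    grouped = defaultdict(list)
    for entry in metadata.get("relatedIdentifiers", []):
        if entry.get("relatedIdentifierType") != "DOI":
            continue  # skip URLs, ARKs, and other identifier types
        resource_type = entry.get("resourceTypeGeneral", "Unknown")
        grouped[resource_type].append("https://doi.org/" + entry["relatedIdentifier"])
    return dict(grouped)


grouped = related_dois_by_type(json.loads(metadata_json))
for resource_type, links in grouped.items():
    print(resource_type, links)
```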
Conclusions
No new infrastructure was required to create these connections. We relied entirely on existing DataCite, ROR, and ORCID infrastructure to identify these resources and connect them to the global research infrastructure. This is the power of that shared infrastructure, and it is exciting to see the connections being formed and displayed.
There are more connections to make. Real-world scientific projects have papers, samples and more that can be added. Our next step is to create metadata for other projects at both the Gump and Tetiaroa field stations and to show these connections. In addition, we are excited to work with LocalContext to create notices from the project metadata using the LocalContext API and then add the LocalContext notices to the project records.
While undergraduate courses often do well-established work to teach skills, the work we did with the Island Sustainability Program was a proving ground and the first of its kind. We are grateful to the class for their willingness to explore this new approach to good data management practices, including the creation, documentation, and sharing of data, protocols for data collection, and data management plans, and, most importantly, for connecting all of that work back to the places they worked - Gump and Tetiaroa.
We are also grateful to these working field stations for their willingness to provide real-world case studies to ground these ideas, and to the U.S. National Science Foundation for supporting this work through Award 2132549. Field stations can be considered facilities, and we see this approach as extensible to other facilities that have projects.