September 02, 2019

The Metadata Game

September 02, 2019/ Ted Habermann

Join the community playing The Metadata Game!

August 15, 2019

Metadata Archeology: Hunting Affiliations and RORs in DataCite Metadata

August 15, 2019/ Ted Habermann

The DataCite metadata model includes rich information about over 18 million items with DOIs. The schema includes at least four elements that describe organizations: publisher, creator, contributor, and fundingReference. Each of these elements has interesting characteristics. Let’s explore them!

June 19, 2019

Metadata Archeology – Finding Affiliations in Crossref Metadata

June 19, 2019/ Ted Habermann

Identifying good examples of consistent affiliation information providers within the Crossref community and automating the identification of organizations are important steps towards the goal of assigning identifiers (e.g. RORs) to these organizations.

March 25, 2019

The Big Picture - Has CrossRef metadata completeness improved?

March 25, 2019/ Ted Habermann

I recently introduced a simple visualization of data from the CrossRef Participation Reports. It shows quantitatively how the completeness of CrossRef metadata collections, measured against eleven key metadata elements, has changed between the backfile and current record collections. An important goal of that work is to identify members with relatively complete metadata who can serve as examples that help other members understand the benefits of improved metadata. Before we explore and compare individual cases, it is interesting to establish the big picture as a background that we can keep in mind as we look at specifics.

March 01, 2019

Metadata Evolution - Metadata Completeness, Agility, and Collection Size

March 01, 2019/ Ted Habermann

I recently introduced a simple metric for measuring metadata collection completeness with respect to elements in the CrossRef Participation Reports. The suggestion of this metric immediately led to speculation about relationships between collection size and completeness. Small collections include fewer records – are they more likely to be complete? Publishers with large collections have more resources – do they have more complete metadata? Are smaller publishers more agile - can they change more?

March 01, 2019

Metadata for Using and Understanding Software

March 01, 2019/ Ted Habermann

All scientific communities have been linking research together for many years using references to related work in articles. More recently, these communities have begun exploring options for linking to datasets and software. As part of this effort, the CodeMeta Project proposed a vocabulary for metadata for code based on schema.org.

February 20, 2019

Metadata Evolution - CrossRef Participation Reports

February 20, 2019/ Ted Habermann

The CrossRef Participation Reports provide a wealth of information about completeness of CrossRef metadata collections and, equally interesting, comparisons between two time periods (backfile and current). These comparisons provide unique opportunities to examine metadata evolution through time. This blog introduces a tool that can be used to visualize this evolution.

February 13, 2019

Dryad Data Packages and Files

February 13, 2019/ Ted Habermann

California Digital Library (CDL) and Dryad recently announced a partnership to “move the needle” on digital data adoption throughout the scholarly research community by working together to understand and respond to researcher needs for high-quality scientific data publishing infrastructures. Metadata that supports access to and understanding of published datasets plays a critical role in driving this adoption.

December 06, 2018

Discovery Through Links

December 06, 2018/ Ted Habermann

Metadata records are high-value information hubs that connect users to related resources in journals, books, and on the web.

October 17, 2018

Understanding ISO Metadata - Objects and Properties

October 17, 2018/ Ted Habermann

The Doe family consists of mother Jane and her son John. Identifying people and relationships is an important function of metadata. The ISO TC211 metadata standards provide a comprehensive framework for many important metadata use cases. Could ISO metadata be used to describe the Doe family?

ISO metadata describes objects with types and properties. ISO type names generally start with an uppercase prefix followed by an underscore (MD_, CI_, ...), and objects have identifiers (id and uuid). Property names are in lower camel case (e.g. contactInfo), and properties have references (xlink:href and uuidref) that link to objects with content for those properties.

In ISO metadata, Jane and John are objects that have types (CI_Individual) and identifiers (JaneDoe and JohnDoe), which are given as id properties. Jane has a son (the property) whose identifier is JohnDoe, indicated by the reference (xlink:href) to JohnDoe associated with the son property. John has a mother (the property) whose identifier is JaneDoe, indicated by the reference (xlink:href) to JaneDoe associated with the mother property.
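The Doe family description can be sketched in a few lines of Python. This is only an illustration of the id and xlink:href pattern, not real ISO XML: the ex namespace, the family wrapper element, the name element, and the use of "#" fragments in the references are all assumptions made for the example. Only the xlink namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # standard XLink namespace
EX = "http://example.org/family"        # hypothetical namespace, illustration only

# Two objects of type CI_Individual, each with an id. The son and mother
# properties carry no content of their own; they only reference another object.
DOC = f"""
<family xmlns:ex="{EX}" xmlns:xlink="{XLINK}">
  <ex:CI_Individual id="JaneDoe">
    <ex:name>Jane Doe</ex:name>
    <ex:son xlink:href="#JohnDoe"/>
  </ex:CI_Individual>
  <ex:CI_Individual id="JohnDoe">
    <ex:name>John Doe</ex:name>
    <ex:mother xlink:href="#JaneDoe"/>
  </ex:CI_Individual>
</family>
"""


def index_objects(root):
    """Map every id found in the document to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def resolve(root, object_id, prop):
    """Follow the xlink:href on one property and return the target's name."""
    objects = index_objects(root)
    reference = objects[object_id].find(f"{{{EX}}}{prop}")
    href = reference.get(f"{{{XLINK}}}href")
    target = objects[href.lstrip("#")]
    return target.findtext(f"{{{EX}}}name")


root = ET.fromstring(DOC.strip())
print(resolve(root, "JaneDoe", "son"))     # John Doe
print(resolve(root, "JohnDoe", "mother"))  # Jane Doe
```

Each person is described once and everything else points at that description, so a change to John's record is picked up by every property that references JohnDoe.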

Using identifiers and references simplifies metadata creation and maintenance, improves consistency, and facilitates reuse of people, organizations, or citations referenced using links (like web pages) rather than repetitive content. More about that later.

This blog is included in a series related to ISO TC211 metadata standards. The goal of the series is to improve understanding and to examine and compare how the standards are being used by various groups around the world. Some of the background content was originally available on the NOAA GEO-IDE wiki. It will be updated in this series.

September 17, 2018

Talking and Thinking About Metadata

September 17, 2018/ Ted Habermann

The idea that the language we use to talk about things shapes the way we think or can think about those things has been around since the 1800s and even has a name, the Sapir–Whorf hypothesis, proposed in 1954. It was Whorf who said, “Language is not simply a reporting device for experience but a defining framework for it.” Last year Lera Boroditsky discussed a similar idea from the stage at TEDWomen with some nice examples and data from multiple languages and cultures. I have been thinking and writing about a universal documentation language for some time and bring together a couple of those ideas here.

Some metadata terms emerged from my metadata evaluation and guidance work with many partners. I described the concept of “metadata dialects” and suggested that many metadata standards are more like dialects of a universal documentation language than they are like separate languages. Some have questioned whether a universal “documentation language” really exists. I admit that it is really a concept that I hope exists rather than a real language described in an unabridged dictionary somewhere.

More recently, I introduced this dialect nomenclature to the Metadata 2020 community of metadata experts who advocate richer, connected, reusable, and open metadata for all research outputs. The terms are slowly creeping into some Metadata 2020 discussions, hopefully helping to build and cross bridges between different communities that are committed to better metadata in all contexts.

September 04, 2018

Documentation or Metadata?

September 04, 2018/ Metadata Game Changer/ Ted Habermann

Many datasets and products are documented using approaches and tools developed by data collectors to support their analysis and understanding. This documentation exists in notebooks, scientific papers, web pages, user guides, word processing documents, spreadsheets, data dictionaries, PDFs, databases, custom binary and ASCII formats, and almost any other conceivable form, each with associated storage and preservation strategies. This custom, often unstructured, approach may work well for independent investigators or in the confines of a particular laboratory or community, but it makes it difficult for users outside of these small groups to discover, use, and understand the data without consulting its creators.

Metadata is standard, structured documentation.

Metadata, in contrast to documentation, helps address discovery, use, and understanding by providing well-defined, structured content. This makes it possible for users to access and quickly understand many aspects of datasets that they have not collected. It also makes it possible to integrate information into discovery and analysis tools, and to provide consistent references from the metadata to external documentation.

Metadata standards provide standard element names and associated structures that can describe a wide variety of digital resources. The definitions and domain values are intended to be sufficiently generic to satisfy the metadata needs of various disciplines. These standards also include references to external documentation and well-defined mechanisms for adding structured information to address specific community needs.

Another important difference between documentation and metadata is the target audience. Documentation is targeted at humans and relies heavily on our ability to make sense of a variety of unstructured information. Metadata, on the other hand, is typically targeted at applications. Many of these applications search metadata and display it in ways that help humans discover data. As tools mature and, more importantly, the breadth of existing metadata increases, we will see more and more applications creating and using metadata to support more sophisticated metadata- and data-driven discovery, comparisons between multiple datasets, and other analyses.

Of course, the audience is also very important when we create metadata. Humans like descriptions that help them understand the resources being described and citations to more, likely unstructured, information. Applications are generally much more demanding when it comes to consistency and completeness. It is important to consider both audiences when creating and improving metadata.
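To make the application side concrete, here is a minimal sketch of the kind of completeness check an application might run on a structured record. The element names in REQUIRED and the sample record are made up for illustration and do not come from any particular metadata standard.

```python
# Hypothetical required elements; a real check would take these from a
# community recommendation or a specific metadata standard.
REQUIRED = ("title", "creator", "abstract", "identifier")


def completeness(record, required=REQUIRED):
    """Return (fraction of required elements present, names of missing ones).

    An element counts as present only if it has non-blank content.
    """
    missing = [
        name for name in required
        if not str(record.get(name) or "").strip()
    ]
    return (len(required) - len(missing)) / len(required), missing


# Made-up record: the abstract is blank and the identifier is absent.
record = {
    "title": "Sea surface temperature, 2018",
    "creator": "J. Doe",
    "abstract": "  ",
}

score, missing = completeness(record)
print(score, missing)  # 0.5 ['abstract', 'identifier']
```

A human reader might not mind the blank abstract if the title is informative, but an application has no way to fill that gap, which is why consistency and completeness matter more for that audience.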

Note added: It is interesting to see that the word “documentation” has a much longer history than the word “metadata”. Metadata is really the new kid on the block.
