Author
Webredaktion umwelt.info
Nationales Zentrum für Umwelt- und Naturschutzinformationen / Umweltbundesamt

About the metadata quality

This article explains what the assessment and scores on the metadata mean.

How and why does umwelt.info assess metadata quality?

umwelt.info assesses the metadata quality of every entry. This quality indicator is aimed at data and information providers, because good metadata is important for finding relevant information quickly and systematically. It also lets anyone interested in the data check quickly whether a search result meets certain criteria. Important criteria include, for example:

  1. easy reusability through the use of open licences,
  2. an open file format, or
  3. the possibility of downloading a data set automatically.

The assessment of metadata quality is based on the four FAIR principles: Findability, Accessibility, Interoperability and Reusability. More about the four FAIR principles can be found here. Each criterion is rated in one of three ways: binary (yes or no), categorical (for example, from unspecific to very specific) or on a continuous scale. Table 1 gives an overview of the quality assessment; it can also be downloaded as a factsheet (see below).

Table 1: Overview of the metadata quality assessment

| FAIR principle | Criterion | Rating system | Question assessed |
|---|---|---|---|
| Findability | Identification | Yes / No | Does a unique identifier exist for the entry? |
| | Title | Continuous | Is the title easy to read according to a readability index? |
| | Description | Continuous | Is the description easy to read according to a readability index? |
| | Keywords | Continuous | How many keywords match terms in the environmental thesaurus (UMTHES)? |
| | Geospatial reference | Categorical: no regional information, general regional name, coordinate-specific region, exact region name, point coordinates | How precise is the localisation? |
| | Time reference | Yes / No | Is a date or time range given? |
| Accessibility | Reference | Not implemented yet | |
| | Direct access | Yes / No | Is there a direct link to the original content? |
| | Openly available | Yes / No | Can the data be accessed without registration? |
| Interoperability | Machine-readable data | Yes / No | Can at least one resource (data set) be read out automatically? |
| | Machine-readable metadata | Yes / No | Can the metadata be read out automatically? |
| | Media type | Yes / No | Is the data format of at least one resource (data set) known? |
| | Open data format | Yes / No | Is the data format openly accessible (for example CSV)? |
| Reusability | Licence | Categorical: no information, ambiguous licence, specific licence, specific and open licence | Is the licence specific and open? |
| | Contact | Yes / No | Is contact information given? |
| | Publisher | Yes / No | Is the publisher known? |
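
The structure of Table 1 can also be written down as data. The following Python sketch is purely illustrative: the names `RatingSystem`, `CRITERIA`, `score_binary` and `binary_criteria` are hypothetical, and the sketch assumes, as the examples below suggest, that a yes/no criterion scores 100 points for "yes" and 0 points for "no". The "Reference" criterion is left out because it is not implemented yet.

```python
from __future__ import annotations

from enum import Enum


class RatingSystem(Enum):
    BINARY = "yes/no"
    CATEGORICAL = "categorical"
    CONTINUOUS = "continuous"


# Criteria per FAIR principle as listed in Table 1 ("Reference" omitted: not implemented yet).
CRITERIA: dict[str, dict[str, RatingSystem]] = {
    "Findability": {
        "Identification": RatingSystem.BINARY,
        "Title": RatingSystem.CONTINUOUS,
        "Description": RatingSystem.CONTINUOUS,
        "Keywords": RatingSystem.CONTINUOUS,
        "Geospatial reference": RatingSystem.CATEGORICAL,
        "Time reference": RatingSystem.BINARY,
    },
    "Accessibility": {
        "Direct access": RatingSystem.BINARY,
        "Openly available": RatingSystem.BINARY,
    },
    "Interoperability": {
        "Machine-readable data": RatingSystem.BINARY,
        "Machine-readable metadata": RatingSystem.BINARY,
        "Media type": RatingSystem.BINARY,
        "Open data format": RatingSystem.BINARY,
    },
    "Reusability": {
        "Licence": RatingSystem.CATEGORICAL,
        "Contact": RatingSystem.BINARY,
        "Publisher": RatingSystem.BINARY,
    },
}


def score_binary(answer: bool) -> int:
    """Assumed scoring of a yes/no criterion: 100 points for yes, 0 points for no."""
    return 100 if answer else 0


def binary_criteria(principle: str) -> list[str]:
    """Return the names of the yes/no criteria belonging to one FAIR principle."""
    return [
        name
        for name, system in CRITERIA[principle].items()
        if system is RatingSystem.BINARY
    ]
```

For instance, `binary_criteria("Interoperability")` lists all four interoperability criteria, and `score_binary(False)` yields the 0 points that an entry without a unique identifier receives.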
Downloads

First example

The following example illustrates the metadata quality assessment for a groundwater measurement station. In this case, findability has a score of 59. The identification criterion scores 0 points because the entry has no unique identifier. Title and description score 51 and 52 points respectively, which represents medium and high readability according to the readability index.

The keyword criterion scores 100 points, as four of the five available keywords can be found in the environmental thesaurus (UMTHES). Spatial and temporal reference both score 100 points, as the exact geodata of the station and the time period of the measurements are known. Accessibility scores 100 points, as the content can be accessed directly via a link and is publicly available. Interoperability scores 75 points, as three of the four individual criteria are fully met: machine-readable data is provided for the measuring point in CSV format (100 points), which is an open file format (100 points), so the media type is also known (100 points). However, no machine-readable metadata is provided (0 points). Reusability scores 78 points, as two of the three individual criteria are fully met: only an ambiguous licence is specified (33 points), while contact information is available (100 points) and the publisher is known (100 points).
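
The reusability score in this example is consistent with a principle score computed as the rounded, unweighted mean of its individual criterion scores: (33 + 100 + 100) / 3 ≈ 77.7, i.e. 78 points. The Python sketch below assumes exactly this aggregation rule; umwelt.info may weight or combine criteria differently, and the function name `principle_score` is hypothetical.

```python
from __future__ import annotations

from statistics import mean


def principle_score(criterion_scores: dict[str, float]) -> int:
    """Assumed aggregation: rounded, unweighted mean of criterion scores (0 to 100)."""
    if not criterion_scores:
        raise ValueError("at least one criterion score is required")
    return round(mean(criterion_scores.values()))


# Criterion scores taken from the first example (groundwater measurement station).
reusability = {"Licence": 33, "Contact": 100, "Publisher": 100}
# "Reference" is not implemented yet, so only two accessibility criteria are scored.
accessibility = {"Direct access": 100, "Openly available": 100}

print(principle_score(reusability))    # 78
print(principle_score(accessibility))  # 100
```

Because each criterion contributes equally under this assumption, a single criterion at 0 points (such as missing machine-readable metadata) lowers a principle with four criteria by 25 points.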

Second example

A second example, an entry on the identification of pollutants in animals from the Palatinate Forest, illustrates how categorical criteria work. In contrast to the first example, the spatial reference criterion scores only 50 points, as only an area (Palatinate Forest) and no point coordinates are available. The licence scores 33 points, as only an ambiguous licence is specified for the entry. A specific but non-open licence would score 66 points, while the absence of any licence information would result in 0 points.
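
The licence points named in this example (0, 33 and 66) are consistent with spreading the four licence categories of Table 1 evenly over the range from 0 to 100 and truncating to whole points, which would give a specific and open licence 100 points. The Python sketch below assumes that rule; the category order is taken from Table 1, while `LICENCE_CATEGORIES` and `categorical_score` are hypothetical names.

```python
from __future__ import annotations

# Licence categories from Table 1, ordered from worst to best.
LICENCE_CATEGORIES = [
    "no information",
    "ambiguous licence",
    "specific licence",
    "specific and open licence",
]


def categorical_score(category: str, categories: list[str]) -> int:
    """Assumed rule: evenly spaced points from 0 to 100, truncated to whole points."""
    index = categories.index(category)  # raises ValueError for an unknown category
    return (100 * index) // (len(categories) - 1)


for category in LICENCE_CATEGORIES:
    print(f"{category}: {categorical_score(category, LICENCE_CATEGORIES)}")
```

Integer division reproduces the truncated values 33 and 66 rather than the rounded 33 and 67, which matches the points stated above.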
