This article explains what the metadata quality assessment and its scores mean.
About the metadata quality
How and why does umwelt.info assess metadata quality?
umwelt.info provides a metadata quality assessment for each entry. This quality indicator is aimed at data and information providers, because good metadata is important for finding relevant information quickly and systematically. It also lets those interested in data quickly assess whether a search result fulfils certain criteria. Important criteria include, for example:
- easy reusability through the use of open licences,
- an open file format, or
- the possibility of automatically downloading a data set.
The assessment of metadata quality is based on the four FAIR principles: Findability, Accessibility, Interoperability and Reusability. More about the four FAIR principles can be found here. The individual criteria are either binary (yes or no), categorical (for example, ranging from unspecific to very specific) or continuous. Table 1 shows an overview of the quality assessment and can also be downloaded as a factsheet (see below).
FAIR principle | Criterion | Rating system | Significance |
---|---|---|---|
Findability | Identification | Yes / No | Does a unique identifier exist for the entry? |
Findability | Title | Continuous | Is the title easy to read according to a readability index? |
Findability | Description | Continuous | Is the description easy to read according to a readability index? |
Findability | Keywords | Continuous | How many keywords match names in the environmental thesaurus (UMTHES)? |
Findability | Geospatial reference | No regional information, general regional name, coordinate-specific region, exact region name, point coordinates | How precise is the localisation? |
Findability | Time reference | Yes / No | Is a date or time range given? |
Accessibility | Reference | Not implemented yet | |
Accessibility | Direct access | Yes / No | Is there a direct link to the original content? |
Accessibility | Openly available | Yes / No | Can the data be accessed without registration? |
Interoperability | Machine-readable data | Yes / No | Is an automated read-out of at least one resource (data set) possible? |
Interoperability | Machine-readable metadata | Yes / No | Is an automated read-out of the metadata possible? |
Interoperability | Media type | Yes / No | Is the data format of at least one resource (data set) known? |
Interoperability | Open data format | Yes / No | Is the data format open (for example CSV)? |
Reusability | Licence | No information, ambiguous licence, specific licence, specific and open licence | Is the licence specific and open? |
Reusability | Contact | Yes / No | Is contact information given? |
Reusability | Publisher | Yes / No | Is the publisher known? |
First example
An example result illustrates the metadata quality assessment for a groundwater measurement station. In this case, findability has a score of 59. The identification criterion scores 0 points because the entry has no unique identifier. Title and description score 51 and 52 points respectively, which correspond to medium and high readability according to the readability index.
The keywords criterion scores 100 points, as four of the five available keywords can be found in the environmental thesaurus (UMTHES). Both spatial reference and temporal reference score 100 points, as the exact geodata of the station and the time period of the measurements are known.
Accessibility scores 100 points, as direct access to the content is possible via a link and the content is publicly accessible.
Interoperability scores 75 points, as three of the four individual criteria are fully met. Machine-readable data is provided for the measuring point in CSV format (100 points), which is an open file format (100 points), so the media type is also known (100 points). However, no machine-readable metadata is provided (0 points).
Reusability scores 78 points, as two of the three individual criteria are fully met: contact information is available (100 points) and the publisher is known (100 points). The licence, however, scores only 33 points, which means that a licence is specified but is not a known, specific one.
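
The reusability figure can be reproduced with a few lines of Python. This is only a sketch: the article does not state how sub-scores are combined into a dimension score, so the unweighted mean used here is an assumption that happens to match the reusability value of 78. Other dimensions may be weighted differently. The function name `dimension_score` and the dictionary keys are hypothetical.

```python
from statistics import mean

# Sub-scores for the reusability dimension, as quoted in the first example.
reusability = {"licence": 33, "contact": 100, "publisher": 100}


def dimension_score(sub_scores):
    """Assumed aggregation: unweighted mean, rounded to whole points.

    The actual rule used by umwelt.info is not given in the article.
    """
    return round(mean(list(sub_scores.values())))


print(dimension_score(reusability))  # (33 + 100 + 100) / 3 = 77.67, rounds to 78
```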
Second example
Another example, on pollutant identification in animals from the Palatinate Forest, illustrates how categorical criteria work. In contrast to the first example, the spatial reference criterion scores only 50 points, as only an area (Palatinate Forest) and no point geodata are available. The licence scores 33 points, as an unknown licence is specified for the entry. A known but non-free licence would score 66 points, while the absence of a licence would result in 0 points.
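
A categorical criterion such as the licence can be thought of as a lookup from category to points. The sketch below uses the three values given in the second example (0, 33 and 66). The top value of 100 for a known and open licence is an assumption inferred from the scale, and the category labels and the function name `licence_score` are hypothetical.

```python
# Points per licence category. 0, 33 and 66 are taken from the second example;
# 100 for a known and open licence is an assumption, not stated in the article.
LICENCE_POINTS = {
    "no_licence": 0,
    "unknown_licence": 33,
    "known_non_free_licence": 66,
    "known_open_licence": 100,  # assumed top of the scale
}


def licence_score(category):
    """Return the points for a licence category (labels are hypothetical)."""
    if category not in LICENCE_POINTS:
        raise ValueError(f"unrecognised licence category: {category!r}")
    return LICENCE_POINTS[category]


print(licence_score("unknown_licence"))  # 33, as in the Palatinate Forest entry
```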