Quality measures for skos:exactMatch linksets: an application to the thesaurus framework LusTRE


DOI	https://doi.org/10.1108/DTA-05-2017-0037
Date	02 July 2018
Published date	02 July 2018
Pages	405-423
Author	Riccardo Albertoni, Monica De Martino, Paola Podestà
Subject Matter	Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet


Riccardo Albertoni, Monica De Martino and Paola Podestà

Istituto di Matematica Applicata e Tecnologie Informatiche, Sezione di Genova, Consiglio Nazionale delle Ricerche, Genova, Italy

Abstract

Purpose – The purpose of this paper is to focus on the quality of the connections (linksets) among thesauri published as Linked Data on the Web. It extends the cross-walking measures with two new measures able to evaluate the enrichment brought by the information reached through the linkset (lexical enrichment, browsing space enrichment). It fosters the adoption of cross-walking linkset quality measures alongside the well-known and widely deployed cardinality-based measures (linkset cardinality and linkset coverage).

Design/methodology/approach – The paper applies the linkset measures to the Linked Thesaurus fRamework for Environment (LusTRE). LusTRE is selected as a testbed because it is encoded using the Simple Knowledge Organisation System (SKOS) and published as Linked Data, and because it explicitly exploits the cross-walking measures on its validated linksets.

Findings – The application to LusTRE offers insight into the complementarity of the considered linkset measures. In particular, it shows that the cross-walking measures deepen the cardinality-based measures by analysing quality facets that were not previously considered. The actual value of LusTRE's linksets in improving multilingualism and concept spaces is assessed.

Research limitations/implications – The paper considers skos:exactMatch linksets, which belong to a rather specific but quite common kind of linkset. The cross-walking measures explicitly assume the correctness and completeness of linksets. Third-party approaches and tools can help to meet these assumptions.

Originality/value – This paper fulfils an identified need to study the quality of linksets. Several approaches formalise and evaluate Linked Data quality, focusing on data set quality but disregarding the other essential component: the connections among data.

Keywords Quality, SKOS, Linked data, Cross-walking, Environmental thesauri, Linkset

Paper type Research paper

Data Technologies and Applications, Vol. 52 No. 3, 2018, pp. 405-423, ISSN 2514-9288, DOI 10.1108/DTA-05-2017-0037
Received 4 May 2017; revised 28 February 2018; accepted 27 May 2018
This research activity has been carried out within the EU project eENVplus (CIP-ICT-PSP grant No. 325232).

1. Introduction

In the paper “Linked Data – The Story So Far”, Bizer et al. (2009) were among the first to portray the enormous transformation of the Web of Documents into the Web of Data. Since then, the popularity of Linked Data has never ceased to grow. Linked Data aims to unlock the potential of independently served data by addressing access and integration issues. It not only publishes documents encoded using the Resource Description Framework (RDF), but also “uses RDF to make typed statements that link arbitrary things in the world. The result, which we will refer to as the Web of Data, may more accurately be described as a web of things in the world, described by data on the Web” (Bizer et al., 2009). Linked Data allows RDF data to be published, shared, retrieved, reused and analysed, opening existing data silos to a broader community of consumers. RDF provides a graph-based data model built on triples of the form subject, predicate and object (Schreiber and Raimond, 2014). Both data and the links among data are expressed as triples. Linked Data relies on two fundamental web technologies: Internationalised Resource Identifiers (IRIs[1]) and the HyperText Transfer Protocol (HTTP), deployed, respectively, as global identifiers and as the protocol to dereference the information associated with each identifier. Following the Linked Data principles[2], several billion facts encoded as RDF triples have been published in the Linked Open Data (LOD) cloud[3].
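To make the point that data and links share the same triple shape concrete, the following is a minimal sketch that models RDF statements as plain Python tuples rather than using an RDF library. The example.org IRIs are hypothetical, and the language-tagged literal is simplified to a (lexical form, language tag) pair; only the RDF and SKOS namespace IRIs are the real ones.

```python
# Minimal sketch: RDF statements modelled as plain (subject, predicate, object) tuples.
# The example.org IRIs are hypothetical; the RDF and SKOS namespace IRIs are the real ones.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

SOIL_A = "http://example.org/thesaurusA/soil"  # hypothetical concept in data set A
SOIL_B = "http://example.org/thesaurusB/c42"   # hypothetical concept in data set B

# Data triples: describe a concept inside data set A.
# A language-tagged literal is simplified to a (lexical form, language tag) tuple.
data_triples = [
    (SOIL_A, RDF_TYPE, SKOS + "Concept"),
    (SOIL_A, SKOS + "prefLabel", ("soil", "en")),
]

# Link triple: same shape, but the object is an IRI minted by another data set.
link_triples = [
    (SOIL_A, SKOS + "exactMatch", SOIL_B),
]

def objects(triples, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

graph = data_triples + link_triples
print(objects(graph, SOIL_A, SKOS + "exactMatch"))
# prints ['http://example.org/thesaurusB/c42']
```

In a real deployment the object IRI of the link would be dereferenced over HTTP to retrieve the description held by the other data set; the sketch only shows that nothing in the triple structure distinguishes a link from ordinary data.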

This vast quantity of newly available and connected data sets is transforming the Web into a global data space, enabling new types of analysis and applications in diverse domains including life sciences, government, environment and cultural heritage. At the same time, evaluating the quality of these newly served data becomes critical. “Data quality can affect the potentiality of the applications that use data. As a consequence, its inclusion in the data publishing and consumption pipelines is of primary importance” (Calegari et al., 2017). The challenge is two-fold: to evaluate the quality of data on the Web and to make quality-related information explicit, understandable and consumable by both humans and machines.

Several existing initiatives aim to define new metrics and to evaluate the quality of Linked Data. The W3C Data Quality Vocabulary (DQV) (Albertoni and Isaac, 2016) introduces a common way to document the quality of a data set, making it easier to publish, exchange and consume quality metadata. Recent works such as Zaveri et al. (2016), Debattista et al. (2016b), Radulovic et al. (2018) and Kontokostas et al. (2014) consider different aspects of Linked Data quality, called dimensions, e.g. accessibility, interlinking, performance, syntactic validity or completeness. They define and deploy several concrete metrics (or measures) to evaluate each dimension precisely and objectively. However, they focus on Linked Data data sets and pay very limited attention to their connections, the linksets. A linkset is a set of homogeneous links, all of the same type and connecting the same subject data set to the same object data set (Alexander et al., 2011). In the recent state of the art, the quality of linksets is studied as part of the interlinking dimension (Zaveri et al., 2016). Few metrics are defined to evaluate interlinking, and they mainly focus on correctness (e.g. broken links, open owl:sameAs chains, crowdsourcing methods), on the number of links (linkset cardinality), or on the extent to which a linkset covers the elements of a data set (linkset coverage) (Guéret et al., 2012; Zaveri et al., 2016; Albertoni and Gómez Pérez, 2013).
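The two cardinality-based measures, and the idea of documenting a measurement with DQV, can be sketched as follows. This is an illustration under stated assumptions, not the formal definitions used in the cited works: coverage is read here as the share of subject data set concepts that are the subject of at least one link, all `ex:` identifiers are hypothetical, and the helper functions are names introduced only for this sketch. The DQV terms used (`dqv:QualityMeasurement`, `dqv:computedOn`, `dqv:isMeasurementOf`, `dqv:value`) are taken from the W3C vocabulary.

```python
from typing import Iterable, List, Set, Tuple

Link = Tuple[str, str]  # (subject-data-set concept, object-data-set concept)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DQV = "http://www.w3.org/ns/dqv#"

def linkset_cardinality(links: Iterable[Link]) -> int:
    """Linkset cardinality: the number of distinct links."""
    return len(set(links))

def linkset_coverage(links: Iterable[Link], subject_concepts: Set[str]) -> float:
    """Assumed reading: share of subject concepts that appear as the subject of a link."""
    if not subject_concepts:
        return 0.0
    linked = {s for s, _ in links if s in subject_concepts}
    return len(linked) / len(subject_concepts)

def dqv_measurement(measurement: str, linkset: str, metric: str, value: float) -> List[tuple]:
    """Describe one measurement with W3C DQV terms, as (s, p, o) tuples."""
    return [
        (measurement, RDF_TYPE, DQV + "QualityMeasurement"),
        (measurement, DQV + "computedOn", linkset),
        (measurement, DQV + "isMeasurementOf", metric),
        (measurement, DQV + "value", value),
    ]

# Hypothetical toy data: four concepts in data set A, three links towards data set B.
concepts_a = {"ex:a1", "ex:a2", "ex:a3", "ex:a4"}
links_ab = [("ex:a1", "ex:b9"), ("ex:a2", "ex:b7"), ("ex:a2", "ex:b8")]

card = linkset_cardinality(links_ab)          # three distinct links
cov = linkset_coverage(links_ab, concepts_a)  # 2 of 4 concepts are linked
record = dqv_measurement("ex:m1", "ex:linksetAB", "ex:linksetCoverage", cov)
print(card, cov)  # prints 3 0.5
```

The toy data shows why the two measures say little on their own: concept `ex:a2` contributes two links to the cardinality but only once to the coverage, and neither number reveals what a user actually gains by following the links.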

The experience gained creating the Linked Thesaurus fRamework for Environment (LusTRE)[4], the multilingual linked thesaurus framework for the environment, has taught us to pay attention to the quality of the connections between data sets. LusTRE was designed during the EU project eENVplus[5], extending and redesigning the Common Thesaurus Framework for the Environment (De Martino and Albertoni, 2011). LusTRE addresses cross-lingual and cross-sectoral issues in environmental data sharing: it provides a wide multilingual terminology, obtained by linking available thesauri for the different environmental disciplines, and a set of web services to exploit them (Albertoni et al., 2018). The eENVplus project spent considerable effort reviewing the available environmental thesauri and identifying those not yet available as Linked Data (Albertoni, De Martino and Podestà, 2014). It then published ThiST[6] and (Albertoni, De Martino, Di Franco, De Santis and Plini, 2014) as Linked Data using the Simple Knowledge Organisation System (SKOS) (Miles and Bechhofer, 2009), and connected them to GEMET[7], AGROVOC (Caracciolo et al., 2013) and EUROVOC[8].

In LusTRE, the linksets among the thesauri are particularly important because they are exploited to satisfy user requests. LusTRE enriches user navigation and service results with translations and concepts that are reachable through the linksets. Thus, linkset quality becomes a critical issue. Given a linkset between two SKOS thesauri, LusTRE should evaluate the multilingual enrichment obtained in terms of newly translated labels reachable through the linkset. This information helps to address the incomplete language coverage that affects many popular SKOS thesauri (Suominen and Mader, 2014). It also needs to evaluate the number of new concepts reached by crossing a linkset, as this helps to assess the enrichment of the space of concepts that can be browsed
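The formal definitions of lexical enrichment and browsing space enrichment are not part of this excerpt, so the following is only a simplified sketch of the two ideas under explicit assumptions. It assumes that lexical enrichment can be read as the languages a subject concept gains because its skos:exactMatch target has a preferred label in a language the subject concept lacks, and that browsing space enrichment can be read as the object-thesaurus concepts one hop away (e.g. via skos:broader, skos:narrower or skos:related) from a link target, excluding concepts that are themselves link targets. All concept identifiers and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Labels = Dict[str, Dict[str, str]]  # concept -> {language tag: preferred label}
Neighbours = Dict[str, Set[str]]    # concept -> concepts one hop away in its thesaurus
Link = Tuple[str, str]              # (subject concept, object concept) via skos:exactMatch

def lexical_enrichment(links: List[Link], subj_labels: Labels, obj_labels: Labels) -> Dict[str, Set[str]]:
    """Assumed reading: languages each subject concept gains through its exact matches."""
    gained: Dict[str, Set[str]] = {}
    for s, o in links:
        missing = set(obj_labels.get(o, {})) - set(subj_labels.get(s, {}))
        if missing:
            gained.setdefault(s, set()).update(missing)
    return gained

def browsing_space_enrichment(links: List[Link], obj_neighbours: Neighbours) -> Set[str]:
    """Assumed reading: object concepts one hop from a link target, minus the targets."""
    targets = {o for _, o in links}
    reached: Set[str] = set()
    for t in targets:
        reached |= obj_neighbours.get(t, set())
    return reached - targets

# Hypothetical toy thesauri A (subject) and B (object).
subj_labels = {"A:soil": {"en": "soil"}, "A:water": {"en": "water", "it": "acqua"}}
obj_labels = {"B:soil": {"en": "soil", "it": "suolo", "de": "Boden"},
              "B:water": {"en": "water", "it": "acqua"}}
obj_neighbours = {"B:soil": {"B:soil-erosion", "B:water"}, "B:water": {"B:groundwater"}}
links = [("A:soil", "B:soil"), ("A:water", "B:water")]

enrichment = lexical_enrichment(links, subj_labels, obj_labels)
new_concepts = browsing_space_enrichment(links, obj_neighbours)
print(enrichment)    # A:soil gains Italian and German labels; A:water gains nothing
print(new_concepts)  # B:soil-erosion and B:groundwater
```

Under these assumptions, `B:water` is excluded from the new concepts because the linkset already maps it to `A:water`, which illustrates why such measures presuppose a correct and complete linkset.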
