Evaluating the concept specialization distance from an end-user perspective. The case of AGROVOC

Date: 09 October 2017
Pages: 860-876
Published date: 09 October 2017
DOI: https://doi.org/10.1108/OIR-03-2016-0094
Author: David Martín-Moncunill, Miguel Angel Sicilia-Urban, Elena García-Barriocanal, Christian M. Stracke
Subject Matter: Library & information science, Information behaviour & retrieval, Collection building & management, Bibliometrics, Databases, Information & knowledge management, Information & communications technology, Internet, Records management & preservation, Document management
David Martín-Moncunill, Miguel Angel Sicilia-Urban and
Elena García-Barriocanal
Department of Computer Science, University of Alcalá, Madrid, Spain, and
Christian M. Stracke
Welten Institute Research Centre for Learning, Teaching and Technology,
Open University of the Netherlands, Heerlen, The Netherlands
Abstract
Purpose: The common understanding of generalization/specialization relations assumes the relation to be
equally strong between a classifier and any of its related classifiers and also at every level of the hierarchy.
Assigning a grade of relative distance to represent the level of similarity between the related pairs of
classifiers could correct this situation, which has been considered an oversimplification of the
psychological account of the real-world relations. The paper aims to discuss these issues.
Design/methodology/approach: The evaluation followed an end-user perspective. In order to obtain a
consistent data set of specialization distances, a group of 21 persons was asked to assign values to a set of
relations from a selection of terms from the AGROVOC thesaurus. Then two sets of representations of the
relations between the terms were built, one according to the calculated concept specialization weights and
the other one following the original order of the thesaurus. In total, 40 persons were asked to choose between
the two sets following an A/B test-like experiment. Finally, short interviews were carried out after the test to
inquire about their decisions.
Findings: The results show that the use of this information could be a valuable tool for search and
information retrieval purposes and for the visual representation of knowledge organization systems (KOS).
Furthermore, the methodology followed in the study turned out to be useful for detecting inconsistencies in
the thesaurus and could thus be used for quality control and optimization of the hierarchical relations.
Originality/value: The use of this relative distance information, namely, concept specialization distance,
has been proposed mainly at a theoretical level. In the current experiment, the authors evaluate the potential
use of this information from an end-user perspective, not only for text-based interfaces but also for
the visual representation of KOS. Finally, the methodology followed for the elaboration of the concept
specialization distance data set showed potential for detecting possible inconsistencies in KOS.
Keywords: Information seeking, User interfaces, Knowledge organization systems, Search tactics,
Concept specialization distance, Gen-spec
Paper type: Research paper
1. Introduction
Building knowledge organization systems (KOS) such as thesauri or ontologies is a complex
task that usually starts by generating a corpus of terms (Kim and Cavedon, 2011). Then, terms
can be structured by establishing relations among them. In this sense, a KOS is a form of
knowledge representation that aims at organizing the terminology for a particular purpose.
Generalization/specialization (gen-spec) is a relation between classifiers (here: terms)
that implies a taxonomic relation and the inheritance semantics that follow from it. According to
Aitchison et al. (2000), thesauri employ a broader/narrower hierarchy (ISO 25964, 2013)
providing additional information about which terms are broader, which terms are related
and which terms can be used as synonyms. Similarly, ontologies use an "is-a" relation for
representing hierarchies that are comparable to the above-mentioned relation in thesauri.
Online Information Review
Vol. 41 No. 6, 2017
pp. 860-876
© Emerald Publishing Limited
1468-4527
DOI 10.1108/OIR-03-2016-0094
Received 27 March 2016
Revised 4 May 2017
Accepted 25 July 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/1468-4527.htm
The common understanding of this kind of relation considers it as all-or-nothing,
assuming the relation to be equally strong between a classifier and any of its gen-spec-related
classifiers and also at every level of the hierarchy, which has been considered an
oversimplification of the psychological account of the real-world relations (Cohen and
Murphy, 1984; Sicilia et al., 2003; Cross, 2004; Hu et al., 2007).
Assigning a grade of relative distance to represent the level of similarity between the
related pairs of classifiers could be valuable for search and information retrieval purposes
(Sicilia et al., 2003). In addition, it could be applied for the visual representation of KOS
(Gaona-García et al., 2017). More precisely, regarding information retrieval it can be used to
establish weights for better decision making on the suggestion of related search terms or
related results. In the visual representation area, it can be used to decide on the
representation of the different terms or classifiers and their positions on the screen.
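
As a minimal sketch of how such weights might drive the suggestion of related search terms, the following Python fragment filters and ranks the narrower terms of a query term by their distance. The terms, the distance values, the 0 to 1 scale (smaller meaning closer) and the function name `suggest_related` are all assumptions made for illustration; none of them come from AGROVOC or from the data collected in this study.

```python
from typing import Dict, List

# Hypothetical specialization distances: broader term -> {narrower term: distance}.
# Assumption: smaller values mean the narrower term is "closer" to the broader one.
# Terms and values are invented for illustration only.
DISTANCES: Dict[str, Dict[str, float]] = {
    "cereals": {"barley": 0.4, "maize": 0.2, "rice": 0.3, "wheat": 0.1},
}


def suggest_related(term: str, max_distance: float = 0.3) -> List[str]:
    """Return the narrower terms within max_distance, closest first."""
    narrower = DISTANCES.get(term, {})
    close = [(dist, name) for name, dist in narrower.items() if dist <= max_distance]
    # Sorting the (distance, name) pairs orders by distance, then alphabetically.
    return [name for _, name in sorted(close)]


print(suggest_related("cereals"))  # ['wheat', 'maize', 'rice']
```

With an all-or-nothing relation, all four narrower terms would be equally good suggestions; the threshold and the ordering are only possible once distances are available.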
The theory and potential behind the concept specialization distance were described by
Sicilia et al. (2003), but its practical applicability has not yet been tested from the end-user
point of view. In this paper, we aim to fill this gap by providing first insights on the evaluation
of the concept specialization distance as described by Sicilia et al. (2003), using a selection of
terms from the AGROVOC thesaurus (Leatherdale et al., 1982), which covers all areas
of interest of the Food and Agriculture Organization of the United Nations, including
human nutrition, animal husbandry, forestry, aquatic sciences, fisheries and many aspects
of agriculture.
Since there are no existing KOS containing this information about the relative distance to
represent the level of similarity between pairs of classifiers (namely concept specialization
distance), the first step to proceed with the evaluation was to assign distance values to an
existing KOS. In order to achieve this, we first analyzed AGROVOC to find a suitable sample
for the experiment. Then a group of 21 persons was asked to assign values to the different
relations, aiming to obtain a consistent data set of specialization distances from an end-user
perspective and thus assign definitive weights to the relations.
Once the data set of concept specialization distances was ready, the information was
integrated in the KOS in order to measure the impact from the end-user perspective. For this,
two sets of representations for the relations between terms were built, one according to the
calculated concept specialization weights and the other one following the original order of
the thesaurus (alphabetical). In total, 40 persons were asked to choose between the versions
in an A/B test-like experiment, and short interviews were carried out after the test to inquire
about their decisions.
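
The two steps above can be sketched in Python as follows: several raters' values for each relation are combined into a single weight, and the narrower terms of one broader term are then listed in two ways, alphabetically and by weight. The use of the median as the aggregation rule, the integer rating scale, the example terms and ratings, and the function names `aggregate` and `orderings` are assumptions for illustration only; this is not a description of the procedure actually applied in the study.

```python
from statistics import median
from typing import Dict, List, Tuple

Relation = Tuple[str, str]  # (broader term, narrower term)

# Hypothetical ratings: each relation maps to the values given by the raters.
# Assumption: lower values mean the narrower term is closer to the broader term.
RATINGS: Dict[Relation, List[int]] = {
    ("cereals", "barley"): [4, 5, 3],
    ("cereals", "maize"): [2, 2, 3],
    ("cereals", "wheat"): [1, 2, 1],
}


def aggregate(ratings: Dict[Relation, List[int]]) -> Dict[Relation, float]:
    """Combine the raters' values into one weight per relation (median, by assumption)."""
    return {relation: median(values) for relation, values in ratings.items()}


def orderings(broader: str, weights: Dict[Relation, float]) -> Tuple[List[str], List[str]]:
    """Return (alphabetical order, order by weight) for the narrower terms of `broader`."""
    narrower = [(n, w) for (b, n), w in weights.items() if b == broader]
    alphabetical = sorted(name for name, _ in narrower)
    # Ties in weight are broken alphabetically so the result is deterministic.
    by_weight = [name for name, _ in sorted(narrower, key=lambda p: (p[1], p[0]))]
    return alphabetical, by_weight


version_a, version_b = orderings("cereals", aggregate(RATINGS))
print(version_a)  # ['barley', 'maize', 'wheat']
print(version_b)  # ['wheat', 'maize', 'barley']
```

A large spread among the raters' values for a single relation could also be flagged for review, in line with the use of the methodology for detecting inconsistencies in the thesaurus.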
The paper is structured as follows. First, we provide a brief background about gen-spec
relations, the approach of assigning a grade of relative distance to represent the level of
similarity between the related pairs of classifiers, and the expected impact on search, information
retrieval and KOS visualization. This is followed by a description of the selection and
preparation of the materials and the methodology employed. Next, we provide and discuss the
results. Finally, we present our conclusions and look into the opportunities for future research.
2. Background
A gen-spec relation exists between two entities (also called classifiers, classes,
subjects, etc.) if one of the entities evokes a specificity of the other one. The gen-spec
concept is one of the essential concepts in knowledge representation (Fotzo and Gallinari,
2004) and is widely used not only for the construction of KOS (like the "is-a" relation in
ontologies) but also in fields such as logic, general-purpose object-oriented modeling
notations (Object Management Group, 2013) or programming languages (Norrie et al., 1994).
The gen-spec relation makes it possible to build a hierarchical organization of the concepts that
are present in a corpus of terms. That is useful not only to provide a hierarchical
organization of a collection of documents, but also to facilitate more complex tasks like the