Evaluating the degree of domain specificity of terms in large terminologies. The case of AGROVOC

Date: 08 June 2015
Pages: 326-345
DOI: https://doi.org/10.1108/OIR-02-2015-0052
Published date: 08 June 2015
Author: David Martín-Moncunill, Miguel-Ángel Sicilia-Urban, Elena García-Barriocanal, Salvador Sánchez-Alonso
Subject Matter: Library & information science, Information behaviour & retrieval
David Martín-Moncunill, Miguel-Ángel Sicilia-Urban,
Elena García-Barriocanal and Salvador Sánchez-Alonso
Department of Computer Science, University of Alcalá, Madrid, Spain
Abstract
Purpose: Large terminologies usually contain a mix of generic and domain-specific
terms, which makes the terminology itself difficult to use and may limit the positive effects
of these systems. The purpose of this paper is to systematically evaluate the degree of domain
specificity of the AGROVOC controlled vocabulary terms, as a representative of a large terminology in
the agricultural domain, and to discuss the generic/specific boundaries across its hierarchy.
Design/methodology/approach: A user-oriented study with domain experts was combined with
quantitative and systematic analysis. First, an in-depth analysis of AGROVOC was carried out to make
a proper selection of terms for the experiment. Then domain experts were asked to classify the terms
according to their domain specificity. An evaluation was conducted to analyse the domain experts'
results. Finally, the resulting data set was automatically compared with the terms in SUMO, an upper
ontology, and MILO, a mid-level ontology, to analyse the coincidences.
Findings: Results show the existence of a high number of generic terms. The reasons behind several
of the unclear cases are also discussed. The automatic evaluation showed that there is no direct way to
assess the specificity degree of a term by using the SUMO and MILO ontologies; however, it provided
additional validation of the results gathered from the domain experts.
Research limitations/implications: The "domain-analysis" concept has long been discussed and
can be addressed from different perspectives. A summary of these perspectives and an explanation of
the approach followed in this experiment are included in the background section.
Originality/value: The authors propose an approach to identify the domain specificity of terms in
large domain-specific terminologies and a criterion to measure the overall domain specificity of a
knowledge organisation system, based on domain experts' analysis. The authors also provide a first
insight into using automated measures to determine the degree to which a given term can be
considered domain specific. The resulting data set from the domain experts' evaluation can be reused
as a gold standard for further research on these automatic measures.
Keywords: Classification, Information retrieval, AGROVOC, Domain specificity,
Knowledge organization systems, Terminologies
Paper type: Research paper
Online Information Review, Vol. 39 No. 3, 2015, pp. 326-345
© Emerald Group Publishing Limited, 1468-4527
DOI 10.1108/OIR-02-2015-0052
Received 17 February 2015
First revision approved 16 March 2015
Introduction
Large terminologies (Cabré, 2005) established for a particular purpose or domain
are usually organised using a subject-based (Hjørland, 2001) classification. This
classification generates different knowledge organisation systems (KOS), a term
which is intended to encompass all types of schemes for organising information
and promoting knowledge management (Hodge, 2000), including collections such as
thesauri (Aitchison et al., 2000) and ontologies (Sicilia, 2014).
The use of these systems could be valuable for different tasks related to information
retrieval and search purposes, such as annotating and indexing resources using
controlled vocabularies (Park et al., 2013), augmenting user queries using ontology-based
query expansion (Segura et al., 2011), or supporting intelligent searches by using formal
semantics (Madhu et al., 2011; Sicilia, 2014) or inferences to ascertain the subject or
"aboutness" of a document, in the sense described by Hjørland (2001) (Paranjpe, 2009;
Gamon et al., 2013). There are also approaches that consider the use of visual interfaces to
navigate through these systems (García et al., 2014) and use them to formulate queries
(Garcia-Barriocanal and Sicilia, 2003) by selecting terms following the techniques
described by M.J. Bates (1989).
Large digital collections generally use large terminologies and complex KOS, which
complicate the aforementioned tasks. These terminologies usually include a high
number of broad and generic terms which could be applied to other domains and
could prove irrelevant or distracting when searching for or retrieving information.
For example, the term "technology" is used in the AAT (Art & Architecture Thesaurus),
but the same term can be found in other thesauri relating to completely different
domains, such as the MeSH (Medical Subject Headings) controlled vocabulary and the
agriculture-related controlled vocabulary AGROVOC.
In this paper we report on an experiment designed to evaluate the degree
of domain specificity of large collections, with the aim of extracting relevant information.
In the future this approach may be able to facilitate information retrieval and search
tasks in domain-specific digital collections using large terminologies. More specifically,
this information could be useful in refining query expansion techniques by including
the generic/specific aspects of a term as another factor to be considered. It could also
be helpful as a tool to reduce the subject tree browsing complexity (Julien et al., 2013)
of a KOS.
Domain-specific KOS contain lists of domain-specific and non-domain-specific terms
that could be useful on their own (Kim and Cavedon, 2011). It should be highlighted
that this paper deals with the terminologies upon which domain-specific KOS are
built, not with the particular relationships, classification and categorisation schemes
of the different KOS.
The experiments were conducted using AGROVOC, a controlled vocabulary that
covers all areas of interest to the Food and Agriculture Organisation of the United
Nations. The AGROVOC vocabulary is used to support information retrieval and
search tasks in the large AGRIS (International System for Agricultural Science
and Technology) collection.
The evaluation proposed in this paper is based on a triangulated approach,
including quantitative and systematic analysis in conjunction with human experts.
The results are contrasted in various ways so that the methods complement the
analysis of the study. The starting evaluation method of the experiment followed
a human experts' assessment approach in which experts in the field were asked to
classify AGROVOC terms by stating whether each term was specific to the
AGROVOC-related domains. Prior to this, an in-depth analysis of AGROVOC was needed in order
to select a suitable list of terms for the experiment, as explained in the "Selection
of the materials" section.
Once an adequate list of terms was selected, experts were asked to classify those
terms and an evaluation was conducted to compare the results and to analyse any
discrepancies. After this, we proposed a criterion to measure the degree of specificity
of terms in AGROVOC, taking into account the particularities of this specific KOS.
This criterion was implemented to obtain a final numerical value and define the
specificity degree of AGROVOC.
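The criterion itself is not spelled out in this part of the paper, so the following is only a minimal sketch of how expert classifications could be turned into a numerical value. It assumes a simple majority-vote rule and a "share of specific terms" score. The function names, the label strings and the toy data are all hypothetical; they are not the authors' criterion or results.

```python
from typing import Dict, List

# Assumed label vocabulary; the paper's actual answer options may differ.
SPECIFIC = "specific"
GENERIC = "generic"


def term_specificity(labels: List[str]) -> float:
    """Share of experts who labelled a term as domain specific (0.0 to 1.0)."""
    if not labels:
        raise ValueError("each term needs at least one expert label")
    unknown = set(labels) - {SPECIFIC, GENERIC}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return labels.count(SPECIFIC) / len(labels)


def classify_term(labels: List[str]) -> str:
    """Majority vote; an exact tie is reported as an unclear case."""
    score = term_specificity(labels)
    if score > 0.5:
        return SPECIFIC
    if score < 0.5:
        return GENERIC
    return "unclear"


def overall_specificity(labels_by_term: Dict[str, List[str]]) -> float:
    """Fraction of terms whose majority label is 'specific'."""
    if not labels_by_term:
        raise ValueError("no terms to evaluate")
    n_specific = sum(
        1 for labels in labels_by_term.values()
        if classify_term(labels) == SPECIFIC
    )
    return n_specific / len(labels_by_term)


# Invented toy labels for illustration only (not data from the study).
toy_labels = {
    "technology": [GENERIC, GENERIC, GENERIC],
    "irrigation": [SPECIFIC, SPECIFIC, SPECIFIC],
    "soil": [SPECIFIC, SPECIFIC, GENERIC],
    "methods": [SPECIFIC, GENERIC, SPECIFIC, GENERIC],
}
```

Calling `overall_specificity(toy_labels)` gives one number for the whole term list, while `classify_term` exposes the tied cases that would need closer inspection. A real criterion would probably also need to account for the hierarchy of the KOS, which this sketch ignores.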