Argumentation-based schema matching for multiple digital libraries

DOI: https://doi.org/10.1108/OIR-02-2014-0023
Date: 09 February 2015
Publication Date: 09 February 2015
Pages: 81-103
Author: Tho Thanh Quan, Xuan H. Luong, Thanh C. Nguyen, Hui Siu Cheung
Subject: Library & information science, Information behaviour & retrieval
Tho Thanh Quan
Department of Software Engineering,
Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Xuan H. Luong and Thanh C. Nguyen
Department of Computer Science, Ho Chi Minh City University of Technology,
Ho Chi Minh City, Vietnam, and
Hui Siu Cheung
Department of Computer Engineering, Nanyang Technological University,
Singapore
Abstract
Purpose: Most digital libraries (DL) are now available online. They also provide the Z39.50 standard
protocol, which allows computer-based systems to effectively retrieve information stored in the DLs.
The major difficulty lies in the inconsistency between the database schemas of multiple DLs. The purpose of
this paper is to present a system known as Argumentation-based Digital Library Search (ADLSearch),
which facilitates information retrieval across multiple DLs.
Design/methodology/approach: The proposed approach uses argumentation theory to reconcile the
results of multiple schema matching algorithms. In addition, a distributed
architecture is proposed for the ADLSearch system for information retrieval from multiple DLs.
Findings: Initial performance results are promising. First, schema matching can improve the
retrieval performance on DLs, as compared to the baseline technique. Second,
argumentation-based retrieval can yield better matching accuracy and retrieval efficiency than individual schema
matching algorithms.
Research limitations/implications: The work discussed in this paper has been implemented as a
prototype supporting scholarly retrieval from about 800 DLs around the world. However, due to the
complexity of the argumentation algorithm, the process of adding new DLs to the system cannot be
performed in real time.
Originality/value: In this paper, an argumentation-based approach is proposed for reconciling the
conflicts between multiple schema matching algorithms in the context of information retrieval from
multiple DLs. Moreover, the proposed approach can also be applied to similar applications which
require automatic mapping between multiple database schemas.
Keywords: Digital library, Information retrieval, Argumentation, Schema matching
Paper type: Research paper
Introduction
Unlike traditional means of storage, digital libraries (DL) (Saracevic and Dalbello, 2001)
are a new kind of library that has emerged since the end of the twentieth century. In DL,
information and documents are stored in digital form and can be accessed and
retrieved over the web. Through standard protocols such as Z39.50, search engines
can also search information from different DL, and crawlers can connect directly to the
database servers and access the data of the DL. Nowadays DL have become one of the
major sources for researchers when finding scholarly information over the web.

Online Information Review, Vol. 39 No. 1, 2015, pp. 81-103
© Emerald Group Publishing Limited, 1468-4527
DOI 10.1108/OIR-02-2014-0023
Received 4 March 2014
Second revision approved 10 November 2014
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/1468-4527.htm
This work was supported by research project B0212-20-02TD funded by Vietnam National
University Ho Chi Minh City.

Traditionally DL organise information in database schema. To support information
retrieval from multiple DL it is commonly assumed that the databases of the different
DL would have the same schemas. However, in practice each digital library will have its
own schema. As shown in Figure 1, the same publication record may be represented
differently in schemas when stored in different DL.
In Figure 2 we present a closer view of the problem of inconsistent concept
representation in different schemas. When representing the concept of academic
paper, one schema may adopt the term Document while other schemas may use the
term Publication. Some others may even split the concept into two sub-concepts such as
Article and Publisher. It may be easy for humans to understand the similarity between
these terms. However, the inconsistency of terms or keywords used to represent the
same concept poses a serious problem for information retrieval from different sources
of DL. This leads to a well-known research problem called schema matching.
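The term mismatch described above can be illustrated with a minimal sketch of name-based (element-level) matching. This is not the matching method used in the paper; the field names in `SCHEMA_A` and `SCHEMA_B` and the helper functions are hypothetical, and plain string similarity from Python's standard `difflib` module stands in for a real matcher. The point is only that near-identical spellings score high, while synonyms such as Document and Publication score low even though they denote the same concept.

```python
from difflib import SequenceMatcher

# Hypothetical field names for two DL schemas (illustrative only,
# not taken from the paper's figures).
SCHEMA_A = ["Document", "Author", "Year"]
SCHEMA_B = ["Publication", "Authors", "PubYear"]


def name_similarity(a: str, b: str) -> float:
    """Return a case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(source, target):
    """Map each source field to its most similar target field and score."""
    result = {}
    for s in source:
        scored = [(name_similarity(s, t), t) for t in target]
        score, match = max(scored)
        result[s] = (match, round(score, 2))
    return result


if __name__ == "__main__":
    for field, (match, score) in best_matches(SCHEMA_A, SCHEMA_B).items():
        print(f"{field} -> {match} ({score})")
```

In this sketch Author and Authors are paired with a high score, whereas the score for Document against Publication stays low, which is the kind of case where a purely lexical heuristic breaks down.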
Different algorithms have been proposed for automatic matching between
schemas. However, as most algorithms rely mainly on heuristics to deal with the
inconsistency of keywords, applying them to different data sets would lead to
different, or even conflicting, results (Nguyen et al., 2012). In general each algorithm
works well in certain domains, but its performance suffers when applied to other
domains. Thus for the digital library domain, the difficulty lies in the fact that scholarly
materials stored in DL are from different domains, ranging from social sciences
to natural sciences. Hence selecting a suitable one-size-fits-all matching algorithm
is a very challenging task.
In this paper we propose to apply argumentation theory to tackle this problem.
The idea here is that, instead of fixing a certain schema matching algorithm, we can
try multiple matching strategies at the same time. Then if any conflict is found
among the matching results, argumentation theory is applied to infer the most logical
and appropriate answer.
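The general idea of reconciling conflicting matcher outputs through argumentation can be sketched as follows. This is not the authors' framework, whose details are not given in this section; it is a generic illustration that assumes Dung-style abstract argumentation with grounded semantics, and a simple preference rule in which a candidate correspondence attacks a conflicting one when it has at least as many supporting matchers. The `votes` data, the `build_attacks` rule, and all function names are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Compute the grounded extension of an abstract argumentation framework.

    arguments: set of hashable argument ids
    attacks:   set of (attacker, target) pairs
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments - accepted - rejected:
            # Accept an argument once every one of its attackers is rejected.
            if attackers[a] <= rejected:
                accepted.add(a)
                changed = True
        for a in arguments - accepted - rejected:
            # Reject an argument that is attacked by an accepted argument.
            if attackers[a] & accepted:
                rejected.add(a)
                changed = True
    return accepted


def build_attacks(votes):
    """Assumed rule: two candidates conflict when they map the same source
    field to different targets; a candidate attacks a conflicting one if it
    has at least as many votes."""
    attacks = set()
    for a in votes:
        for b in votes:
            if a != b and a[0] == b[0] and votes[a] >= votes[b]:
                attacks.add((a, b))
    return attacks


def reconcile(votes):
    """Return the candidate correspondences that survive argumentation."""
    return grounded_extension(set(votes), build_attacks(votes))


# Hypothetical candidates (source field, target field) with the number of
# matching algorithms that proposed each one.
votes = {
    ("Document", "Publication"): 2,
    ("Document", "Publisher"): 1,
    ("Author", "Authors"): 3,
}

if __name__ == "__main__":
    print(sorted(reconcile(votes)))
```

Under these assumptions, the better-supported correspondence for Document defeats its rival, while the uncontested Author correspondence is accepted directly. With equal votes the two candidates attack each other and neither is accepted, so a tie is left unresolved rather than decided arbitrarily.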
This paper makes two main contributions. First, we propose an
argumentation-based approach to perform schema matching from multiple DL. The
argumentation framework has been published in our previous work (Nguyen et al.,
2013); however, this is the first time it has been applied to the digital library domain.
Moreover, we also improve our argumentation framework to make it fully automatic,
instead of relying on the involvement of human experts. Second, the proposed
approach is then incorporated into a search system for DL, called Argumentation-based
Digital Library Search (ADLSearch). To the best of our knowledge, up to now the
matching between multiple DL has mainly involved manual methods. In contrast
the ADLSearch system is capable of handling more than 800 DL in an automatic
manner due to the integration of our extended argumentation framework.
Related work
Classical schema matching algorithms
Schema matching has been recognised as one of the most important operations
required by the process of data integration, which has been studied by the database
and AI communities for over 25 years (Doan and Halevy, 2005). There are many
cutting-edge schema matching techniques and tools (Bernstein et al., 2011), such as
element-level matching, structure-level matching, instance-based matching and