A decision theoretic approach to combining information filtering

Published date	25 September 2009
DOI	https://doi.org/10.1108/14684520911001918
Pages	920-942
Author	Alexander Binun,Bracha Shapira,Yuval Elovici
Subject Matter	Information & knowledge management,Library & information science

A decision theoretic approach to

combining information ﬁltering

Alexander Binun

Informatics III (Informatik III) Department, University of Bonn, Bonn,

Germany, and

Bracha Shapira and Yuval Elovici

Department of Information Systems Engineering and

Deutsche Telekom Laboratories, Ben-Gurion University, Beer-Sheva, Israel

Abstract

Purpose – The purpose of this paper is to present an extension to a framework based on the

information structure (IS) model for combining information ﬁltering (IF) results. The main goal of the

framework is to combine the results of the different IF systems so as to maximise the expected payoff

(EP) to the user. In this paper we compare three different approaches to tuning the relevance

thresholds of individual IF systems that are being combined in order to maximise the EP to the user. In

the ﬁrst approach we set the same threshold for each of the IF systems. In the second approach the

threshold of each IF system is tuned independently to maximise its own EP (“local optimisation”). In

the third approach the thresholds of the IF systems are jointly tuned to maximise the EP of the

combined system (“global optimisation”).

Design/methodology/approach – An empirical evaluation is conducted to examine the

performance of each approach using two IF systems based on somewhat different ﬁltering

algorithms (TFIDF, OKAPI). Experiments are run using the TREC3, TREC6, and TREC7 test

collections.

Findings – The experiments reveal that, as expected, the third approach always outperforms the first

and the second, and that for some user profiles, the difference is significant. However, operational goals

argue against global optimisation, and the costs of meeting these operational goals are discussed.

Research limitations/implications – One limitation is the assumption of independence of the IF

systems: in real life systems usually use similar algorithms, so dependency might occur. The approach

also needs to be examined under the assumption of dependency between systems.

Practical implications – The main practical implications of this study lie in the empirical proof that

combination of ﬁltering systems improves ﬁltering results and the ﬁnding about the optimal

combination methods for the different user proﬁles. Many ﬁltering applications exist (e.g. spam ﬁlters,

news personalisation systems, etc.) that can beneﬁt from these ﬁndings.

Originality/value – The study presents and compares the contribution of three different combination

methods of filtering systems to the improvement of filtering results. It empirically shows the benefits of

each method and draws important conclusions about the combination of filtering systems.

Keywords Information control, Information modelling, Information retrieval

Paper type Research paper

Online Information Review, Vol. 33 No. 5, 2009, pp. 920-942

© Emerald Group Publishing Limited 1468-4527, DOI 10.1108/14684520911001918

Refereed article received 14 October 2008; approved for publication 4 May 2009

The current issue and full text archive of this journal is available at www.emeraldinsight.com/1468-4527.htm

Introduction

Many information retrieval (IR) studies have shown that the user benefit of the output of

a combination of several systems is higher than that of each individual system (Bartell

et al., 1994; Croft, 2000; Fox and Shaw, 1994; Lee, 1997; Saracevic and Kantor, 1988). A

review of system combinations by Croft (2000) identified the following four approaches:

combination of multiple representations of documents in a single search; combination of

different queries as additional evidence of the searcher’s information needs; combination

of ranking algorithms; and combination of output from different search systems.

The framework presented by Elovici et al. (2005) performs the last type of

combination, i.e. fusing the outputs of two information filtering (IF) systems to

maximise the user benefit by selecting the fusion strategy that would maximise the

user’s expected payoff (EP) of the combined system. The combination strategy is thus

dictated by user preferences (also known as user profiles). In this approach, IF systems

are treated as information structures (ISs) based on their performance characteristics

(which may be expressed by precision and recall); the optimal combination strategy to

achieve maximal payoff for the user is then derived. However, the observed precision

and recall of each system X depend on its relevance threshold T (that is, the first T

documents taken from the top of the output stream of X are the ones that X believes to be

relevant). The original framework presented by Elovici et al. (2005) did not

elaborate on how to set this relevance threshold of the combined IF systems in order to

maximise the EP of the user (i.e. tune an IF optimally).
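The dependence of precision and recall on the threshold T can be illustrated with a minimal Python sketch. The ranked list below is invented for illustration (it is not taken from the TREC collections), and the function name is hypothetical; the sketch only assumes that a system accepts the first T documents of its ranked output.

```python
from typing import Sequence


def precision_recall_at(ranked_relevance: Sequence[bool], t: int) -> tuple[float, float]:
    """Precision and recall when the top-t documents of a ranked output are accepted.

    ranked_relevance[i] is True if the document at rank i is truly relevant.
    """
    total_relevant = sum(ranked_relevance)
    accepted = ranked_relevance[:t]
    hits = sum(accepted)
    precision = hits / len(accepted) if accepted else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical ranked output of one IF system (True = relevant document).
ranking = [True, True, False, True, False, False, True, False]

for t in (2, 4, 8):
    p, r = precision_recall_at(ranking, t)
    print(f"T={t}: precision={p:.2f}, recall={r:.2f}")
```

On this toy ranking, raising T trades precision for recall, which is why the choice of T matters before any payoff is computed.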

In this paper we extend the IF systems combination framework described by Elovici

et al. (2005) by analysing three approaches to calibrating the threshold of each of the

systems being combined. We clarify the contribution of fusing the outputs of IF systems

and analyse the effect of the thresholding method on the improvement.

In the ﬁrst combination approach, we set the same threshold for each system and

analyse the combined output for different thresholds. In the second approach, the

threshold of each IF system is tuned independently to maximise its EP. Thus, each of

the systems being combined is optimally tuned before it is combined with the output of

the other systems. By conducting experiments on several TREC collections we show

that the result produced by this approach is often signiﬁcantly worse than the optimal

one. The third approach is based on setting the thresholds of the systems to maximise

the EP of the combined system. We discuss the results of the experiments and suggest

how to improve the second approach, which has operational advantages over the third

approach, although the latter provides the best results.
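The three tuning approaches can be contrasted in a small Python sketch. Everything concrete here is an assumption made for illustration: the payoff values in PROFILE, the toy rankings, and the per-cell decision rule (deliver the documents of a signal combination only if that pays more than withholding them) are hypothetical stand-ins and not the EP formula of Elovici et al. (2005). For the first approach the sketch simply reports the best shared threshold.

```python
from itertools import product

# Hypothetical user profile: payoff per document for each (delivered, relevant) outcome.
PROFILE = {(True, True): 1.0, (True, False): -1.0,
           (False, True): -0.5, (False, False): 0.0}


def single_payoff(ranking, t, relevant, docs):
    """Mean payoff when one system alone delivers its top-t documents."""
    delivered = set(ranking[:t])
    return sum(PROFILE[(d in delivered, d in relevant)] for d in docs) / len(docs)


def combined_payoff(rank_a, rank_b, t_a, t_b, relevant, docs):
    """Mean payoff of the combined system for thresholds (t_a, t_b).

    Documents are grouped by the pair of signals (accepted by A?, accepted by B?);
    each group is delivered or withheld, whichever pays more (assumed rule).
    """
    top_a, top_b = set(rank_a[:t_a]), set(rank_b[:t_b])
    total = 0.0
    for in_a, in_b in product((True, False), repeat=2):
        cell = [d for d in docs if (d in top_a) == in_a and (d in top_b) == in_b]
        deliver = sum(PROFILE[(True, d in relevant)] for d in cell)
        withhold = sum(PROFILE[(False, d in relevant)] for d in cell)
        total += max(deliver, withhold)
    return total / len(docs)


def tune(rank_a, rank_b, relevant, docs):
    """Return {approach: ((t_a, t_b), combined payoff)} for the three approaches."""
    ts = range(len(docs) + 1)

    def score(pair):
        return combined_payoff(rank_a, rank_b, pair[0], pair[1], relevant, docs)

    same = max(((t, t) for t in ts), key=score)            # one shared threshold
    local = (max(ts, key=lambda t: single_payoff(rank_a, t, relevant, docs)),
             max(ts, key=lambda t: single_payoff(rank_b, t, relevant, docs)))
    global_ = max(product(ts, ts), key=score)              # joint search
    return {name: (pair, score(pair))
            for name, pair in (("same", same), ("local", local), ("global", global_))}


# Toy data (invented): ten documents, four of them relevant, two system rankings.
DOCS = list(range(10))
RELEVANT = {0, 2, 3, 7}
RANK_A = [0, 2, 5, 3, 1, 7, 4, 6, 8, 9]
RANK_B = [3, 7, 0, 6, 2, 9, 1, 4, 5, 8]

results = tune(RANK_A, RANK_B, RELEVANT, DOCS)
for name, (pair, value) in results.items():
    print(f"{name:>6}: thresholds={pair}, combined payoff={value:.3f}")
```

Because the joint search covers every threshold pair, including all shared thresholds and the locally tuned pair, the globally tuned payoff in this sketch can never be lower than the other two. The sketch tunes and scores on the same labelled data, so it says nothing about how well the thresholds transfer to another collection.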

The experiments show that choosing the optimal threshold for each concrete user

proﬁle yields much better results than those achieved when the threshold is constant

for all user proﬁles. We also note that the optimal thresholds strongly depend on the

speciﬁc collection. For example, we found and recorded the optimal thresholds for

TREC3. When these thresholds were applied to the TREC6 test set, the results were

signiﬁcantly worse than the optimal ones.

The rest of the paper is organised as follows: the relevant background is presented

including a brief review of the IS model and its application to IF systems, followed by

an overview of related combination studies. Then we detail the approaches for setting

the relevance threshold parameters. The next section presents the results of evaluating

the performance of each of the threshold setting approaches, and the paper concludes

with a discussion and suggestions for future directions.

Background

Review of IR algorithms

Around 1978-1980 several IR engines were already in use. It appeared that

different retrieval engines returned quite dissimilar sets of relevant documents (Croft
