Structure‐preserving and query‐biased document summarisation for web searching

Published date	07 August 2009
Pages	696-719
DOI	https://doi.org/10.1108/14684520910985684
Author	F. Canan Pembe, Tunga Güngör
Subject Matter	Information & knowledge management, Library & information science

Structure-preserving and query-biased document summarisation for web searching

F. Canan Pembe

Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey and Department of Computer Engineering, İstanbul Kültür University, Istanbul, Turkey, and

Tunga Güngör

Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey

Abstract

Purpose – The purpose of this paper is to develop a new summarisation approach, namely structure-preserving and query-biased summarisation, to improve the effectiveness of web searching. During web searching, one aid for users is the document summaries provided in the search results. However, the summaries provided by current search engines have limitations in directing users to relevant documents.

Design/methodology/approach – The proposed system consists of two stages: document structure analysis and summarisation. In the first stage, a rule-based approach is used to identify the sectional hierarchies of web documents. In the second stage, query-biased summaries are created, making use of document structure both in the summarisation process and in the output summaries.

Findings – In structural processing, about 70 per cent accuracy in identifying document sectional hierarchies is obtained. The summarisation method is tested on a task-based evaluation method using English and Turkish document collections. The results show that the proposed method is a significant improvement over both unstructured query-biased summaries and Google snippets in terms of f-measure.

Practical implications – The proposed summarisation system can be incorporated into search engines. The structural processing technique also has applications in other information systems, such as browsing, outlining and indexing documents.

Originality/value – In the literature on summarisation, the effects of query-biased techniques and document structure are considered in only a few works and are researched separately. The research reported here differs from traditional approaches by combining these two aspects in a coherent framework. The work is also the first automatic summarisation study for Turkish targeting web search.

Keywords Data structures, Document delivery, Markup languages, Search engines, Worldwide web

Paper type Research paper
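
The two-stage design outlined in the abstract (structure analysis, then query-biased summarisation that keeps the structure in its output) can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not give their rules or scoring, so the `#` heading markers, the `Section` class, and the simple term-overlap test in `summarise` are all assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """A node in the sectional hierarchy (hypothetical structure)."""
    heading: str
    level: int
    sentences: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_hierarchy(lines):
    """Stage 1 (toy rule, assumed): '## Title' is a heading of depth 2."""
    root = Section(heading="", level=0)
    stack = [root]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = re.match(r"^(#+)\s+(.*)$", line)
        if match:
            node = Section(heading=match.group(2), level=len(match.group(1)))
            # Pop back to the nearest ancestor with a smaller level.
            while stack[-1].level >= node.level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].sentences.append(line)
    return root


def tokens(text):
    """Lower-cased word set; a stand-in for real linguistic processing."""
    return set(re.findall(r"\w+", text.lower()))


def summarise(section, query, depth=0):
    """Stage 2 (assumed scoring): keep sentences sharing a query term,
    and print them under the headings that lead to them."""
    query_terms = tokens(query)
    kept = [s for s in section.sentences if tokens(s) & query_terms]
    child_lines = []
    for child in section.children:
        child_lines.extend(summarise(child, query, depth + 1))
    if not kept and not child_lines:
        return []  # Drop sections with nothing relevant to the query.
    out = []
    if section.heading:
        out.append("  " * (depth - 1) + section.heading)
    out.extend("  " * depth + "- " + s for s in kept)
    out.extend(child_lines)
    return out


example_lines = [
    "# Antibiotics",
    "Antibiotics treat infections.",
    "## Resistance",
    "Some bacteria become resistant to antibiotics.",
    "Resistance spreads quickly.",
    "# History",
    "Penicillin was discovered in 1928.",
]
summary = summarise(build_hierarchy(example_lines), "bacteria resistant")
print("\n".join(summary))
```

In this toy run the matching sentence is shown beneath its heading path rather than as an isolated fragment, which is the kind of context the abstract says plain extracts lack.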

Online Information Review, Vol. 33 No. 4, 2009, pp. 696-719. © Emerald Group Publishing Limited, 1468-4527. DOI 10.1108/14684520910985684. Refereed article received 19 July 2008; approved for publication 20 January 2009. The current issue and full text archive of this journal is available at www.emeraldinsight.com/1468-4527.htm

Introduction

The drastic increase in documents available on the world wide web has resulted in the wide-spread problem of information overload (Mani and Maybury, 1999). People now have access to vast amounts of information; however, it is becoming increasingly difficult to locate useful information. Search engines usually return a large number of results in response to user queries. One study of European users showed that about 50 per cent of documents viewed by users are irrelevant (Jansen and Spink, 2005). Users need to open several links to find the desired information, especially for specific and complex queries (e.g. best retirement countries) and for tasks such as background searching rather than queries with commonplace answers (e.g. capital city of Sweden). In currently available search engines, such as Google and Altavista, each link in the results is associated with a short summary (e.g. a two-line extract) of its content. Although such extracts show some of the document fragments containing the query words, they fail to reveal their context within the document. As a result, the user either misses relevant results or spends time on irrelevant ones. Figure 1 shows the first six results of Google in response to the TREC-2004[1] query “antibiotics bacteria disease”. In that task, the aim of the user is to find documents that discuss how and why antibiotics become ineffective for some bacteria types. When we analyse the related documents, we see that only half of the extracts in the figure effectively direct the users.

At this point, automatic summarisation techniques gain importance. Although creating summaries as successful as human summaries is still a long-term research direction, summaries that are not perfect can be utilised to improve the effectiveness of other tasks such as information retrieval (Sparck-Jones, 1999). Automatic summarisation research has traditionally focused on creating general-purpose summaries. However, in an information retrieval paradigm, it has become important

Figure 1. First few outputs of Google search engine for an example query

