Document text characteristics affect the ranking of the most relevant documents by expanded structured queries

Pages	358-376
DOI	https://doi.org/10.1108/EUM0000000007087
Published date	01 June 2001
Author	Eero Sormunen, Jaana Kekäläinen, Jussi Koivisto, Kalervo Järvelin
Subject Matter	Information & knowledge management,Library & information science

DOCUMENT TEXT CHARACTERISTICS AFFECT THE RANKING OF

THE MOST RELEVANT DOCUMENTS BY EXPANDED

STRUCTURED QUERIES

EERO SORMUNEN, JAANA KEKÄLÄINEN, JUSSI KOIVISTO

and KALERVO JÄRVELIN

{lieeso, lijakr, lijuko, likaja}@uta.fi

Department of Information Studies, University of Tampere

Finland

The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions was used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept-based structures performed better than unexpanded queries or ‘natural language’ queries. Further, it was shown that highly relevant documents benefit substantially more from the concept-based QE in ranking than marginally relevant documents.

1. INTRODUCTION

Fundamental problems of IR experiments are linked to the complex notion of relevance [1–6]. One of the problems is that in most laboratory experiments documents are judged either relevant or irrelevant with regard to the request. Binary relevance cannot reflect the possibility that documents may be relevant to a different degree; some documents contribute more information to the request, some less without being totally irrelevant. Relevance has been assessed at multiple levels in some studies of operational Boolean systems but even then the levels have been conflated into two categories at the analysis phase for the calculation of precision and recall [e.g. 7–9]. We therefore do not know how different best match IR methods are able to rank documents of varying relevance levels.

Journal of Documentation, vol. 57, no. 3, May 2001, pp. 358–376

The need for IR methods that are more selective in retrieving highly relevant documents is quite obvious in large databases like those provided by the Internet search services [10, 11]. As more documents become available, the number of potentially relevant items increases. From the user’s viewpoint, the major challenge for IR systems is not how to differentiate between relevant and non-relevant documents but rather to separate the highly relevant and potentially relevant documents. In the evaluation of IR systems, this challenge causes pressure to raise the threshold for what is accepted as relevant, i.e. what is relevant enough.

One interpretation of the degree of relevance is that highly relevant documents tend to convey more information about the topic of interest than marginally relevant ones. From this viewpoint one may hypothesise that highly relevant documents tend to have the following characteristics:

1(a) the topic is discussed in them at length;

(b) they deal with several aspects of the topic;

(d) authors use multiple unique expressions to refer to the concepts they discuss in order to avoid tautology.

In contrast, marginal documents mention the topic briefly; present just one aspect or contain just a few words referring to the topic; discuss the topic from a viewpoint not included in the request; and raise no problem of tautology. In this paper, we test these hypotheses by analysing document text characteristics (expressions used and concepts referred to) through the facet analysis technique developed by Sormunen [12, 13].
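The kind of text characteristics at issue (aspects covered, paragraphs containing searchable expressions, and unique expressions per aspect) can be made concrete with a small counting sketch. It is not the facet analysis technique of [12, 13]. It assumes that the facets of a request and their searchable expressions have already been listed by hand, and it uses naive case-insensitive substring matching. The names `TextProfile` and `profile_document` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextProfile:
    aspects_covered: int        # facets with at least one matching expression
    paragraphs_matched: int     # paragraphs containing any facet expression
    unique_expressions: dict    # facet name -> number of distinct expressions found


def profile_document(paragraphs, facets):
    """Count facet coverage in one document.

    paragraphs: list of paragraph strings.
    facets: dict mapping a facet (aspect) name to its searchable expressions.
    Matching is plain substring search on lower-cased text (an assumption).
    """
    found = {name: set() for name in facets}
    matched_paragraphs = set()
    for index, paragraph in enumerate(paragraphs):
        text = paragraph.lower()
        for name, expressions in facets.items():
            for expression in expressions:
                if expression.lower() in text:
                    found[name].add(expression.lower())
                    matched_paragraphs.add(index)
    return TextProfile(
        aspects_covered=sum(1 for hits in found.values() if hits),
        paragraphs_matched=len(matched_paragraphs),
        unique_expressions={name: len(hits) for name, hits in found.items()},
    )


# Hypothetical request facets and two toy documents.
facets = {
    "overload": ["information overload", "flood of information"],
    "retrieval": ["retrieval", "search"],
}
rich_doc = [
    "Information overload grows daily.",
    "Retrieval systems must search selectively.",
    "The flood of information continues.",
]
marginal_doc = [
    "The weather was fine.",
    "A search was mentioned once.",
]

rich = profile_document(rich_doc, facets)
marginal = profile_document(marginal_doc, facets)
```

Under these assumptions the toy "rich" document covers both facets in three paragraphs with two expressions per facet, while the "marginal" one covers a single facet in one paragraph with one expression. A real analysis would need morphological normalisation and human judgement about which expressions refer to which concept.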

In best match retrieval, documents are ranked according to scores calculated from the weights of search keys occurring in documents. These weights are typically based on the frequency of a key in a document and on the inverse collection frequency of the documents containing the key (tf.idf weighting) [14]. The development of tf.idf weighting schemes has been based on similar statistical hypotheses of document characteristics as were presented above (characteristic 1c). However, we will emphasise in this paper that the analysis of document texts can be elaborated, and further that the effectiveness of best match queries can be improved, especially in retrieving the most valuable documents.
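The ranking principle described above can be sketched as follows. This is one simple tf.idf variant (raw term frequency multiplied by the logarithm of collection size over document frequency), chosen for illustration only; it is not claimed to be the weighting scheme of the IR system used in the experiment. The function name `rank_tf_idf` and the toy collection are hypothetical.

```python
import math
from collections import Counter


def rank_tf_idf(query_keys, documents):
    """Rank documents by the summed tf.idf weights of the query keys they contain.

    documents: dict mapping a document id to a list of tokens.
    Weight of a key in a document = tf * log(N / df), an assumed simple variant.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each key occurs.
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in documents.items():
        term_freq = Counter(tokens)
        score = 0.0
        for key in query_keys:
            if term_freq[key]:
                score += term_freq[key] * math.log(n_docs / doc_freq[key])
        scores[doc_id] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy collection.
documents = {
    "d1": ["query", "expansion", "query", "structure"],
    "d2": ["query", "weather"],
    "d3": ["weather", "report"],
}
ranking = rank_tf_idf(["query", "expansion"], documents)
```

In this toy collection `d1` ranks first because it contains both keys, one of them twice, and `expansion` is rare in the collection; `d3` contains neither key and scores zero.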

Query structure refers to the syntactic structure of a query expression, marked with query operators and parentheses. Best match queries may either have a structure similar to Boolean queries, or queries may be ‘natural language queries’ without differentiated relations between search keys. In the former case, concepts are identified (henceforth concept-based or strong structures); in the latter, concepts are not identified, queries are mere sets of search keys, ‘natural language queries’ (henceforth weak structures). In the mainstream of experimental IR research weak query structures are nearly exclusively employed. However, recent findings have shown the positive influence of concept-based query structuring. For instance, strong query structures improve retrieval performance when queries are expanded [15, 16]. The positive effect of strong query structures seems to hold
