ON THE SPECIFICATION OF TERM VALUES IN AUTOMATIC INDEXING

Date	01 April 1973
DOI	https://doi.org/10.1108/eb026562
Pages	351-372
Published date	01 April 1973
Author	G. SALTON,C.S. YANG
Subject Matter	Information & knowledge management,Library & information science

THE Journal of Documentation
VOLUME 29 NUMBER 4 DECEMBER 1973

ON THE SPECIFICATION OF TERM VALUES IN AUTOMATIC INDEXING

G. SALTON and C. S. YANG
Department of Computer Science, Cornell University, Ithaca

The existing practice in automatic indexing is reviewed, and it is shown that the standard theories for the specification of term values (or weights) are not adequate. New techniques are introduced for the assignment of weights to index terms, based on the characteristics of individual document collections. The effectiveness of some of the proposed methods is evaluated.

I. CURRENT INDEXING PRACTICE

TWO FUNDAMENTAL notions in the theory of automatic indexing are known respectively as indexing exhaustivity and term specificity. Indexing exhaustivity refers to the accuracy and depth with which the various topic areas germane to a given document are reflected in the set of index terms assigned to the document, whereas term specificity is a function of the exactness with which a term characterizes a given subject. In general, increasing exhaustivity implies better recall performance, while increasing term specificity means better precision. In particular, the more exhaustive the indexing, that is, the more thorough the coverage of the various subject areas, the more likely it is that relevant items are actually retrieved in response to user queries, thus achieving high recall; similarly, the greater the term specificity, that is, the more precise the definition of each term, the less likely it is that extraneous non-relevant items are also retrieved, thus achieving high precision. In a given user and collection context, one must then look for an optimum level of specificity in the vocabulary, and an optimum level of exhaustivity in the indexing to cover the recall and/or precision performance desired by the user population.
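
The trade-off described above can be made concrete with a small sketch. It assumes the usual set-based definitions (recall is the fraction of relevant items that are retrieved, precision is the fraction of retrieved items that are relevant); the function name and the document identifiers are hypothetical and serve only as illustration.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single query.

    retrieved: ids returned by the search; relevant: ids judged relevant.
    Both definitions are the standard set-based ones, assumed here.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical relevance judgments for one query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow search (specific terms) versus a broad one (exhaustive indexing).
narrow = ["d1", "d2"]
broad = ["d1", "d2", "d3", "d5", "d6", "d7"]

print(recall_precision(narrow, relevant))  # (0.5, 1.0)
print(recall_precision(broad, relevant))   # (0.75, 0.5)
```

In this toy case the broader search retrieves more of the relevant items but also more non-relevant ones, which is the pattern the text associates with higher exhaustivity.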

In an actual operating environment, one may conjecture that indexing exhaustivity has something to do with the number of index terms assigned to a given document, particularly the number of higher frequency terms, those largely responsible for the recall performance. Term specificity, on the other hand, may be assumed to be related to the number of documents to which a given term is assigned in a given collection, the idea being that the smaller the document frequency, that is, the more concentrated the assignment of a term to only a few documents in a collection, the more likely it is that a given term is reasonably specific.1

The introduction of relationships between the indexing exhaustivity and specificity on the one hand, and the frequency characteristics of the index terms on the other, has led to certain indexing theories which have been used widely in practice. Before reviewing the main theories, it is convenient to distinguish two different frequency measures. The term frequency $f_{ki}$ is the frequency of occurrence of term i in document k. The total frequency of occurrence, $F_i$, of term i is then defined simply as the sum of the individual term frequencies across the N documents of a collection, that is,

$$F_i = \sum_{k=1}^{N} f_{ki}$$

A somewhat different measure is the document frequency $d_i$ of term i, which measures the number of documents to which term i is assigned. In an indexing system in which no weights are assigned to the terms, that is, where $f_{ki}$ is equal to 1 for all k and all i whenever term i appears in document k, and $f_{ki}$ is zero otherwise, the document frequency $d_i$ then equals the total frequency $F_i$ for all i.
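
The three frequency measures can be sketched in a few lines. This is a minimal illustration that assumes documents are already reduced to lists of index terms; the function name and the sample documents are hypothetical. It also checks the remark above that, with unweighted (binary) indexing, the document frequency equals the total frequency for every term.

```python
from collections import Counter


def frequency_tables(docs):
    """Compute term, total, and document frequencies.

    docs: list of documents, each a list of index terms.
    Returns (f, F, d) where f[k][i] is the frequency of term i in
    document k, F[i] is the total frequency of term i over all
    documents, and d[i] is the number of documents containing term i.
    """
    f = [Counter(terms) for terms in docs]
    F = Counter()
    d = Counter()
    for counts in f:
        F.update(counts)         # add the per-document counts
        d.update(counts.keys())  # add 1 for each distinct term
    return f, F, d


# Hypothetical three-document collection.
docs = [
    ["index", "term", "term", "weight"],
    ["term", "recall"],
    ["index", "index", "precision"],
]
f, F, d = frequency_tables(docs)
print(F["term"], d["term"])    # 3 2
print(F["index"], d["index"])  # 3 2

# Binary indexing: each term counted at most once per document.
binary_docs = [sorted(set(terms)) for terms in docs]
_, F_bin, d_bin = frequency_tables(binary_docs)
print(F_bin == d_bin)          # True
```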

Based on the concepts of term and document frequencies, a large variety of indexing methods can be implemented using completely objective criteria which depend only on the occurrence characteristics of terms in documents. The first and best known of these is due to Luhn, and assumes that the value, or weight, of a term assigned to a document is simply proportional to the term frequency (TF); that is, the more often a term occurs in the text of a document, the higher its weight.2 The Luhn theory reflects the fact that high frequency terms are often essential for the specification of document content and for the retrieval of relevant information. In many environments, the standard term frequency weights do, in fact, enhance the retrieval performance, particularly at the high recall end of the performance curve, as shown in the example of Figure 1 for a collection of 425 documents in world affairs taken from issues of Time magazine published in 1963, and processed against twenty-four user queries.* It may be

* A recall-precision graph such as that of Figure 1 is obtained by matching queries and documents (using a cosine coefficient), and ranking all documents in decreasing order of query-document similarity. Precision values are then computed at fixed recall levels of 0·1, 0·2, 0·3, etc., for each query, and the resulting values are averaged for a given query set. When recall-precision graphs for different indexing or search methods are shown in the same figure, the curve closest to the upper right-hand corner (where recall and precision are both near 1) reflects the better performance.3
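
The evaluation procedure in the footnote can be sketched as follows, using term frequency (TF) weights for the vectors. Two points are assumptions rather than statements from the text: the precision at a fixed recall level is taken as the highest precision observed at any rank whose recall reaches that level (a common interpolation rule; the article does not say which rule was used), and all names and sample data are hypothetical.

```python
import math
from collections import Counter


def cosine(q, d):
    """Cosine coefficient between two sparse term-weight vectors."""
    dot = sum(w * d.get(t, 0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def rank(query, docs):
    """Document ids in decreasing order of query-document similarity."""
    return sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)


def precision_at_recall_levels(ranking, relevant, levels):
    """Precision at fixed recall levels for one query.

    Assumed interpolation: highest precision at any rank whose
    recall is at least the requested level.
    """
    points, hits = [], 0
    for n, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / n))
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


def average_curves(curves):
    """Average per-query precision values level by level."""
    return [sum(col) / len(col) for col in zip(*curves)]


# Hypothetical collection with TF weights (raw term counts).
docs = {
    "d1": Counter(["term", "term", "weight"]),
    "d2": Counter(["recall", "precision"]),
    "d3": Counter(["term", "index"]),
}
query = Counter(["term", "weight"])
relevant = {"d1", "d2"}

ranking = rank(query, docs)
curve = precision_at_recall_levels(ranking, relevant, [0.5, 1.0])
print(ranking)  # ['d1', 'd3', 'd2']
print(curve)
```

With several queries, `average_curves` would be applied to the list of per-query curves to obtain the single averaged curve that is plotted.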
