Multi-word terms selection for information retrieval

DOI	https://doi.org/10.1108/IDD-12-2021-0142
Published date	28 June 2022
Pages	74-87
Subject Matter	Library & information science, Library & information services, Lending, Document delivery, Collection building & management, Stock revision, Consortia
Author	Chedi Bechikh Ali, Hatem Haddad, Yahya Slimani

Chedi Bechikh Ali

Institut National des Sciences Appliquées et de Technologie (INSAT), LISI, University of Carthage, Tunis, Tunisia

Hatem Haddad

iCompass, Tunis, Tunisia, and

Yahya Slimani

Institut Supérieur des Arts Multimédia (ISAMM), University of Manouba, Manouba, Tunisia

Abstract

Purpose – A number of approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of these approaches suffer from poor precision at low recall. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond simple-term indexing to propose a framework for multi-word terms (MWT) filtering and indexing.

Design/methodology/approach – In this paper, the authors rank MWT in order to filter them, keeping the most effective ones for the indexing process. The proposed model filters MWT according to their ability to capture the document topic and to distinguish between different documents from the same collection. The authors rely on the hypothesis that the best MWT are those that achieve the greatest association degree. The experiments are carried out on English and French data sets.

Findings – The results indicate that this approach achieved precision enhancements at low recall, and that it performed better than more advanced models based on term dependencies.

Originality/value – The authors use and test different association measures to select the MWT that best describe the documents, to enhance precision among the first retrieved documents.

Keywords Performance measurement, Statistics, Information systems, Information retrieval, Information science, Collection management, Indexing, Multi-word terms, Association measure, Precision

Paper type Research paper

1. Introduction

Existing information retrieval (IR) models are well defined from a theoretical point of view and give good results in evaluation campaigns. However, these models are not efficient at low recall levels: some documents can be ranked low despite being the most relevant. This can be explained by the fact that these models do not take the complex grammatical structure of queries and documents into account. Consequently, there is a need for models based on dependencies between terms to improve the accuracy of IR systems. Classical systems consider the text from a statistical point of view and use no linguistic, syntactic or dependency information. We rely on the hypothesis that a better understanding of the relationships and dependencies that may exist between terms in queries and documents can allow the IR system to perform better. In this paper, we propose a document indexing model based on multi-word terms (MWT) extracted using syntactic patterns and statistical techniques to capture term dependencies. The syntactic patterns make it possible to eliminate irrelevant structures when extracting MWT, and the statistical measures make it possible to filter MWT by using a weight to keep the most efficient ones for MWT-based indexing. The fusion of linguistic and statistical approaches for extracting MWT has shown its usefulness for terminology extraction from documents (Pecina, 2010): it achieved high precision and high mean average precision (MAP) in collocation extraction when using pointwise mutual information (MI).
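
The filtering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: candidates here are simply adjacent token pairs rather than MWT extracted with syntactic patterns, and the function names `association_scores` and `select_mwt`, the toy corpus, and the `min_freq` and `top_k` thresholds are all assumptions made for illustration. The sketch scores each candidate with pointwise mutual information and the Dice coefficient, then keeps the highest-ranked ones.

```python
import math
from collections import Counter


def association_scores(docs):
    """Score adjacent token pairs found in a list of tokenized documents.

    Returns {(w1, w2): {"freq": int, "pmi": float, "dice": float}}.
    Generating candidates by adjacency is an assumption of this sketch;
    the paper extracts MWT with syntactic patterns instead.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        # Maximum-likelihood probability estimates from raw counts.
        p12 = f12 / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = {
            "freq": f12,
            "pmi": math.log2(p12 / (p1 * p2)),
            "dice": 2 * f12 / (unigrams[w1] + unigrams[w2]),
        }
    return scores


def select_mwt(docs, measure="pmi", top_k=3, min_freq=2):
    """Keep the top_k candidates by the chosen measure (hypothetical thresholds)."""
    scores = association_scores(docs)
    frequent = [(bg, s) for bg, s in scores.items() if s["freq"] >= min_freq]
    ranked = sorted(frequent, key=lambda item: item[1][measure], reverse=True)
    return [bg for bg, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    toy_docs = [
        ["information", "retrieval", "system"],
        ["information", "retrieval", "model"],
        ["retrieval", "model"],
    ]
    for bigram in select_mwt(toy_docs, measure="dice"):
        print(" ".join(bigram))
```

The frequency threshold is included because pointwise mutual information tends to overrate pairs that occur only once or twice; the value used here is arbitrary.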

To characterize the extent to which the parts of an MWT are semantically connected, we use the notion of association degree (Henry et al., 2018), which serves to classify MWT into key MWT and non-key MWT. A key MWT represents an important concept in a document or a query. Indeed, key MWT can be used to annotate documents with the terms that best describe their semantic content (Bendersky and Croft, 2008). Several statistical measures have been proposed to measure the degree of association, including MI, the Dice coefficient and log-likelihood (Kilgariff, 1992). In corpus linguistics, these measures are used to identify collocations, defined as groups of terms “that tend to appear in close proximity to one another significantly more often than one might predict based on the

The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2398-6247.htm

Information Discovery and Delivery

51/1 (2023) 74–87

[DOI 10.1108/IDD-12-2021-0142]

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Received 29 December 2021

Revised 22 March 2022

7 May 2022

Accepted 4 June 2022
