Binary k-nearest neighbor for text categorization
Songbo Tan
Software Department, Institute of Computing Technology,
Chinese Academy of Sciences, People’s Republic of China
Abstract
Purpose – With the ever-increasing volume of text data available via the internet, it is important that documents are classified into manageable and easy-to-understand categories. This paper proposes the use of binary k-nearest neighbor (BKNN) for text categorization.
Design/methodology/approach – The paper describes the traditional k-nearest neighbor (KNN) classifier, introduces BKNN and outlines experimental results.
Findings – The experimental results indicate that BKNN requires much less CPU time than KNN, without loss of classification performance.
Originality/value – The paper demonstrates that BKNN can be an efficient and effective algorithm for text categorization.
Keywords Classification, Information retrieval, Data handling
Paper type General review
Introduction
With the ever-increasing volume of text data available via the internet, it is important
that documents are classified into manageable and easy-to-understand categories. Text
categorization aims to attach predefined labels to previously unseen documents
automatically. This is an active research area in information retrieval, machine
learning and natural language processing. A number of machine learning algorithms
have been introduced to deal with text classification: k-nearest neighbor (KNN) (Yang
and Liu, 1999), centroid-based classifier (Han and Karypis, 2000), Naive Bayes (Lewis,
1998), decision trees (Lewis and Ringuette, 1994), and support vector machines (SVM)
(Joachims, 1998).
Of the existing methods, KNN is a simple classification algorithm that is very easy to
implement, since it does not require a training phase. Furthermore, experimental
research shows that KNN offers good performance in most cases (Yang and Liu, 1999;
Yang, 1999). However, KNN is inefficient because it requires a large amount of CPU
time to compute the similarity between a test document and each training document,
and then to sort these similarities. This drawback makes it unsuitable for applications
where classification efficiency is crucial, for example online text classification, in
which the classifier has to respond to large numbers of documents arriving
simultaneously as a stream.
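As an illustration of the procedure just described, the following is a minimal sketch of a traditional KNN text classifier in Python. The bag-of-words term-weight dictionaries, the cosine similarity measure and the unweighted majority vote are common choices assumed here for illustration; the paper's exact similarity function and voting scheme (e.g. the similarity-weighted voting of Yang and Liu, 1999) may differ.

```python
# Minimal sketch of a traditional KNN text classifier.
# Representation (bag-of-words dicts), cosine similarity and majority
# voting are illustrative assumptions, not the paper's exact setup.
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(test_doc, training_docs, k=5):
    """Label a test document by majority vote among its k nearest neighbors.

    training_docs: list of (term_weight_dict, label) pairs.
    The full scan plus sort below is exactly the similarity computation
    and sorting step the text identifies as the efficiency bottleneck.
    """
    sims = [(cosine_similarity(test_doc, doc), label)
            for doc, label in training_docs]
    sims.sort(key=lambda pair: pair[0], reverse=True)  # the costly sort step
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]

# Toy usage with hypothetical term weights:
train = [({"ball": 1.0, "goal": 2.0}, "sports"),
         ({"match": 1.5, "goal": 1.0}, "sports"),
         ({"stock": 2.0, "market": 1.0}, "finance")]
print(knn_classify({"goal": 1.0, "market": 0.5}, train, k=3))  # -> "sports"
```

Because every test document triggers a scan over the entire training set followed by a sort, classification cost grows linearly with the number of training documents, which is the inefficiency the paper sets out to address.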
In order to improve the efficiency of KNN, some researchers resort to pruning the
training samples (Guan and Zhou, 2002; Zhang and Mani, 2003). The pruning strategy
may perform well in traditional machine learning problems (Micó et al., 1994;
Dasarathy et al., 2000), but it can damage the classification quality of KNN for text
categorization (Guan and Zhou, 2002).
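The following is a hedged sketch of the general pruning idea, not a reproduction of any of the cited algorithms: a training document is dropped when the rest of the set already classifies it correctly, on the assumption that it contributes little to the decision boundary. It reuses knn_classify from the sketch above; the function name and parameters are illustrative.

```python
# Hedged sketch of training-sample pruning (leave-one-out style), not the
# exact method of Guan and Zhou (2002) or the other cited work.
# Assumes knn_classify from the previous sketch is in scope.
def prune_training_set(training_docs, k=3):
    """Keep only documents that the rest of the set misclassifies.

    Such "boundary" documents carry the class information; interior
    documents are dropped so KNN scans a smaller set at test time.
    Practical pruning methods typically also retain seed documents
    for each class so no class disappears entirely.
    """
    kept = []
    for i, (doc, label) in enumerate(training_docs):
        others = training_docs[:i] + training_docs[i + 1:]
        if knn_classify(doc, others, k=k) != label:
            kept.append((doc, label))
    return kept
```

The speed-up comes from scanning fewer documents at test time, and it is precisely this removal of training material that the cited work finds can hurt classification quality for text.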
With text classification, the documents are
Refereed article received 13 April 2005
Accepted 12 May 2005
Online Information Review, Vol. 29 No. 4, 2005, pp. 391-399
© Emerald Group Publishing Limited, 1468-4527
DOI 10.1108/14684520510617839
