Artificial bee colony algorithm for feature selection and improved support vector machine for text classification

Date: 19 August 2019
Published: 19 August 2019
Pages: 154-170
DOI: https://doi.org/10.1108/IDD-09-2018-0045
Authors: Janani Balakumar, S. Vijayarani Mohan
Janani Balakumar and S. Vijayarani Mohan
Department of Computer Science, Bharathiar University, Coimbatore, India
Abstract
Purpose: Owing to the huge volume of documents available on the internet, text classification has become a necessary task for handling these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content.
Design/methodology/approach: This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm (ABCFS) is evaluated on real and benchmark data sets and compared against existing feature selection approaches such as information gain and the χ² statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper.
Findings: The experiments were conducted on real and benchmark data sets. The real data set was collected in the form of documents stored on a personal computer, and the benchmark data sets were collected from the Reuters and 20 Newsgroups corpora. The results demonstrate the performance of the proposed feature selection algorithm by enhancing text document classification accuracy.
Originality/value: This paper proposes a new ABCFS algorithm for feature selection, evaluates its efficiency and improves the support vector machine. Here, the ABCFS algorithm is used to select features from unstructured text documents. In existing work, ABC-based feature selection has been applied only to structured data; no such algorithm exists for text. The proposed algorithm classifies documents automatically based on their content.
Keywords: Information technology, Information science, Information retrieval, Information management, Information systems, Document management, Text classification, Feature selection, Information gain, χ² statistic, Artificial bee colony, Support vector machine, Improved SVM
Paper type: Research paper
1. Introduction
Text document classification is a frequently used technique in the field of text mining and machine learning. It is the process of assigning a document to one or more predefined categories. Recently, text classification has received increasing attention from researchers in the fields of text mining, information retrieval (IR), machine learning and artificial intelligence (Sebastiani, 2002). The main goal is to learn a classifier over labelled instances so that the category-assignment process can be performed automatically using machine learning techniques (Leopold and Kindermann, 2002). The main problem of text document classification is that documents have a set of high-dimensional features that may reduce the performance of the text classification system (Apté et al., 1994).
Feature selection is the method of selecting features from a set of high-dimensional features. Choosing the best subset with the minimum number of features improves the performance of the text classification system (Aghdam et al., 2009). A feature selection algorithm picks an important set of features and removes redundant, noisy and irrelevant data; the result can then be used in the subsequent classification task (Chen et al., 2009).
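As a minimal illustration of this filter idea (not the paper's ABCFS algorithm; the function name, toy documents and the document-frequency score are all invented here for the example), one can score each term and keep only the top-ranked ones:

```python
from collections import Counter

def select_top_k_terms(docs, k):
    """Filter-style feature selection: score each term by document
    frequency and keep only the k highest-scoring terms."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    # Rank terms by how many documents contain them; ties break alphabetically.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k]

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a dog barked at the mailman",
]
print(select_top_k_terms(docs, 3))
```

In practice the score would be information gain or χ² rather than raw document frequency, but the keep-the-best-scoring-terms structure is the same.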
Commonly, text can be represented in two different ways: as a bag of words or as strings. A document represented as a set of words with their associated frequencies is called a bag of words; a document that retains the sequence of words is called a string representation. From these representations, feature selection chooses the optimal number of features. This research work proposes a new feature selection algorithm based on artificial bee colony (ABCFS) for text classification. The basic artificial bee colony (ABC) technique is used to resolve the optimization problem in the
text classification task, which simulates the foraging behavior of a bee colony. This algorithm was proposed by Karaboga and Ozturk (2010) for continuous function optimization.

The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/2398-6247.htm
Information Discovery and Delivery, 47/3 (2019), pp. 154-170
© Emerald Publishing Limited [ISSN 2398-6247]
[DOI 10.1108/IDD-09-2018-0045]
Received 26 September 2018
Revised 8 November 2018, 10 December 2018, 20 January 2019, 11 February 2019, 18 February 2019, 27 February 2019, 11 March 2019
Accepted 21 March 2019
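To make the foraging metaphor concrete, the sketch below shows a generic, simplified ABC-style search over binary feature masks. This is an illustrative toy, not the authors' ABCFS algorithm: the function names, colony parameters and the synthetic fitness function are all assumptions introduced for the example.

```python
import random

def abc_feature_search(num_features, fitness, colony=10, limit=5, iters=50, seed=0):
    """Simplified ABC-style search over binary feature masks.
    Each "food source" is a bit vector (1 = feature selected); bees
    perturb sources and scouts replace exhausted ones."""
    rng = random.Random(seed)
    sources = [[rng.randint(0, 1) for _ in range(num_features)] for _ in range(colony)]
    trials = [0] * colony
    best = max(sources, key=fitness)

    def neighbour(mask):
        # Flip one randomly chosen bit, mimicking a local search step.
        m = mask[:]
        i = rng.randrange(num_features)
        m[i] = 1 - m[i]
        return m

    for _ in range(iters):
        for i in range(colony):              # employed + onlooker phases (merged)
            cand = neighbour(sources[i])
            if fitness(cand) > fitness(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
            if trials[i] > limit:            # scout phase: abandon the source
                sources[i] = [rng.randint(0, 1) for _ in range(num_features)]
                trials[i] = 0
        best = max(sources + [best], key=fitness)
    return best

# Toy fitness: features 0-2 are "relevant", extra features are penalised.
def fitness(mask):
    return sum(mask[:3]) - 0.2 * sum(mask[3:])

print(abc_feature_search(8, fitness))
```

In a real text-classification setting the fitness of a mask would be the accuracy of a classifier (e.g. SVM) trained on the selected features, which is far more expensive than this toy function.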
The rest of the paper is organized as follows. Section 2 introduces the work related to ABC, support vector machines (SVM) and feature selection methods. Section 3 describes the methods of feature selection and document classification, and the proposed algorithm (ABCFS) combined with the improved SVM (ISVM) algorithm. Section 4 provides the performance measures used to validate the performance of the text classification task. Section 5 provides the observations of the experiments carried out on real and benchmark data sets, as well as a comparative analysis of the existing feature selection methods and the proposed methods. Section 6 provides the results of numerous experiments implemented to illustrate the effectiveness of the proposed algorithm. Section 7 presents the conclusion of this research work.
2. Related works
In the area of text classication and feature selection with
optimization, some of the researchersexploit the advanced and
persistent techniques for text document classication and
feature selectionfor text classication.
Kannan et al. (2006) proposed a document frequency (DF) threshold technique for the feature selection phase. In the classification phase, the k-nearest neighbor (kNN) algorithm and the support vector machine (SVM) algorithm were used. This study achieved a precision of 0.95 and showed that the kNN algorithm is suitable for Arabic text classification. The experiments were performed on Arabic newspaper articles collected from a variety of newspaper websites available online, including Al-Jazeera, Al-Nahar, Al-Hayat, Al-Ahram and Al-Dostor. Al-Harbi et al. (2008) used a decision tree as the classifier and the chi-square (χ²) statistic for feature selection. The result of this study was evaluated by computing the accuracy: the number of correctly classified documents divided by the total number of documents in the testing data set. The testing and training data were based on the Arabic Newswire and Arabic Gigaword corpora. The authors reported an average accuracy of 0.68 with SVM and 0.78 with C5.0.
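The χ² score used in such work can be computed directly from a 2×2 term/category contingency table. The sketch below is a generic illustration with invented counts, not data from any of the cited studies:

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square score of a term/category pair from a 2x2 contingency
    table: n11 = in-category docs containing the term, n10 = out-of-category
    docs containing the term, n01/n00 the same counts without the term."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0

# A term that appears in 40 of 50 in-category docs but only 10 of 50 others.
score = chi_square(40, 10, 10, 40)
print(round(score, 2))  # prints 36.0
```

A high score means the term's presence is strongly dependent on the category, so the term is a good feature; a term distributed evenly across categories scores 0.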
Subanya and Rajalaxmi (2014) investigated a new feature selection method based on ABC to recognize cardiovascular disease. They used a benchmark data set taken from the UCI repository with an SVM classifier to evaluate the proposed method. The results show an accuracy of 0.86, which proved better than that produced by feature selection methods based on reverse ranking.
Schiezaro and Pedrini (2013) presented a feature selection method based on the ABC algorithm. The results show that a reduced feature set can achieve the best classification accuracy; for some data sets, the accuracy improved significantly even as the number of features was reduced. The proposed algorithm offered better results when compared with other techniques. As future work, they suggested developing a filter approach combining the ABC algorithm, entropy and mutual information. Karaboga and Ozturk (2010) proposed an ABC algorithm and tested it on fuzzy clustering for classifying different data sets. Several benchmark data sets were collected from the UCI repository. The results show that the ABC optimization algorithm was successful, attaining a lower classification error percentage of 16.32 per cent. The proposed ABC fuzzy clustering achieved 8.09 per cent less classification error when compared with fuzzy c-means.
An accelerated ABC (A-ABC) method was proposed with two modifications, implemented on the ABC algorithm to improve its local search capability and convergence speed. The modifications were called modification rate (MR) and step size (SS). The results show that A-ABC performs well and converges faster than the standard version of the ABC algorithm. This method was compared with standard ABC on seven different benchmark functions to validate the effects of the MR value and the SS modification (Ozkis and Babalik, 2014). O'Keefe and Koprinska (2009) proposed feature selectors and feature weights with Naive Bayes and SVM classifiers. In this work, the authors used two new feature selection methods and three feature weighting methods. Sentiment analysis recognizes whether the opinion in a document is positive or negative with respect to a topic. The experimental results show that it was possible to maintain a state-of-the-art classification accuracy of 87.15 per cent while using less than 36 per cent of the features.
Ghany et al. (2015) proposed a binary algorithm for feature selection. To check the effectiveness of this method, they used several benchmark data sets and compared it with two well-known bio-inspired methods, the genetic algorithm (GA) and particle swarm optimization (PSO). The results showed that the proposed binary algorithm outperformed GA and PSO in improving classification performance and reducing the feature set; they reported a classification error between 0.024 and 0.297. Younus et al. (2015) proposed a new PSO method for feature selection to handle Arabic text summarization. The proposed method was tested and compared with five existing works. They described a precision of 0.67, which was not the best result compared with the existing methods. They recommended improving the proposed PSO and investigating new swarm intelligence techniques such as evolutionary strategies.
Chandrashekar and Sahin (2014) presented different types of feature selection methods for high-dimensional data sets and provided an outline of several feature selection techniques. The main objective of their paper is to deliver a standard introduction to variable elimination that can be applied to a wide array of machine learning problems. In particular, the authors focused on filter, wrapper and embedded methods. They also applied the feature selection techniques to standard data sets to establish their efficiency.
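The filter/wrapper distinction above is easy to illustrate: where a filter scores features independently (as in the χ² example), a wrapper repeatedly consults the classifier itself. The sketch below shows a generic greedy forward-selection wrapper with an invented toy evaluator standing in for a trained classifier's accuracy:

```python
def greedy_forward_select(features, evaluate, max_features=None):
    """Wrapper-style selection: greedily add the feature whose inclusion
    most improves the evaluator, stopping when no addition helps."""
    selected = []
    remaining = list(features)
    best_score = evaluate(selected)
    while remaining and (max_features is None or len(selected) < max_features):
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, f = max(scored)
        if score <= best_score:   # no candidate improves the evaluator
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score

# Toy evaluator: pretend features "a" and "b" are useful, "c" is noise.
gains = {"a": 0.3, "b": 0.2, "c": -0.1}
evaluate = lambda feats: 0.5 + sum(gains[f] for f in feats)
print(greedy_forward_select(["a", "b", "c"], evaluate))
```

Wrappers like this are usually more accurate than filters but far more expensive, since every candidate subset requires retraining the classifier; that cost is what motivates population-based searches such as ABC, GA and PSO.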
Liu and Yu (2005) presented feature selection concepts and algorithms. The study also reviews existing feature selection algorithms for classification and clustering, groups and compares different algorithms within a categorizing framework based on search strategies, evaluation criteria and data mining tasks, reveals unattempted combinations and provides guidelines for selecting feature selection algorithms. For the categorization task, the authors built an integrated system for intelligent feature selection, with a unifying platform proposed as an intermediate step. They used some of the real-
