Business environmental analysis for textual data using data mining and sentence-level classification

Pages	69-88
DOI	https://doi.org/10.1108/IMDS-07-2017-0317
Published date	04 February 2019
Author	Yoon-Sung Kim,Hae-Chang Rim,Do-Gil Lee
Subject Matter	Information & knowledge management,Information systems,Data management systems,Knowledge management,Knowledge sharing,Management science & operations,Supply chain management,Supply chain information systems,Logistics,Quality management/systems

Yoon-Sung Kim and Hae-Chang Rim
Department of Computer Science, Korea University, Seoul, Korea, and

Do-Gil Lee
Research Institute of Korean Studies, Korea University, Seoul, Korea

Abstract

Purpose – The purpose of this paper is to propose a methodology for classifying a large amount of unstructured textual data into the categories of business environmental analysis frameworks.

Design/methodology/approach – This paper uses machine learning to classify a vast amount of unstructured textual data by category of a business environmental analysis framework. Generally, producing a large volume of high-quality training data for a machine-learning-based system is difficult in terms of cost, so semi-supervised learning techniques are used to improve the classification performance. Additionally, the lack-of-features problem that traditional classification systems have suffered from is resolved by adding semantic features derived from word embedding, a new technique in text mining.

Findings – The proposed methodology can be used for various business environmental analyses, and the system is fully automated in both the training and classifying phases. Semi-supervised learning can solve the problem of insufficient training data. The proposed semantic features can help improve traditional classification systems.

Research limitations/implications – This paper focuses on classifying sentences that contain business environmental analysis information in a large number of documents. However, the proposed methodology is limited with respect to advanced analyses that can directly help managers establish strategies, since it does not summarize the environmental variables implied in the classified sentences. Advanced summarization and recommendation techniques could extract the environmental variables from the sentences and help managers establish effective strategies.

Originality/value – The feature selection technique developed in this paper has not been used in traditional systems for business and industry, and it allows the whole process to be fully automated. The paper also demonstrates practicality, in that the system can be applied to various business environmental analysis frameworks. In addition, the system is more economical than traditional systems because of semi-supervised learning, and it can resolve the lack-of-features problem that traditional systems suffer from. This work is valuable for analyzing environmental factors and establishing strategies for companies.

Keywords Text mining, SWOT analysis, Machine learning, PEST analysis, Text categorization, Word embedding

Paper type Research paper

1. Introduction

The ability of a company to analyze and respond to its internal and external surroundings in a rapidly changing social environment is closely related to enterprise competitiveness (Fleisher and Bensoussan, 2007; Dai et al., 2011). These surroundings are mainly analyzed using textual data, and the amount of such data has been increasing rapidly (Yu et al., 2005; Miao et al., 2009; Ur-Rahman and Harding, 2012). The information about the internal and external environments acquired by analyzing textual data can be used to identify the latest technology trends, develop new products and establish strategies for responding to competitors, which ultimately contributes to the competitiveness of the enterprise (Johnson et al., 2008). Generally, companies have obtained analytic information through two methods: purchasing information created by a market analyst, and directly analyzing textual data using their own workforce. These methods analyze information with traditional text mining techniques such as clustering and concept linking (Bose, 2008).

Industrial Management & Data Systems, Vol. 119 No. 1, 2019, pp. 69-88, ISSN 0263-5577, DOI 10.1108/IMDS-07-2017-0317
Received 20 July 2017; revised 17 November 2017 and 21 December 2017; accepted 1 January 2018
The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/0263-5577.htm

However, with the arrival of the big data environment, these methods have limitations. First, they are inefficient in terms of cost. Information obtained from market analysts is so expensive that most enterprises, excluding conglomerates, cannot realistically use it. Even when analyzing the data directly, it is difficult for most companies to afford experts for market analysis. Second, these methods are not effective for processing large amounts of data. The business environments surrounding enterprises are changing rapidly and the amount of information is increasing. Traditional methodologies used by market analysts and techniques for competitive intelligence require manual work by experts (Bose, 2008). Therefore, in a big data environment, it is difficult to analyze data quickly and comprehensively using these methods.

While there have been previous studies on environmental analysis methodology for big data (Dai et al., 2011; Dai et al., 2013), they only proposed system designs and did not actually implement them. Therefore, they did not consider the specific methods and problems encountered in actual system development, nor did they evaluate the system objectively.

Systems that automatically perform sentence-level classification for problems simpler than business environmental analysis frameworks have been studied (Samejima et al., 2006; Ur-Rahman and Harding, 2012; Arif-Uz-Zaman et al., 2016). However, these systems are not fully automated and do not address real-world problems such as the lack of training data or the lack of features used for classification. To solve these problems, a successful business environmental analysis methodology should meet the following conditions: it must be able to retrieve as much meaningful data as possible in a big data environment, and all the processes, such as retrieving meaningful information from massive textual data and classifying the data into categories, must be performed quickly and be fully automated.

In this paper, we propose a text mining methodology that identifies and automatically classifies information about the business environment (contained in a large amount of unstructured textual data) according to the categories of an environmental analysis framework. We completely automate the process of selecting lexical information in the text as the basis for the classification, so that the classification can be performed more quickly and economically. In addition, we solve two technical problems of traditional classification systems that use lexical information. First, we use a semi-supervised learning technique so that we can effectively train the classifier using less training data. Second, we consider the semantic similarity between words by utilizing word embedding, a state-of-the-art text mining technique. We then implement our proposed system and evaluate its performance to demonstrate the practicality of the system. Additionally, we apply our proposed system to PEST and SWOT analyses, which are widely used business environmental analysis frameworks, and we demonstrate the scalability of the system through various experiments. As shown in Figure 1, when the sentences in the documents to be classified are entered into the system, the system classifies and outputs the sentences according to the category of the environmental analysis framework. Such classified sentences can be useful for establishing corporate strategies and making decisions in the future.
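The combination of semi-supervised learning and embedding-based semantic features described above can be illustrated with a minimal sketch. It is not the system proposed in this paper: the introduction does not specify the classifier or the semi-supervised algorithm, so the sketch assumes self-training (one common semi-supervised technique) with a nearest-centroid classifier over averaged word vectors. The three-dimensional vectors in `EMBEDDINGS`, the example sentences, the confidence threshold and all function names are hypothetical; only the two category labels are taken from the PEST framework.

```python
import math

# Hypothetical hand-made 3-d "embeddings"; a real system would load trained vectors.
EMBEDDINGS = {
    "inflation": (0.9, 0.1, 0.0),
    "prices":    (0.8, 0.2, 0.0),
    "interest":  (0.9, 0.0, 0.1),
    "rates":     (0.7, 0.1, 0.1),
    "software":  (0.1, 0.9, 0.0),
    "chips":     (0.0, 0.8, 0.1),
    "robots":    (0.1, 0.9, 0.1),
    "sensors":   (0.0, 0.9, 0.2),
}

def sentence_vector(sentence):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroids(labeled):
    """Compute the mean sentence vector of each category."""
    grouped = {}
    for sentence, label in labeled:
        vec = sentence_vector(sentence)
        if vec is not None:
            grouped.setdefault(label, []).append(vec)
    return {label: tuple(sum(d) / len(vecs) for d in zip(*vecs))
            for label, vecs in grouped.items()}

def predict(sentence, cents):
    """Return (closest category, cosine similarity), or (None, 0.0) if no word is known."""
    vec = sentence_vector(sentence)
    if vec is None:
        return None, 0.0
    label = max(cents, key=lambda l: cosine(vec, cents[l]))
    return label, cosine(vec, cents[label])

def self_train(labeled, unlabeled, threshold=0.95, max_rounds=5):
    """Self-training: repeatedly move confidently predicted sentences into the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        scored = [(s, predict(s, cents)) for s in pool]
        added = [(s, lab) for s, (lab, score) in scored if score >= threshold]
        if not added:
            break
        labeled.extend(added)
        added_sentences = {s for s, _ in added}
        pool = [s for s in pool if s not in added_sentences]
    return centroids(labeled), labeled

# Two labeled seed sentences and three unlabeled ones (all hypothetical).
seed = [("inflation prices", "Economic"), ("software chips", "Technological")]
unlabeled = ["interest rates", "robots sensors", "weather"]
model, expanded = self_train(seed, unlabeled)
print(predict("rates prices", model))
```

In this toy run, "interest rates" and "robots sensors" share no words with the seed sentences, yet they are labeled because their averaged vectors lie close to a category centroid; this is the sense in which semantic similarity can compensate for sparse lexical features. "weather" has no known vector and therefore stays unlabeled.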

The remainder of this paper is organized as follows. In Section 2, we introduce business environmental analysis and describe the limitations of traditional classification systems used for business and industry. Section 3 provides background knowledge about text mining techniques such as semi-supervised learning and word embeddings. Section 4 describes a fully automated business environmental analysis system that utilizes data mining during the feature selection process, a step for which previous studies have required experts. We indicate problems such as data sparseness and the high cost of constructing training data that traditional systems suffer when applying their systems to real
