Business environmental analysis for textual data using data mining and sentence-level classification

Pages: 69-88
DOI: https://doi.org/10.1108/IMDS-07-2017-0317
Published date: 04 February 2019
Authors: Yoon-Sung Kim, Hae-Chang Rim, Do-Gil Lee
Subject matter: Information & knowledge management, Information systems, Data management systems, Knowledge management, Knowledge sharing, Management science & operations, Supply chain management, Supply chain information systems, Logistics, Quality management/systems
Yoon-Sung Kim and Hae-Chang Rim
Department of Computer Science, Korea University,
Seoul, Korea, and
Do-Gil Lee
Research Institute of Korean Studies, Korea University,
Seoul, Korea
Abstract
Purpose: The purpose of this paper is to propose a methodology for analyzing a large amount of
unstructured textual data according to the categories of business environmental analysis frameworks.
Design/methodology/approach: This paper uses machine learning to classify a vast amount of
unstructured textual data by category of a business environmental analysis framework. Generally, producing
a large volume of high-quality training data for a machine-learning-based system is prohibitively costly, so
semi-supervised learning techniques are used to improve the classification performance. Additionally, the
lack-of-features problem from which traditional classification systems have suffered is resolved by applying
semantic features derived from word embedding, a new technique in text mining.
Findings: The proposed methodology can be used for various business environmental analyses, and the
system is fully automated in both the training and classifying phases. Semi-supervised learning can solve the
problem of insufficient training data. The proposed semantic features can be helpful for improving
traditional classification systems.
Research limitations/implications: This paper focuses on classifying sentences that contain
information relevant to business environmental analysis in a large number of documents. However, the
proposed methodology is limited with respect to advanced analyses that can directly help managers establish
strategies, since it does not summarize the environmental variables that are implied in the classified
sentences. Advanced summarization and recommendation techniques could extract the environmental
variables from the sentences, and these variables could help managers establish effective strategies.
Originality/value: The feature selection technique developed in this paper has not been used in traditional
systems for business and industry, and it allows the whole process to be fully automated. The paper also
demonstrates the practicality of the technique, which can be applied to various business environmental
analysis frameworks. In addition, the system is more economical than traditional systems because of
semi-supervised learning, and it can resolve the lack-of-features problem from which traditional systems
suffer. This work is valuable for analyzing environmental factors and establishing strategies for companies.
Keywords: Text mining, SWOT analysis, Machine learning, PEST analysis, Text categorization,
Word embedding
Paper type: Research paper
1. Introduction
The ability of a company to analyze and respond to its internal and external surroundings in a
rapidly changing social environment is closely related to enterprise competitiveness
(Fleisher and Bensoussan, 2007; Dai et al., 2011). These surroundings are mainly analyzed
using textual data, and the amount of such data has been increasing rapidly (Yu et al., 2005; Miao
et al., 2009; Ur-Rahman and Harding, 2012). The information about the internal and external
environments acquired by analyzing textual data can be used to identify the latest
technology trends, develop new products and establish strategies for responding to
competitors, which ultimately contributes to the competitiveness of the
enterprise (Johnson et al., 2008). Generally, companies have obtained analytic information
through the following methods: purchasing the information created by a market analyst and
directly analyzing textual data by utilizing their own workforce. These methods analyze
information by traditional text mining techniques such as clustering and concept linking
(Bose, 2008).

Industrial Management & Data Systems, Vol. 119 No. 1, 2019, pp. 69-88
© Emerald Publishing Limited, 0263-5577
DOI 10.1108/IMDS-07-2017-0317
Received 20 July 2017
Revised 17 November 2017, 21 December 2017
Accepted 1 January 2018
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0263-5577.htm

However, with the arrival of the big data environment, these methods have limitations. First, they are
inefficient in terms of cost. Information obtained from market analysts is so expensive that
most enterprises, excluding conglomerates, cannot realistically use it. Even when
analyzing the data directly, it is difficult for most companies to afford experts for market
analysis. Second, these methods are not effective for processing large amounts of data. The
business environments surrounding enterprises are rapidly changing and the amount of
information is increasing. Traditional methodologies used by market analysts and
techniques for competitive intelligence require manual work by experts (Bose, 2008).
Therefore, in a big data environment, it is difficult to analyze data quickly and
comprehensively using these methods.
While there have been previous studies on environmental analysis methodology for big data
(Dai et al., 2011; Dai et al., 2013), they only proposed system designs and did not
actually implement them. Therefore, they did not consider the specific methods and problems
encountered in actual system development, nor did they evaluate the system objectively.
Systems that automatically classify simple problems at the sentence level, rather than by
business environmental analysis frameworks, have been studied (Samejima et al., 2006;
Ur-Rahman and Harding, 2012; Arif-Uz-Zaman et al., 2016). However, these systems are not
fully automated and do not address real-world problems such as the lack of training data or
the lack of features used for classification. To solve these problems, a successful business
environmental analysis methodology should meet the following conditions: it must be able
to retrieve as much meaningful data as possible in a big data environment, and all the
processes, such as retrieving meaningful information from massive textual data and
classifying the data into categories, must be performed quickly and be fully automated.
In this paper, we propose a text mining methodology that identifies and automatically
classifies information about the business environment (contained in a large amount of
unstructured textual data) according to the categories of an environmental analysis framework.
We completely automate the process of selecting lexical information in the text as the basis for
the classification, so that the classification can be performed more quickly and economically. In
addition, we solve two technical problems of traditional classification systems that use lexical
information. First, we use a semi-supervised learning technique so that we can effectively
train the classifier using less training data. Second, we consider the semantic similarity
between words by utilizing word embedding, a state-of-the-art text mining technique. We
then implement our proposed system and evaluate its performance to demonstrate the
practicality of the system. Additionally, we apply our proposed system to PEST and SWOT
analyses, which are widely used business environmental analysis frameworks, and we demonstrate
the scalability of the system through various experiments. As shown in Figure 1, when the
sentences in the documents to be classified are entered into the system, the system classifies
and outputs the sentences according to the category of the environmental analysis framework.
Such classified sentences can be useful for establishing corporate strategies and making
decisions in the future.
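
The two ideas named above, semi-supervised training and embedding-based semantic similarity, can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the self-training loop and the nearest-centroid classifier are stand-ins chosen for brevity, and the four-dimensional word vectors, the example sentences, the `threshold` parameter and all function names are hypothetical. A real system would use embeddings trained on a large corpus.

```python
import math

# Toy 4-dimensional "embeddings" invented for this sketch (assumption, not
# data from the paper). A real system would load corpus-trained vectors.
TOY_VECTORS = {
    "tax":        (0.9, 0.3, 0.0, 0.0),
    "regulation": (1.0, 0.1, 0.0, 0.0),
    "law":        (0.9, 0.0, 0.1, 0.0),
    "inflation":  (0.2, 0.9, 0.0, 0.0),
    "prices":     (0.1, 1.0, 0.1, 0.0),
    "wages":      (0.1, 0.8, 0.3, 0.0),
    "aging":      (0.0, 0.1, 1.0, 0.0),
    "lifestyle":  (0.0, 0.0, 0.9, 0.2),
    "software":   (0.0, 0.0, 0.1, 1.0),
    "patent":     (0.3, 0.0, 0.0, 0.9),
}

def embed(sentence):
    """Average the vectors of known words; return None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroids(labeled):
    """Mean sentence vector for each category in the labeled data."""
    groups = {}
    for sentence, label in labeled:
        vec = embed(sentence)
        if vec is not None:
            groups.setdefault(label, []).append(vec)
    return {lab: tuple(sum(col) / len(vs) for col in zip(*vs))
            for lab, vs in groups.items()}

def predict(sentence, cents):
    """Return (label, similarity) of the nearest centroid, or (None, 0.0)."""
    vec = embed(sentence)
    if vec is None or not cents:
        return None, 0.0
    return max(((lab, cosine(vec, c)) for lab, c in cents.items()),
               key=lambda pair: pair[1])

def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: repeatedly pseudo-label confident unlabeled sentences."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for sentence in pool:
            label, sim = predict(sentence, cents)
            if label is not None and sim >= threshold:
                confident.append((sentence, label))
            else:
                rest.append(sentence)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = rest
    return centroids(labeled), labeled

# Hypothetical seed data: one labeled sentence per PEST category.
seed = [
    ("new tax law", "Political"),
    ("inflation raises prices", "Economic"),
    ("aging population", "Social"),
    ("software update", "Technological"),
]
unlabeled = [
    "stricter regulation expected",
    "wages are rising",
    "patent filing",
    "weather today",
]
cents, expanded = self_train(seed, unlabeled)
for sentence, label in expanded:
    print(f"{label:14s} {sentence}")
```

In this sketch, a sentence such as "stricter regulation expected" shares no word with the seed sentences, yet it is assigned to a category because its vector lies close to a category centroid. This is the kind of gap a purely lexical feature set leaves open. Sentences with no known words stay unlabeled rather than being forced into a category.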
The remainder of this paper is organized as follows. In Section 2, we introduce
business environmental analysis and describe the limitations of traditional classification
systems used for business and industry. Section 3 provides background knowledge
about text mining techniques such as semi-supervised learning and word embeddings.
Section 4 describes a fully automated business environmental analysis system that utilizes data
mining during the feature selection process, a step for which previous studies have required
experts. We indicate problems such as data sparseness and the high cost of constructing
training data that traditional systems suffer when applying their systems to real
