An intelligent approach to Big Data analytics for sustainable retail environment using Apriori-MapReduce framework

Pages: 1503-1520
Date: 14 August 2017
Published date: 14 August 2017
DOI: https://doi.org/10.1108/IMDS-09-2016-0367
Author: Neha Verma, Jatinder Singh
Subject Matter: Information & knowledge management, Information systems, Data management systems, Knowledge management, Knowledge sharing, Management science & operations, Supply chain management, Supply chain information systems, Logistics, Quality management/systems
Neha Verma and Jatinder Singh
Department of Computer Science & Engineering,
I K Gujral Punjab Technical University, Kapurthala, India
Abstract
Purpose: The purpose of this paper is to explore various limitations of conventional mining systems in
extracting useful buying patterns from retail transactional databases flooded with Big Data. The key
objective is to help retail business owners better understand the purchase needs of their customers and
hence attract customers to physical retail stores and away from competitor e-commerce websites.
Design/methodology/approach: This paper employs a systematic, category-based review of relevant
literature to explore the challenges posed by Big Data for the retail industry, followed by a discussion and
implementation of the association between MapReduce-based Apriori association mining and a Hadoop-based
intelligent cloud architecture.
Findings: The findings reveal that conventional mining algorithms have not evolved to support Big Data
analysis as required by modern retail businesses. They require a lot of resources such as memory and
computational engines. This study aims to develop the MR-Apriori algorithm in the form of an IRM tool to
address all these issues in an efficient manner.
Research limitations/implications: The paper suggests that a lot of research is yet to be done in market
basket analysis if the full potential of a cloud-based Big Data framework is to be utilized.
Originality/value: This research arms retail business owners with an innovative IRM tool to easily extract
comprehensive knowledge of customers' useful buying patterns in order to increase profits. This study
experimentally verifies the effectiveness of the proposed algorithm.
Keywords: Big data, IRM tool, MapReduce Apriori algorithm, Market basket analysis, Retail analytics
Paper type: Research paper
Industrial Management & Data Systems
Vol. 117 No. 7, 2017
pp. 1503-1520
© Emerald Publishing Limited
0263-5577
DOI 10.1108/IMDS-09-2016-0367
Received 11 September 2016
Revised 4 December 2016
29 March 2017
Accepted 5 April 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0263-5577.htm
1. Introduction
The retail industry has entered the era of Big Data. Retail transactional databases are
generating data characterized by the five Vs, i.e. variety, volume, veracity, velocity, and
value. Processing and managing these databases is beyond the capabilities of traditional
mining approaches. Moreover, the e-commerce industry has grown exponentially. Most
e-commerce players, such as Amazon and Flipkart, employ various strategies to attract
customers away from retail outlets in shopping malls or markets to their websites through
lucrative offers such as cash back, cash on delivery, easy exchange, etc. (Malhotra and Rishi,
2016a, b). In such an intensely competitive scenario, retailers need to carefully identify the
causes of the problems their customers face and understand the value of solving them in
order to survive in business. For instance, a sports merchandise store may find that three out
of five customers looking for a polo T-shirt cannot find their size on the shelf, which reduces
sales and hence profit; the retailer's goal could be to reduce such out-of-stock instances and
thus increase sales and profit. Retailers also need to understand the importance of regularly
analyzing trends on social media. For example, suppose a popular cricketer posts a
photograph of himself wearing a polo T-shirt of a specific brand on a social media platform.
Such a post is highly likely to go viral, followed by a run on that brand's polo T-shirts, and
ultimately to lead to empty shelves in retail stores, which may take a long time to restock
because of insufficient inventory in the supply chain. This research work focuses on these
frequently occurring issues faced by retail business owners. These issues bring not only
challenges but also opportunities for retailers who want a competitive advantage, as the retail
industry now has access to more information that can be used to create better shopping
experiences and to help retailers maximize profit. Every transaction in a retail store is now
stored for analysis of customers' purchase patterns and hence plays an important role in
devising strategies for the placement and promotion of products, to better satisfy customers
and increase retailers' revenues (Verma et al., 2015). One popular strategy has been to use the
Apriori association mining algorithm to find frequent item sets in retail databases (Verma and
Singh, 2015). However, traditional Apriori suffers from various limitations: it is resource
intensive because it requires multiple scans of the database, and it is not capable of extracting
unique buying patterns from Big databases (Malhotra and Rishi, 2016a, b). The proposed
research work presents a scalable, parallel, next-generation Apriori algorithm, i.e. the
MR-Apriori algorithm, based on an intelligent Hadoop Distributed File System (HDFS) and
MapReduce architecture. The overall objective of this research work is to design a system that
can help retail business owners improve sales volume, reduce out-of-stock problems, increase
profits, reduce spoilage and improve customers' confidence in purchasing from their stores.
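As a rough illustration of the limitation and the remedy described above, the following minimal Python sketch runs level-wise Apriori with the support counting phrased as a map step and a reduce step. It is not the authors' MR-Apriori implementation and does not use Hadoop or HDFS. The function names (map_count, reduce_count, next_candidates, apriori), the in-memory list of partitions standing in for distributed file blocks, the absolute min_support threshold, and the example products other than the polo T-shirt are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_count(partition, candidates):
    """Map step: emit (candidate, 1) for each candidate contained in a transaction."""
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                yield candidate, 1


def reduce_count(pairs, min_support):
    """Reduce step: sum the counts per candidate and keep the frequent ones."""
    totals = defaultdict(int)
    for candidate, count in pairs:
        totals[candidate] += count
    return {c: n for c, n in totals.items() if n >= min_support}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def apriori(partitions, min_support):
    """Level-wise Apriori: each level makes one full pass over every partition."""
    # Initial pass to collect the distinct items (the 1-itemset candidates).
    items = {item for part in partitions for t in part for item in t}
    candidates = [frozenset([item]) for item in sorted(items)]
    frequent, k = {}, 1
    while candidates:
        # In a real cluster each partition would be mapped on a separate node;
        # here the partitions are simply iterated one after another.
        pairs = (pair for part in partitions for pair in map_count(part, candidates))
        level = reduce_count(pairs, min_support)
        frequent.update(level)
        k += 1
        candidates = next_candidates(level, k)
    return frequent


# Hypothetical transactions, split into two partitions (illustrative data only).
partitions = [
    [frozenset({"polo", "cap"}), frozenset({"polo", "cap", "socks"})],
    [frozenset({"polo", "socks"}), frozenset({"cap"})],
]
result = apriori(partitions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each iteration of the while loop rescans every partition, which is the repeated database scanning the text identifies as the cost of traditional Apriori. Because the map step only needs one partition at a time, those scans are the part that a MapReduce framework can spread across machines.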
The proposed system provides various benefits to retailers, such as:
• improving sales through a better understanding of customer buying patterns;
• easier inventory management through timely and accurate status of the product pipeline;
• identification of new sales opportunities through better analysis of each customer's
personalized profile and purchase/browsing history;
• minimal investment required to implement the proposed cloud framework
for distributed analysis of the Big Data produced by customer transactions; and
• satisfying the important requisites expected of modern Big Data
processing systems, such as scalability, fault tolerance, partial failure support, etc.
The remainder of the paper is organized as follows. Section 2 reviews the various types
of conventional mining systems, category by category. Section 3 discusses the
detailed system design, the MapReduce algorithms for parallel Apriori and association
rule extraction, and the system architecture, followed by the interface of the IRM tool that assists retailers.
Section 4 discusses the experimental observations with graphical analysis. Section 5 concludes
the paper with a detailed discussion and future work, followed by important references.
2. Literature review
To the best of our knowledge, based on the literature, this research work is the first
formal attempt to design and develop an intelligent system for efficiently mining Big Data
stored in retail transactional databases in order to carry out market basket analysis.
For clarity, conventional systems in the literature can be sub-divided into three
categories, i.e. systems based on sequential mining, systems
based on distributed mining and systems based on Hadoop and MapReduce
frameworks. These systems, with their relative merits and demerits, are discussed
category by category as follows.
2.1 Category 1: review of systems based on sequential mining
In this category, different sequential algorithms for association rule mining (ARM) are
discussed. Yang (2012) recommended an algorithm to improve the traditional Apriori algorithm,
which suffers from amplified complexity and reduced efficiency. The suggested