On the model design of integrated intelligent big data analytics systems

DOI: https://doi.org/10.1108/IMDS-03-2015-0086
Date: 19 October 2015
Pages: 1666-1682
Published date: 19 October 2015
Author: Kun Chen, Xin Li, Huaiqing Wang
Subject Matter: Information & knowledge management, Information systems, Data management systems
Kun Chen
Department of Financial Mathematics and Engineering,
South University of Science and Technology, Shenzhen, China
Xin Li
Department of Information Systems, City University of Hong Kong,
Kowloon, Hong Kong, China, and
Huaiqing Wang
Department of Financial Mathematics and Engineering,
South University of Science and Technology, Shenzhen, China
Abstract
Purpose: Although big data analytics has reaped great business rewards, big data system design and
integration still face challenges resulting from the demanding environment, including challenges involving
variety, uncertainty, and complexity. These characteristics of big data systems demand flexible and agile
integration architectures. Furthermore, a formal model is needed to support design and verification.
The purpose of this paper is to resolve these two problems with a collective intelligence (CI) model.
Design/methodology/approach: In the conceptual CI framework proposed by Schut (2010), a CI
design should comprise a general model, which has a formal form for verification and validation,
and a specific model, which is an implementable system architecture. After analyzing the
requirements of system integration in big data environments, the authors apply the CI framework to
resolve the integration problem. In the model instantiation, the authors use the multi-agent paradigm as
the specific model and the hierarchical colored Petri net (PN) as the general model.
Findings: First, the multi-agent paradigm is a good implementation for reusing and integrating big data
analytics modules in an agile and loosely coupled manner. Second, the PN models provide effective
simulation results in the system design period, which give advice on business process design and
workload balance control. Third, the CI framework provides a method for system integration that can be
built and deployed incrementally. It is especially suitable for dynamic data analytics environments.
These findings have both theoretical and managerial implications.
Originality/value: In this paper, the authors propose a CI framework, which includes both practical
architectures and theoretical foundations, to solve the system integration problem in big data
environments. It provides a new point of view on dynamically integrating large-scale modules in an
organization. This paper also offers practical suggestions for Chief Technical Officers who want to
employ big data technologies in their companies.
Keywords: Collective intelligence, System integration, Big data analytics, Model design
Paper type: Research paper
Industrial Management & Data Systems
Vol. 115 No. 9, 2015
pp. 1666-1682
© Emerald Group Publishing Limited
0263-5577
DOI 10.1108/IMDS-03-2015-0086
Received 24 March 2015
Revised 6 August 2015, 7 September 2015
Accepted 13 September 2015
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0263-5577.htm
This paper was supported by GRF grant (CityU 149412) from the Hong Kong Government,
SUSTC Fundamental Research Grant (FRG-SUSTC1501A-20), and the Shenzhen Research Grant
(No. JCYJ2010417105742712).
1. Introduction
Imagine that an e-commerce company wants to develop an integrated big data analysis
system to support its business. Every day, 200 million people visit its web site and
place one million orders from a range of several million goods. In addition, sales
may increase sharply on holidays. The system must address various types of data,
such as customer orders, comments, and phone calls. It is also important to monitor
traffic and weather conditions in real time to ensure that deliveries are completed as
soon as possible. Moreover, the system should be able to immediately address various
potential emergency situations once relevant information is obtained. To this end, how
does one design a system in which all participants can work cooperatively? Actually,
this is a typical scenario that includes several so-called big data analysis tasks. However,
we must go beyond simply applying Hadoop, MapReduce, and other big data
techniques to design an integrated system model.
The design of a big data system differs considerably from that of a traditional
database-supported decision support system (DSS) (Madden, 2012). Such a system
involves more entities, data, and participants; therefore, the system has special
requirements in terms of data management, model design, and quality of service (QoS),
especially in a multiple subsystem integration process (as shown in Table I).
Requirement 1: an integrated big data system should work closely with
upstream and downstream value-stream partners to achieve common goals through new ways of
organizing data to facilitate more effective decisions. Big data analytics address
large volumes and distributed aggregations of various types of data (O'Leary, 2013).
The data may be from audio, video, social networks, or web forums. Big data no longer
relies on databases or data warehouses. NoSQL methods and in-memory stream
processing are incorporated into the system. Therefore, integrating different data
management mechanisms is a considerable challenge.
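To make the integration challenge in Requirement 1 concrete, the following is a minimal sketch, not the authors' design, of one way to place a SQL table, a schemaless document collection, and an in-memory event buffer behind a single interface. All class and function names (DataSource, SqlSource, DocumentSource, StreamSource, collect) are hypothetical, and a plain Python list stands in for a real NoSQL store.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import deque
from typing import Dict, List


class DataSource(ABC):
    """Hypothetical common interface over heterogeneous storage."""

    @abstractmethod
    def fetch(self, limit: int) -> List[dict]:
        """Return up to `limit` records as plain dictionaries."""


class SqlSource(DataSource):
    """Wraps a relational table (SQLite is used here only for illustration)."""

    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn = conn
        self.table = table  # assumed to be a trusted, fixed table name

    def fetch(self, limit: int) -> List[dict]:
        cur = self.conn.execute(f"SELECT * FROM {self.table} LIMIT ?", (limit,))
        cols = [c[0] for c in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]


class DocumentSource(DataSource):
    """Stand-in for a NoSQL store: a list of schemaless documents."""

    def __init__(self, docs: List[dict]):
        self.docs = list(docs)

    def fetch(self, limit: int) -> List[dict]:
        return [dict(d) for d in self.docs[:limit]]


class StreamSource(DataSource):
    """Bounded in-memory buffer holding the most recent events."""

    def __init__(self, maxlen: int):
        self.buffer = deque(maxlen=maxlen)

    def push(self, event: dict) -> None:
        self.buffer.append(event)

    def fetch(self, limit: int) -> List[dict]:
        items = list(self.buffer)
        return items[max(0, len(items) - limit):]


def collect(sources: Dict[str, DataSource], limit: int = 10) -> List[dict]:
    """Gather records from every source and tag each with its origin."""
    records = []
    for name, src in sources.items():
        for rec in src.fetch(limit):
            records.append({**rec, "source": name})
    return records


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 19.9)")
comments = DocumentSource([{"user": "u1", "text": "fast delivery"}])
traffic = StreamSource(maxlen=100)
traffic.push({"event": "traffic_jam", "road": "A1"})

records = collect({
    "orders": SqlSource(conn, "orders"),
    "comments": comments,
    "traffic": traffic,
})
for rec in records:
    print(rec)
```

The design choice in this sketch is that downstream analytics code depends only on the shared fetch interface, so a storage mechanism can be added or swapped without changing the consumers.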
Requirement 2: an integrated big data system should adapt and modify
key business processes and deliver applications more quickly. Big data analytics
models are not typically predefined due to the presence of dynamic environments. Such
models typically require iterative solutions for testing and improvement. Moreover,
business processes in big data analytics systems should be flexible (Talia, 2013).
Participants in such models include software systems, mobile devices, web services,
Table I. Database supported vs big data supported systems

Category             Attribute         Database system                                      Big data system
Data management      Source            Internal/defined                                     External/various
                     Format            Structured                                           Structured/unstructured
                     Update            Update periodically                                  Change every second
                     Size              Megabytes or gigabytes                               Petabytes or exabytes
                     Storage           SQL-like DB                                          SQL-like DB, NoSQL system, memory
Model design         Analytic model    Analysis models designed against stable environment  Need to iteratively test/improve models
                     Participants      Software systems within organization                 Various software systems, devices, web services, etc.
                     Business process  Predefined business logic                            Dynamic business process built based on distributed functional modules
Quality of services  Time              Normal response time                                 Involving time-stamped events
                     Reliability       Complete and reliable                                Incomplete and fuzzy
                     Security          High                                                 Very high
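
Table I contrasts predefined business logic with a dynamic business process built from distributed functional modules. As a minimal sketch of that idea, under the assumption that each module can be modeled as a function from a payload dictionary to a payload dictionary, a process can be expressed as a list of module names that is resolved at run time. The registry class and the three example modules below are hypothetical illustrations and are not the multi-agent architecture proposed in the paper.

```python
from typing import Callable, Dict, List

# A module is modeled here as a function that takes and returns a payload dict.
Module = Callable[[dict], dict]


class ModuleRegistry:
    """Hypothetical registry that resolves process steps by name at run time."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def register(self, name: str, module: Module) -> None:
        self._modules[name] = module

    def unregister(self, name: str) -> None:
        self._modules.pop(name, None)

    def run(self, process: List[str], payload: dict) -> dict:
        """Execute the named steps in order, passing the payload along."""
        for step in process:
            if step not in self._modules:
                raise KeyError(f"no module registered for step {step!r}")
            payload = self._modules[step](payload)
        return payload


# Illustrative modules for the e-commerce scenario (all values are made up).
def parse_order(p: dict) -> dict:
    return {**p, "items": len(p["order"].split(","))}


def check_weather(p: dict) -> dict:
    return {**p, "delay_hours": 2 if p.get("weather") == "storm" else 0}


def schedule_delivery(p: dict) -> dict:
    return {**p, "eta_hours": 24 + p.get("delay_hours", 0)}


registry = ModuleRegistry()
registry.register("parse_order", parse_order)
registry.register("check_weather", check_weather)
registry.register("schedule_delivery", schedule_delivery)

# The same modules are recombined into different processes without code changes.
normal = registry.run(["parse_order", "schedule_delivery"], {"order": "book,pen"})
storm = registry.run(
    ["parse_order", "check_weather", "schedule_delivery"],
    {"order": "book,pen", "weather": "storm"},
)
print(normal["eta_hours"], storm["eta_hours"])
```

Because a process is only a list of names, it can be changed or extended while the system is running, which is the kind of flexibility Requirement 2 asks for.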