Network based model of social media big data predicts contagious disease diffusion

DOI: https://doi.org/10.1108/IDD-05-2017-0046
Pages: 110-120
Published date: 21 August 2017
Authors: Lauren S. Elkin, Kamil Topal, Gurkan Bebek
Subject Matter: Library & information science, Library & information services, Lending, Document delivery, Collection building & management, Stock revision, Consortia
Lauren S. Elkin
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
Kamil Topal
Center for Proteomics and Bioinformatics, Department of Nutrition, Department of Electrical Engineering and Computer Science,
Case Western Reserve University, Cleveland, OH, USA, and
Gurkan Bebek
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
Abstract
Purpose – Predicting future outbreaks and understanding how they spread from location to location can improve the care provided to patients. Recently,
mining social media big data has made it possible to track patterns and trends across the world. This study aims to analyze social media micro-blogs and
geographical locations to understand how disease outbreaks spread over geographies and to enhance forecasting of future disease outbreaks.
Design/methodology/approach – In this paper, the authors use Twitter data as the social media data source, influenza-like illnesses (ILI) as
the disease epidemic and states in the USA as the geographical locations. They present a novel network-based model that uses social media big data
to predict the spread of diseases a week in advance.
Findings – The authors showed that flu-related tweets align well with ILI data from the Centers for Disease Control and Prevention (CDC) (p < 0.049).
The authors compared this model to earlier approaches that utilized airline traffic, and showed that ILI activity estimates of their model were
more accurate. They also found that their disease diffusion model yielded accurate predictions for upcoming ILI activity (p < 0.04), and they predicted
the diffusion of flu across states based on geographical surroundings at 76 per cent accuracy. The equations and procedures can be translated to
apply to any social media data, other contagious diseases and geographies to mine large data sets.
Originality/value – First, while extensive work has been presented utilizing time-series analysis on single geographies, or post-analysis of highly
contagious diseases, no previous work has provided a generalized solution to identify how contagious diseases diffuse across geographies, such as states
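in the USA. Secondly, due to the nature of social media data, various statistical models have been extensively used to address these problems, whereas this study presents a new network-based approach to model disease activity across geographical locations.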
Keywords Prediction, Big data analysis, Influenza dissemination, Information networks, Network model, Social media data
Paper type Research paper
1. Introduction and background
Approximately 5-20 per cent of US residents are affected by the
flu (influenza) annually (Services, 2015, 2014), and 200,000 of
these are hospitalized for illnesses related to the flu (Centers for
Disease Control and Prevention, 2011). This number has been
increasing (Thompson et al., 2004). If a
hospital is ill-prepared for a rush of patients with influenza-like
illnesses (ILI), the care those patients receive could suffer. It
is crucial for hospitals to obtain estimates of the
number of patients they will likely receive so that they can maintain an
adequate supply level to care for each patient. We utilize data
collected from social media to address a novel problem:
how diseases spread geographically.
Our contributions are twofold. First, while extensive work
has been presented utilizing time-series analysis on single
geographies (e.g. large cities, regions or countries) (Sadilek
et al., 2012; Hirose and Wang, 2012; Achrekar et al., 2011;
Broniatowski et al., 2013; Dredze et al., 2013; Salathe and
Khandelwal, 2011; Lampos and Cristianini, 2010; Lampos
et al., 2010; Lamb et al., 2013; Paul and Dredze, 2011) or
post-analysis of highly contagious diseases (Brockmann and
Helbing, 2013), no previous work has provided a generalized
solution to identify how contagious diseases diffuse across
geographies, such as states in the USA. Secondly, due to the
nature of social media data, various statistical models have
been extensively used to address these problems. In this study,
we present a new network-based approach to model disease
activity across geographical locations.
We use ILI as an example and make predictions about future
ILI cases spreading across the USA. Enhanced
forecasting of how outbreaks spread can improve
disease control and prevention efforts.
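To make the idea of a network-based view of geographical spread concrete, the following is a minimal sketch and not the model developed in this paper. It assumes a hypothetical adjacency list covering a small, illustrative subset of states, made-up activity values, and a made-up blending rule with a hypothetical parameter `self_weight`: each state's activity one week ahead is estimated from its own current activity and the mean activity of its neighbors.

```python
from typing import Dict, List

# Hypothetical adjacency list for a few US states (illustrative subset only;
# real state borders and the paper's network are not reproduced here).
NEIGHBORS: Dict[str, List[str]] = {
    "OH": ["PA", "IN", "MI"],
    "PA": ["OH"],
    "IN": ["OH", "MI"],
    "MI": ["OH", "IN"],
}


def predict_next_week(
    activity: Dict[str, float], self_weight: float = 0.5
) -> Dict[str, float]:
    """Estimate next week's activity per state.

    Assumed toy rule: a weighted blend of the state's own current activity
    and the mean current activity of its neighboring states.
    """
    if not 0.0 <= self_weight <= 1.0:
        raise ValueError("self_weight must be between 0 and 1")
    predicted: Dict[str, float] = {}
    for state, neighbors in NEIGHBORS.items():
        neighbor_mean = sum(activity[n] for n in neighbors) / len(neighbors)
        predicted[state] = (
            self_weight * activity[state] + (1.0 - self_weight) * neighbor_mean
        )
    return predicted


# Made-up current-week activity levels (arbitrary units).
current = {"OH": 4.0, "PA": 1.0, "IN": 2.0, "MI": 3.0}
forecast = predict_next_week(current)
for state in sorted(forecast):
    print(state, round(forecast[state], 2))
```

In this toy setting, a state with low activity next to a state with high activity (here PA next to OH) receives a higher estimate for the following week, which is the kind of neighbor-driven diffusion a network formulation can express.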
1.1 CDC influenza-like illness surveillance
The Centers for Disease Control and Prevention (CDC) maintains
one of the most trustworthy data sources for influenza
The current issue and full text archive of this journal is available on
Emerald Insight at: www.emeraldinsight.com/2398-6247.htm
Information Discovery and Delivery
45/3 (2017) 110–120
© Emerald Publishing Limited [ISSN 2398-6247]
[DOI 10.1108/IDD-05-2017-0046]
The authors would like to thank David Wise for thoughtful discussions and
bringing this problem to our attention. This research was partially
supported by a Grant from NIH/NCRR CTSA KL2TR000440 to GB.
Received 1 May 2017
Revised 6 July 2017
Accepted 7 July 2017