Predicting users’ demographic characteristics in a Chinese social media network
Date | 07 August 2017 |
DOI | https://doi.org/10.1108/EL-09-2016-0203 |
Published date | 07 August 2017 |
Pages | 758-769 |
Author | Qiangbing Wang, Shutian Ma, Chengzhi Zhang |
Subject Matter | Information & knowledge management, Information & communications technology, Internet |
Qiangbing Wang and Shutian Ma
Department of Information Management,
Nanjing University of Science and Technology, Nanjing, China, and
Chengzhi Zhang
Department of Information Management, Nanjing University of Science and
Technology, Nanjing, China; Jiangsu Key Laboratory of Data Engineering and
Knowledge Service, Nanjing University, Nanjing, China; and
Fujian Provincial Key Laboratory of Information Processing and
Intelligent Control, Minjiang University, Fuzhou, China
Abstract
Purpose – Based on user-generated content from a Chinese social media platform, this paper aims to
investigate multiple methods of constructing user profiles and their effectiveness in predicting users’ gender,
age and geographic location.
Design/methodology/approach – This investigation collected 331,634 posts from 4,440 users of Sina
Weibo. The data were divided into two parts, for training and testing. First, a vector space model and topic
models were applied to construct user profiles. A support vector machine classification model was then
trained on the training data set. Finally, the classification model was used to predict users’ gender,
age and geographic location in the testing data set.
Findings – The results revealed that in constructing user profiles, latent semantic analysis performed better
on the task of predicting gender and age. By contrast, the method based on a traditional vector space model
worked better in making predictions regarding the geographic location. In the process of applying a topic
model to construct user profiles, the authors found that different prediction tasks should use different numbers
of topics.
Originality/value – This study explores different user profile construction methods to predict Chinese
social media network users’ gender, age and geographic location. The results of this paper will help to improve
the quality of personal information gathered from social media platforms, and thereby improve personalized
recommendation systems and personalized marketing.
Keywords Machine learning, Social media, Text classification, Users’ profile
Paper type Research paper
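The pipeline summarized in the abstract (build a vector space profile per user, train a support vector machine on the training users, predict an attribute for unseen users) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the toy tokens, the +1/-1 labels, the smoothed TF-IDF weighting, and the choice of a bias-free linear SVM trained by plain subgradient descent on the hinge loss. Real Weibo posts would first need Chinese word segmentation, which is assumed to have been done already, and the topic model variants (such as latent semantic analysis) are not shown.

```python
import math
from collections import Counter


def fit_idf(tokenized_users):
    """Smoothed inverse document frequency, treating each user as one document."""
    n = len(tokenized_users)
    df = Counter(term for tokens in tokenized_users for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def profile(tokens, idf):
    """L2-normalised TF-IDF vector (sparse dict) built from one user's tokens."""
    counts = Counter(t for t in tokens if t in idf)  # terms unseen in training are dropped
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    """Sparse dot product between a weight dict and a profile dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, lam=0.01):
    """Minimise the L2-regularised hinge loss by subgradient descent (labels are +1/-1)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            violated = y * dot(w, x) < 1.0  # margin is checked before the update
            for t in w:
                w[t] *= 1.0 - lr * lam  # shrinkage from the regulariser
            if violated:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if dot(w, x) >= 0.0 else -1


# Hypothetical toy data: each user is (list of tokenized posts, binary label).
train = [
    ([["hutong", "duck"], ["duck", "palace"]], 1),
    ([["palace", "hutong"]], 1),
    ([["bund", "metro"], ["metro", "tower"]], -1),
    ([["tower", "bund"]], -1),
]

# A user's profile is built from all of that user's posts taken together.
tokens_per_user = [[tok for post in posts for tok in post] for posts, _ in train]
labels = [label for _, label in train]

idf = fit_idf(tokens_per_user)
vectors = [profile(tokens, idf) for tokens in tokens_per_user]
w = train_linear_svm(vectors, labels)

# Predict the attribute for a held-out user.
new_user = profile(["duck", "hutong", "unseen"], idf)
print(predict(w, new_user))
```

In this sketch the inverse document frequencies are fitted on the training users only and reused for held-out users, mirroring the training/testing split described in the abstract. A multi-class attribute such as age group or location would need a one-vs-rest extension that is not shown here.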
Introduction
The popularization of social media around the world, exemplified by Facebook and
Twitter in the USA and Sina Weibo in China, has resulted in a massive accumulation of
user-generated data. These data contain users’ personal information, original posts,
retweets, comments and other information. User-generated data contain several
This work was supported in part by Major Projects of National Social Science Fund (13&ZD174), the
Fundamental Research Funds for the Central Universities (No.30915011323) and the Opening
Foundation of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control
(Minjiang University).
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
Received 30 September 2016
Revised 17 April 2017
Accepted 23 April 2017
The Electronic Library
Vol. 35 No. 4, 2017
pp. 758-769
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-09-2016-0203