Predicting users’ demographic characteristics in a Chinese social media network

Date: 07 August 2017
DOI: https://doi.org/10.1108/EL-09-2016-0203
Published date: 07 August 2017
Pages: 758-769
Author: Qiangbing Wang, Shutian Ma, Chengzhi Zhang
Subject Matter: Information & knowledge management, Information & communications technology, Internet
Qiangbing Wang and Shutian Ma
Department of Information Management,
Nanjing University of Science and Technology, Nanjing, China, and
Chengzhi Zhang
Department of Information Management, Nanjing University of Science and
Technology, Nanjing, China; Jiangsu Key Laboratory of Data Engineering and
Knowledge Service, Nanjing University, Nanjing, China; and
Fujian Provincial Key Laboratory of Information Processing and
Intelligent Control, Minjiang University, Fuzhou, China
Abstract
Purpose: Based on user-generated content from a Chinese social media platform, this paper aims to
investigate multiple methods of constructing user profiles and their effectiveness in predicting users’ gender,
age and geographic location.
Design/methodology/approach: This investigation collected 331,634 posts from 4,440 users of Sina
Weibo. The data were divided into two parts, one for training and one for testing. First, a vector space model and topic
models were applied to construct user profiles. A classification model was then learned by a support vector
machine from the training data set. Finally, the classification model was used to predict users’ gender,
age and geographic location in the testing data set.
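The profile-construction step described above can be illustrated with a minimal sketch. It assumes that each user's posts are concatenated into one document and weighted with a plain TF-IDF scheme. The abstract does not state the weighting scheme, the tokenizer or the train/test ratio, so the function names `build_profiles` and `split_users`, the whitespace tokenization, the 20 per cent test fraction and the toy posts are all hypothetical. The support vector machine step is left out.

```python
import math
import random
from collections import Counter


def build_profiles(posts_by_user):
    """Return {user: {term: tf-idf weight}}, one vector per user."""
    # Assumption: whitespace tokenization. Chinese posts would first need
    # a word segmenter, which is outside this sketch.
    tf = {
        user: Counter(" ".join(posts).split())
        for user, posts in posts_by_user.items()
    }
    n_users = len(tf)
    # Document frequency: number of users whose posts contain the term.
    df = Counter(term for counts in tf.values() for term in counts)
    profiles = {}
    for user, counts in tf.items():
        total = sum(counts.values())
        profiles[user] = {
            term: (count / total) * math.log(n_users / df[term])
            for term, count in counts.items()
        }
    return profiles


def split_users(users, test_fraction=0.2, seed=0):
    """Shuffle users reproducibly and split them into (train, test)."""
    shuffled = sorted(users)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data, invented purely for illustration.
posts_by_user = {
    "u1": ["football match tonight", "great football goal"],
    "u2": ["new lipstick shade", "shopping tonight"],
    "u3": ["football shopping"],
}
profiles = build_profiles(posts_by_user)
train_users, test_users = split_users(profiles)
```

In a fuller pipeline, the sparse profile vectors of the training users would be passed, together with their known labels, to a support vector machine, and the learned model would then be applied to the test users.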
Findings: The results revealed that in constructing user profiles, latent semantic analysis performed better
on the task of predicting gender and age. By contrast, the method based on a traditional vector space model
worked better in making predictions regarding the geographic location. In the process of applying a topic
model to construct user profiles, the authors found that different prediction tasks should use different numbers
of topics.
Originality/value: This study explores different user profile construction methods to predict Chinese
social media network users’ gender, age and geographic location. The results of this paper will help to improve
the quality of personal information gathered from social media platforms, and thereby improve personalized
recommendation systems and personalized marketing.
Keywords: Machine learning, Social media, Text classification, Users’ profile
Paper type: Research paper
Introduction
The popularization of social media around the world, exemplified by Facebook and
Twitter in the USA and Sina Weibo in China, has resulted in a massive accumulation of
user-generated data. These data contain users’ personal information, original posts,
retweets, comments and other information. User-generated data contain several
This work was supported in part by Major Projects of National Social Science Fund (13&ZD174), the
Fundamental Research Funds for the Central Universities (No.30915011323) and the Opening
Foundation of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control
(Minjiang University).
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
Received 30 September 2016
Revised 17 April 2017
Accepted 23 April 2017
The Electronic Library
Vol. 35 No. 4, 2017
pp. 758-769
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-09-2016-0203
