Topological and topical characterisation of Twitter user communities

Pages: 482-501
DOI: https://doi.org/10.1108/DTA-01-2018-0006
Date: 04 September 2018
Published date: 04 September 2018
Authors: Guillaume Gadek, Alexandre Pauchet, Nicolas Malandain, Laurent Vercouter, Khaled Khelif, Stéphan Brunessaux, Bruno Grilhères
Subject matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Guillaume Gadek
Normandie Université, LITIS, INSA de Rouen Normandie, Rouen, France and
Airbus, Élancourt, France
Alexandre Pauchet, Nicolas Malandain and Laurent Vercouter
Normandie Université, LITIS, INSA de Rouen Normandie, Rouen, France, and
Khaled Khelif, Stéphan Brunessaux and Bruno Grilhères
Airbus, Élancourt, France
Abstract
Purpose: Most of the existing literature on online social networks (OSNs) either focuses on community
detection in graphs without considering the topic of the messages exchanged, or concentrates exclusively on
the messages without taking into account the social links. The purpose of this paper is to characterise the
semantic cohesion of the detected user groups through the introduction of new measures.
Design/methodology/approach: A theoretical model for social links and salient topics on Twitter is
proposed. Measures to evaluate the topical cohesiveness of a group are also introduced. Inspired by
precision and recall, the proposed measures, called expertise and representativeness, assess how well a set
of groups matches the topic distribution. An adapted measure is also introduced for the case where a topic
similarity can be computed. Finally, a topic relevance measure is defined, similar to tf.idf (term frequency,
inverse document frequency).
Findings: The measures yield interesting results, notably on a large tweet corpus: the metrics accurately
describe the topics discussed in the tweets and make it possible to identify topic-focused groups. Combined
with topological measures, they provide a global and concise view of the detected groups.
Originality/value: Many algorithms applied to OSNs detect communities that often lack meaning and
internal semantic cohesion. This paper is among the first to quantify this aspect, and more precisely the
topical cohesion and topical relevance of a group. Moreover, the proposed indicators can be exploited for
social media monitoring, to investigate the impact of a group of people: for instance, they could be used for
journalism, marketing and security purposes.
Keywords: Social network, Twitter, Community detection, Topic detection, Social groups, Graph
Paper type: Research paper
1. Introduction
Online social networks (OSNs) now hold a prominent place in the media and in our daily lives. The
large volume, easy access and rapid propagation of online information make social networks a prime
example of social interaction among millions of individuals.
Data Technologies and Applications
Vol. 52 No. 4, 2018
pp. 482-501
© Emerald Publishing Limited
ISSN 2514-9288
Received 12 January 2018
Revised 19 April 2018
Accepted 21 June 2018
To characterise a network, OSN analysis often focuses on high-level, coarse-grained
characteristics of the network (e.g. degree distribution, summary statistics, histograms).
Such statistics have been reported for a variety of networks: Twitter (Kwak et al., 2010),
Google+ (Magno et al., 2012) and Facebook (Wilson et al., 2009), or more recently the Chinese
Sina Weibo (Gao et al., 2012) and the Russian Vkontakte (Kinash et al., 2015). Similarly, the
information propagation domain usually focuses either on information extraction or on the
propagation process (Cholvy, 2016; Lagnier et al., 2013), without looking closely at the
transmitted information itself. In data analytics, communities frequently depend only on
quantitative values (e.g. communities on Twitter are often detected using the follow relation
between users; sometimes interactions between users are not even considered). We are
convinced that existing communities on OSNs are topically dependent. In other words,
detecting accurate communities should take into account not only the relations between
users but also the topics of the messages they exchange.
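Coarse-grained statistics such as the degree distribution are simple to compute. As a minimal sketch, assuming the network is given as a plain list of undirected user pairs (a hypothetical input format), the distribution can be obtained with the Python standard library alone; the function name `degree_distribution` is illustrative and not taken from the paper.

```python
from collections import Counter
from collections.abc import Iterable


def degree_distribution(edges: Iterable[tuple[str, str]]) -> dict[int, int]:
    """Return a mapping degree -> number of nodes with that degree.

    Edges are treated as undirected; duplicate edges and self-loops are ignored.
    """
    unique_edges = set()
    for u, v in edges:
        if u != v:
            # Store each undirected edge once, independent of its direction.
            unique_edges.add(tuple(sorted((u, v))))
    degree = Counter()
    for u, v in unique_edges:
        degree[u] += 1
        degree[v] += 1
    # Histogram of degrees: how many nodes have degree k, sorted by k.
    return dict(sorted(Counter(degree.values()).items()))


edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")]
print(degree_distribution(edges))
```

Such a histogram says nothing about what users talk about, which is precisely the limitation the paragraph above points out.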
Among existing OSNs, Twitter has emerged as the reference micro-blogging platform, with
over 500 million public messages per day. On Twitter, users can follow each other (the new
messages of a followed user then appear in the follower's timeline), but they can also
retweet one another, that is, re-broadcast (cite) another user's message. Users can hold
public discussions by replying to other users' messages. A common way to attract someone's
attention is to mention them. All these actions can be collected automatically through
Twitter's public, rate-limited APIs (search and stream).
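As a sketch of how the retweet, reply and mention actions described above can be turned into graph edges, the function below counts directed interactions per (source, target, kind). It assumes tweets in the Twitter API v1.1 JSON layout (`user.screen_name`, `retweeted_status`, `in_reply_to_screen_name`, `entities.user_mentions`); the function name and the toy tweets are hypothetical. The follow relation is not covered because it is not part of a tweet object.

```python
from collections import Counter


def interaction_edges(tweets: list[dict]) -> Counter:
    """Count directed (source, target, kind) interactions in a list of tweets.

    Assumes the Twitter API v1.1 tweet layout; quote tweets are ignored here.
    """
    edges = Counter()
    for tweet in tweets:
        author = tweet["user"]["screen_name"]
        retweeted = tweet.get("retweeted_status")
        if retweeted is not None:
            # A retweet cites another user's message.
            edges[(author, retweeted["user"]["screen_name"], "retweet")] += 1
            # Skip mentions: a retweet's entities repeat the original author's handle.
            continue
        reply_to = tweet.get("in_reply_to_screen_name")
        if reply_to:
            edges[(author, reply_to, "reply")] += 1
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            target = mention["screen_name"]
            if target != reply_to:  # do not count the reply target a second time
                edges[(author, target, "mention")] += 1
    return edges


tweets = [
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "bob"}]}},
    {"user": {"screen_name": "bob"}, "in_reply_to_screen_name": "alice",
     "entities": {"user_mentions": [{"screen_name": "alice"}, {"screen_name": "carol"}]}},
    {"user": {"screen_name": "carol"}, "retweeted_status": {"user": {"screen_name": "alice"}},
     "entities": {"user_mentions": [{"screen_name": "alice"}]}},
]
edges = interaction_edges(tweets)
print(edges)
```

Skipping the mentions inside retweets and the reply target's own handle is one way to avoid counting a single action twice; other conventions are equally defensible.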
Many types of communities exist, depending on the application: topical, social, spatial
or even linguistic. In this paper, groups (or user communities) are defined such that strong
social ties exist between their members and the members share a common interest in a topic.
More precisely, we model a well-known OSN, Twitter, taking into account its technical
features (among which retweets[1] and replies) as well as real-life concerns (common interest
within a group, strong social links between members). We propose a metric of expertise,
denoted ξ, to evaluate the proportion of a group that is active on the same shared topic, and
a metric of representativeness, denoted ρ, to quantify whether a group represents the whole
population active on its topic. These measures aim to improve the analysis of the detected
communities by taking into account the content produced by their members. If a topic
similarity can be computed between two centres of interest, an adapted version of the
expertise, the ξ₂ measure, is also provided as a replacement for ξ. Inspired by tf.idf [2],
which is used in natural language processing (NLP) to find the most relevant words in a
document, we introduce a θf·igf value, standing for topic frequency, inverse group frequency,
to measure the relevance of a topic θ in a group g. These scores help a social network analyst
gain better insight into the detected communities.
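The following is a minimal illustrative reading of ξ, ρ and θf·igf, based only on the informal descriptions above and the precision/recall and tf.idf analogies; the exact formulas are given in Section 3 of the paper and may differ. It assumes, hypothetically, that each user has already been assigned a single dominant topic, that a group's topic is its most frequent member topic, and that topic frequency is counted over members rather than messages. All names and data are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: one dominant topic per user (an assumption; in practice
# topics would come from a topic-detection step on the users' tweets).
user_topic = {
    "u1": "football", "u2": "football", "u3": "football", "u4": "politics",
    "u5": "politics", "u6": "politics", "u7": "football", "u8": "music",
}
groups = {"g1": {"u1", "u2", "u3", "u4"}, "g2": {"u5", "u6", "u8"}, "g3": {"u7"}}


def topic_counts(members):
    return Counter(user_topic[u] for u in members)


def main_topic(members):
    counts = topic_counts(members)
    # Most frequent topic; ties broken alphabetically for determinism.
    return min(counts, key=lambda t: (-counts[t], t))


def expertise(members, topic):
    """Precision-like: share of the group's members active on `topic`."""
    return topic_counts(members)[topic] / len(members)


def representativeness(members, topic):
    """Recall-like: share of all users active on `topic` who belong to the group."""
    population = sum(1 for t in user_topic.values() if t == topic)
    return topic_counts(members)[topic] / population


def tf_igf(topic, group_id):
    """tf.idf analogue: topic frequency in the group times inverse group frequency."""
    members = groups[group_id]
    tf = topic_counts(members)[topic] / len(members)
    n_with_topic = sum(1 for m in groups.values() if topic_counts(m)[topic] > 0)
    return tf * math.log(len(groups) / n_with_topic) if n_with_topic else 0.0


for gid, members in groups.items():
    topic = main_topic(members)
    print(gid, topic, round(expertise(members, topic), 2),
          round(representativeness(members, topic), 2), round(tf_igf(topic, gid), 3))
```

Read this way, ξ and ρ trade off like precision and recall: the singleton group g3 is a perfect "expert" on football (ξ = 1) but represents only a quarter of the football users. The igf factor rewards distinctive topics: in g2, music scores higher than politics despite fewer members, because it appears in no other group.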
The proposed model is then applied to real data: a corpus of a few million tweets. The first
step analyses the text of the tweets to extract salient topics. The second step detects user
communities from the graphs of interactions between users, relying on a state-of-the-art
algorithm. Finally, the third step evaluates the semantic cohesion of the groups.
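The state-of-the-art detection algorithm used in the second step is not named in this section, so the sketch below is only a deliberately simple stand-in that shows the shape of that step: it keeps user pairs with at least `min_weight` interactions (a hypothetical "strong tie" threshold), then returns the connected components of the resulting undirected graph using union-find. The function name and parameters are illustrative assumptions.

```python
from collections import Counter, defaultdict


def strong_tie_groups(interactions, min_weight=2):
    """Group users linked by at least `min_weight` interactions (either direction).

    Stand-in for a real community detection algorithm: connected components of
    the thresholded, undirected interaction graph, found with union-find.
    """
    weight = Counter()
    for source, target in interactions:
        if source != target:
            weight[frozenset((source, target))] += 1

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for pair, w in weight.items():
        if w >= min_weight:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    components = defaultdict(set)
    for user in parent:
        components[find(user)].add(user)
    # Sorted lists make the output deterministic.
    return sorted(sorted(c) for c in components.values())


interactions = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"),
                ("d", "c"), ("c", "d"), ("e", "f")]
print(strong_tie_groups(interactions))
```

Lowering the threshold merges groups through weak ties, which is one reason purely topological groupings can end up topically incoherent.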
Section 2 presents related work on community detection, OSN analytics and topological
measures of the quality of detected communities. Section 3 presents the proposed model,
method and metrics. Section 4 describes the technical details of the experiment, shows the
results obtained and highlights some points for discussion. Finally, Section 5 concludes the
paper.
2. Related work
This section presents related work for each of the identified subtasks. It begins with a
review of social network analysis in general, then focuses on the community detection task.
It then describes the algorithms and measures used when the problem is formalised as a
graph. Next, it reviews text analysis tools for topic detection in user-generated content.
Finally, a discussion highlights the contribution of this paper.
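Modularity (Newman and Girvan) is one widely used topological measure of the quality of a partition into communities; whether and how this paper relies on it is not stated in this section. The sketch below computes it for an undirected, unweighted graph, assuming every node belongs to exactly one community; the function name is illustrative.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the number
    of edges, L_c the number of edges inside c and d_c the total degree of c.
    Every node must belong to exactly one community.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    m = len(unique)
    if m == 0:
        return 0.0
    community_of = {node: i for i, c in enumerate(communities) for node in c}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for edge in unique:
        u, v = tuple(edge)
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(inside / m - (d / (2 * m)) ** 2 for inside, d in zip(internal, degree_sum))


# Two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(modularity(edges, [{1, 2, 3}, {4, 5, 6}]))
```

Splitting along the bridge gives Q = 5/14 ≈ 0.36, whereas a single community containing all nodes gives Q = 0. Such a score rewards dense internal links but is blind to message content, which is the gap the topical measures are meant to fill.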