“Less is more”. Mining useful features from Twitter user profiles for Twitter user classification in the public health domain

Published date	17 December 2019
DOI	https://doi.org/10.1108/OIR-05-2019-0143
Pages	213-237
Author	Ziqi Zhang,Georgica Bors
Subject Matter	Library & information science,Information behaviour & retrieval,Collection building & management,Bibliometrics,Databases,Information & knowledge management,Information & communications technology,Internet,Records management & preservation,Document management

Ziqi Zhang

Information School, University of Sheffield, Sheffield, UK, and

Georgica Bors

Department of Computer Science, University of Sheffield, Sheffield, UK

Abstract

Purpose – This work studies automated user classification on Twitter in the public health domain, a task that is essential to many public health-related studies on social media but has not been addressed. The purpose of this paper is to obtain empirical knowledge on how to optimise classifier performance on this task.

Design/methodology/approach – A sample of 3,100 Twitter users who tweeted about different health conditions was manually coded into the six most common stakeholder types. The authors propose new, simple features extracted from the short Twitter profiles of these users, and compare them with a large set of classification models (including state-of-the-art ones) that use more complex features and different algorithms on this data set.

Findings – The authors show that user classification in the public health domain is a very challenging task, as the best result the authors can obtain on this data set is only 59 per cent in terms of F1 score. Compared to state-of-the-art, the methods can obtain significantly better (10 percentage points in F1 on a “best-against-best” basis) results when using only a small set of 40 features extracted from the short Twitter user profile texts.

Originality/value – The work is the first to study the different types of users that engage in health-related communication on social media, applicable to a broad range of health conditions rather than specific ones studied in the previous work. The methods are implemented as open source tools, and together with data, are the first of this kind. The authors believe these will encourage future research to further improve this important task.

Keywords Social media, Machine learning, Twitter, Public health, Data science

Paper type Research paper

1. Introduction

In recent years, social media platforms such as discussion forums and social networks have been growing rapidly as a channel for communication of, and engagement with, public health-related matters. Among these, Twitter has become the most commonly used platform for this purpose (Thackeray et al., 2012), due to its support for real-time dissemination of information and personal opinions. Twitter is a social networking and microblogging platform where users post and interact with messages, or “tweets”. It enables its users to engage in effective and real-time information sharing and dialogic relationship building with each other (Park et al., 2016). It offers interactive features such as the ability to “follow” users to form networks, to retweet (i.e. republish and reshare), quote, like and reply to tweets, and to embed rich media including hyperlinks, multimedia, hashtags (a notion of “topic”) as well as symbols within tweets.

Due to the potential of Twitter to provide insight into public views and opinions related to health and the ability to retrieve data at little cost, it has become a valuable resource for research (Moorhead et al., 2013). Currently, research based on Twitter in the health domain can be generally divided into two types: one that studies health-related content shared on Twitter, and the other that studies users who engage with such content.

Online Information Review, Vol. 44 No. 1, 2020, pp. 213-237, ISSN 1468-4527, DOI 10.1108/OIR-05-2019-0143. Received 1 May 2019. Revised 21 September 2019. Accepted 19 November 2019.

The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/1468-4527.htm

This paper does not contain any studies with human participants performed by any of the authors (nevertheless data collection from Twitter was still subject to the authors’ institution’s internal ethical approval).

The majority of previous work falls under content analysis. This covers work that applies data mining to discover novel patterns that predict future events such as disease outbreaks (Szomszor et al., 2012), or that enhance our existing knowledge in areas such as pharmacovigilance (Ginn et al., 2014); studies that analyse the nature (e.g. content, quantity) of information sharing concerning particular health conditions on Twitter (Thackeray et al., 2012; Tsuya et al., 2014; Rosenkrantz et al., 2016); and research that aims to understand the impact of such shared content in terms of engaging audiences and growing communities (Ferguson et al., 2014; Singh and John, 2015; Brady et al., 2017; Rabarison et al., 2017).

In contrast, work on user analysis in the health domain is very limited. This typically involves user profiling based on demographic characteristics or interests. We argue that this is an equally important area, since the identification and characterisation of different user types allow us to understand dominant or emerging topics, influential users, the composition of a community and the information exchange patterns therein. Such knowledge will allow us to better connect information seekers with providers, which will be of key interest to public health stakeholders. For example, public health agencies and healthcare providers can better target their audience for the promotion of information and services; information seekers and service users can better find credible information to fulfil their informational needs. While there exists a wealth of literature on social media user profiling in general, it is limited to either non-health contexts (Tinati et al., 2012; Uddin et al., 2014), or specific health-related issues such as smokers and drug users (Kim et al., 2017; Kursuncu et al., 2018). Methods and findings from these studies are ad hoc and not directly applicable to the general public health domain.

In this work, we study the empirical task of automatically classifying Twitter users that engage in health-related information sharing, using natural language processing (NLP) and machine learning techniques. We refer to the different types of users as stakeholders, representing different interests and information needs. Our contributions are empirical and threefold. First, this is the first study on automatic user classification in the general public health domain, whereas previous work only tackled single health conditions, with classification schemes that are not applicable to other problems. We propose a generic classification scheme and release both our code and data to foster further research in this area. Second, we present a comparative analysis of the popular machine learning algorithms and features used for social media user classification on this specific task. We show that, empirically, this is a very challenging task, as many methods that are well established in other domains obtain only mediocre results. Third, we introduce a new method to capture useful features based on the short Twitter profile texts of different stakeholders. Compared to the state of the art, such features are easier to extract and are shown to be significantly more effective on this specific task: one of our models, using only 40 features, significantly outperforms (by 10 percentage points) the best-performing state-of-the-art method, which uses thousands of features extracted by complex processes (e.g. topic modelling) from tweets as well as additional corpora.
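
Since the comparisons above are stated in F1, a minimal sketch of how a multi-class F1 score can be computed may be helpful. It is an illustration only and not the evaluation code released with the paper: the macro-averaging choice, the function names `per_class_f1` and `macro_f1`, and the placeholder labels `type_a`, `type_b` and `type_c` are all assumptions, as this excerpt specifies neither the averaging scheme nor the names of the six stakeholder classes.

```python
from collections import Counter


def per_class_f1(gold, predicted):
    """Return {label: F1} for every label seen in gold or predicted."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    scores = {}
    for label in set(gold) | set(predicted):
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(gold, predicted)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels; NOT data from the study.
gold = ["type_a", "type_a", "type_b", "type_c", "type_b", "type_c"]
predicted = ["type_a", "type_b", "type_b", "type_c", "type_b", "type_a"]

print(round(macro_f1(gold, predicted), 3))  # prints 0.656
```

On this scale, a difference of 10 percentage points corresponds to a gap of 0.10 between the scores of two classifiers evaluated on the same data.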

The rest of the paper is organised as follows. Section 2 presents a brief literature review. Section 3 describes our methodology in detail. This is followed by Section 4 that presents and discusses the results. Then, Section 5 discusses the limitations of this work, and Section 6 concludes this paper with future research directions.

2. Background

We first discuss literature in the context of public health-related communication on Twitter. This includes studies of both content analysis (Section 2.1) and user analysis (Section 2.2). We then review related work from a methodological point of view, to cover automated user
