RANKuser. A folksonomy and user profile based algorithm to identify experts in Community Question Answering sites

Publication date: 02 July 2018
Authors: Abhishek Kumar Singh, Naresh Kumar Nagwani, Sudhakar Pandey
Subject: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Abhishek Kumar Singh, Naresh Kumar Nagwani and
Sudhakar Pandey
Department of Computer Science and Engineering,
National Institute of Technology, Raipur, India
Purpose: With the recent growth in the number of users and the volume of user-generated content on Community Question Answering (CQA) sites, the quality of the answers provided by users has become a major concern. Finding expert users is one way to address this problem: it aims to identify suitable users (answerers) who can provide high-quality, relevant answers. The purpose of this paper is to find expert users for newly posted questions on CQA sites.
Design/methodology/approach: In this paper, a new algorithm, RANKuser, is proposed for identifying the expert users of CQA sites. RANKuser consists of three stages. In the first stage, a folksonomy relation between users, tags, and queries is established; user profile attributes, namely reputation, tags, and badges, are also included in the folksonomy. In the second stage, users' expertise scores are calculated from their reputation, badges, and tags. In the third stage, expert users are identified by extracting the top N users according to their expertise scores.
Findings: Using the proposed ranking algorithm, expert users are identified for newly posted questions. The proposed user ranking algorithm (RANKuser) is compared with existing ranking algorithms, namely ML-KNN, rankSVM, LDA, STM, CQARank, and an EV-based model, using six performance measures: hamming loss, accuracy, average precision, one-error, F-measure, and normalized discounted cumulative gain. The proposed ranking is also compared with the original ranking of the CQA sites using a paired t-test. The experimental results demonstrate the effectiveness of RANKuser in comparison with the existing ranking algorithms.
Originality/value: This paper proposes and implements a new algorithm for expert user identification in CQA sites. The algorithm identifies experts by exploiting the folksonomy of CQA sites together with user profile information.
Keywords: Folksonomy, Online social networking, Community Question Answering, Expert identification, Ranking algorithms, Social text mining
Paper type: Research paper
1. Introduction
Community Question Answering (CQA) systems have become popular in recent years. Well-known CQA systems include Yahoo Answers (Answers.yahoo.com, 2017), StackOverflow (StackOverflow, 2017), and Quora (Quora, 2017). The main objective of such communities is to provide high-quality answers (Qu et al., 2009) and to offer a wide variety of solutions or explanations. Expert finding is a crucial problem in CQA sites (Zhou et al., 2015). A major problem of CQA sites is the low participation rate of users, for two reasons: first, most users are not willing to answer questions, and second, users are unable to find questions related to their expertise or interests (Riahi et al., 2012). The essential problem of expert finding is to choose appropriate users to answer the questions, and it has attracted considerable attention from researchers (Zhou et al., 2015; Riahi et al., 2012; Xu et al., 2012; Yang et al., 2013; Pal et al., 2012; Yang and Manandhar, 2014; Lin et al., 2017).
In CQA sites, an asker who posts a question has to wait for replies from other users, which is a time-consuming process. The replies may also be incorrect or irrelevant. The main reason is that questions on a particular subject or area are not properly linked to the relevant expert users. It is therefore necessary to identify expert users so that questions can be properly linked to experts. An essential part of the proposed work is to identify expert users based on the folksonomy and user profiles.
Data Technologies and Applications
Vol. 52 No. 3, 2018
pp. 329-350
© Emerald Publishing Limited
DOI 10.1108/DTA-10-2017-0080
Received 31 October 2017
Revised 15 January 2018
Accepted 9 February 2018
The current issue and full text archive of this journal is available on Emerald Insight.
Feature-based approaches to expert identification have been applied by several researchers. A number of existing studies have focused on the problem of user ranking, but none has used folksonomy and user profiles to identify experts for newly posted questions. The main difference between existing research and the proposed work is the use of tagging data and user profile attributes such as reputation and badges. A folksonomy is built from tags, which require little pre-processing. In this work, a folksonomy is built on CQA site attributes (Godoy et al., 2014; Min et al., 2012) and combined with user profile information for expert identification. Given the large number of users and questions on CQA sites, a folksonomy can be utilized efficiently for the expert identification task (Godoy et al., 2014; Min et al., 2012). A folksonomy is represented as a tuple $F = \langle U, P, T, Y \rangle$, where $U$ is the set of users, $P$ is the set of posts, $T$ is the set of tags, and $Y \subseteq U \times T \times P$ is a ternary relation between users, tags, and posts (questions). A question asked on a CQA site consists of a title, a body, and tags.
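The tuple $F = \langle U, P, T, Y \rangle$ maps naturally onto a small in-memory structure. The following is a minimal Python sketch, assuming questions are the posts and that $Y$ is stored as a set of (user, tag, post) triples; the class `Folksonomy` and its methods `add_post` and `tags_of` are hypothetical names for illustration, not part of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Folksonomy:
    """F = <U, P, T, Y>: users, posts (questions), tags and the relation Y."""
    users: set[str] = field(default_factory=set)
    posts: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)
    # Y is kept as (user, tag, post) triples, i.e. a subset of U x T x P.
    relation: set[tuple[str, str, str]] = field(default_factory=set)

    def add_post(self, user: str, post: str, post_tags: list[str]) -> None:
        """Record that `user` posted `post` and labelled it with `post_tags`."""
        self.users.add(user)
        self.posts.add(post)
        for tag in post_tags:
            self.tags.add(tag)
            self.relation.add((user, tag, post))

    def tags_of(self, user: str) -> dict[str, int]:
        """Count, for each tag, how many of the user's posts carry it."""
        counts: dict[str, int] = {}
        for u, t, _post in self.relation:
            if u == user:
                counts[t] = counts.get(t, 0) + 1
        return counts


# Usage with made-up example data.
f = Folksonomy()
f.add_post("alice", "q1", ["python", "pandas"])
f.add_post("alice", "q2", ["python"])
f.add_post("bob", "q3", ["java"])
print(sorted(f.tags_of("alice").items()))  # [('pandas', 1), ('python', 2)]
```

Storing $Y$ as a set of triples keeps the relation explicit; per-user tag counts, of the kind a tag-based score could use, are then a single pass over $Y$.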
The body contains the user's query and is followed by a number of answers. Tags are popular on these sites; a tag is metadata attached to a question. Users add tags to their questions so that other users can easily find, identify, and bookmark them (Rawashdeh et al., 2013; Schuster et al., 2013). In CQA sites, users earn reputation and badges depending on their contributions, and reputation also changes with activities such as likes, dislikes, accepted answers, and answers. The reputation score therefore reflects a contributor's expertise (Bosu et al., 2013). This work uses data from StackOverflow (StackOverflow, 2017), one of the best-known CQA websites. In StackOverflow, users receive badges as rewards for their actions. Users can earn gold, silver, or bronze badges; gold badges are hard to earn, whereas bronze badges are easy to earn.
The proposed algorithm (RANKuser) uses folksonomy and user profile attributes to calculate user scores in order to identify expert users. Each user's score is calculated from three parts, namely a reputation score, a tag-count score, and a badge-count score. The reputation score is calculated from the reputation the user has earned, the tag-count score from the tags the user has used, and the badge-count score from the badges the user has earned. The performance of RANKuser is compared with ML-KNN, rankSVM (Zhang and Zhou, 2007), Latent Dirichlet Allocation (LDA), STM (Riahi et al., 2012), CQARank (Yang et al., 2013), and an EV-based model (Pal et al., 2012). Results are compared using six performance metrics, namely hamming loss, accuracy, average precision, one-error, F-measure, and normalized discounted cumulative gain (nDCG). The proposed method is also compared with the original ranking of the CQA sites using a paired t-test. Hamming loss is the fraction of labels that are predicted incorrectly out of the total number of labels. Accuracy here is subset accuracy, i.e. the fraction of samples whose predicted label set exactly matches the corresponding set of true labels. Average precision is computed from the prediction scores and summarizes the precision-recall curve in a single value that approximates the area under it.
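The first three metrics can be illustrated with small hand-written Python functions. This is a sketch for exposition, not the evaluation code used in the paper; it assumes labels are encoded as 0/1 vectors (for example, one position per candidate user) and, for average precision, that scores contain no ties. scikit-learn's `hamming_loss`, `accuracy_score`, and `average_precision_score` compute the same quantities.

```python
def hamming_loss(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Fraction of (sample, label) positions that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for tr, pr in zip(y_true, y_pred)
                for t, p in zip(tr, pr))
    return wrong / total


def subset_accuracy(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Fraction of samples whose predicted label vector matches exactly."""
    return sum(tr == pr for tr, pr in zip(y_true, y_pred)) / len(y_true)


def average_precision(relevant: list[int], scores: list[float]) -> float:
    """AP of one ranking: mean of precision@k over ranks k holding a relevant item.

    Assumes no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for k, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Made-up example: two questions, four candidate labels each.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0]]
print(hamming_loss(y_true, y_pred))     # 0.125
print(subset_accuracy(y_true, y_pred))  # 0.5
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

In the last call the relevant items sit at ranks 1 and 3, so AP is (1/1 + 2/3) / 2 = 5/6.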
One-error evaluates how often the top-ranked label is not in the set of relevant labels (Sadiq and Helen, 2016). F-measure, the harmonic mean of precision and recall, measures the accuracy of the model (Pal et al., 2012). nDCG measures the usefulness of a ranked list of users, discounting users placed lower in the list and normalizing by the ideal ranking (Scikit-learn.org, 2017). The paired t-test compares the means of two populations when each observation in one sample can be paired with an observation in the other sample (Miyamoto et al., 2017).
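The remaining metrics and the paired t statistic can be sketched the same way with the Python standard library. Again this is illustrative, not the paper's code: the function names are hypothetical, `ndcg_at_k` uses the linear gain $g_i / \log_2(i+1)$ (the form scikit-learn's `ndcg_score` uses), and `paired_t` returns only the statistic and degrees of freedom; a p-value needs the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def one_error(relevant: list[set[int]], scores: list[list[float]]) -> float:
    """Fraction of samples whose single top-scored label is not relevant."""
    misses = 0
    for rel, row in zip(relevant, scores):
        top = max(range(len(row)), key=row.__getitem__)
        if top not in rel:
            misses += 1
    return misses / len(relevant)


def f_measure(true_set: set, pred_set: set) -> float:
    """F1: harmonic mean of precision and recall of a predicted set."""
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_set), tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(gains: list[float], k: int) -> float:
    """nDCG@k for gains listed in the system's ranked order (log2 discount)."""
    def dcg(values: list[float]) -> float:
        # Rank 1 (index 0) has discount log2(2) = 1.
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples.

    Raises ZeroDivisionError if all differences are identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up example values.
print(one_error([{0}, {2}], [[0.9, 0.1, 0.0], [0.5, 0.2, 0.3]]))  # 0.5
print(ndcg_at_k([3, 2, 3, 0, 1, 2], 6))
print(paired_t([2, 4, 6], [1, 2, 3]))
```

For the gain list `[3, 2, 3, 0, 1, 2]`, DCG is about 6.861 and the ideal DCG about 7.141, giving nDCG of roughly 0.961.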
