Extracting core questions in community question answering based on particle swarm optimization

Published date	03 September 2019
DOI	https://doi.org/10.1108/DTA-02-2019-0025
Date	03 September 2019
Pages	456-483
Author	Ming Li,Lisheng Chen,Yingcheng Xu
Subject Matter	Library & information science


Ming Li and Lisheng Chen

China University of Petroleum-Beijing, Beijing, China, and

Yingcheng Xu

China National Institute of Standardization, Beijing, China

Abstract

Purpose – A large number of questions are posted on community question answering (CQA) websites every day. Providing a set of core questions will ease the question overload problem. These core questions should cover the main content of the original question set. They should also have low redundancy and a distribution consistent with that of the original question set. The paper aims to discuss these issues.

Design/methodology/approach – In the paper, a method named QueExt is proposed for extracting core questions. First, questions are modeled using a biterm topic model. Then, these questions are clustered based on particle swarm optimization (PSO). With the clustering results, the number of core questions to be extracted from each cluster can be determined. Afterwards, a multi-objective PSO algorithm is proposed to extract the core questions. Both PSO algorithms are integrated with operators from genetic algorithms to avoid local optima.

Findings – Extensive experiments on real data collected from the famous CQA website Zhihu have been conducted, and the experimental results demonstrate superior performance over other benchmark methods.

Research limitations/implications – The proposed method provides new insight into and enriches research on information overload in CQA. It performs better than other methods in extracting core short text documents, and thus provides a better way to extract core data. PSO is a novel method for selecting core questions, so the research on the application of the PSO model is expanded. The study also contributes to research on PSO-based clustering. With the integration of K-means++, the number of clusters, a key parameter, is optimized.

Originality/value – A novel core question extraction method in CQA is proposed, which provides a novel and efficient way to alleviate question overload. The PSO model is extended and used in a novel way to select core questions. The PSO model is integrated with the K-means++ method to optimize the number of clusters, which is the key parameter in PSO-based text clustering. It provides a new way to cluster texts.

Keywords Knowledge management, Social media, Particle swarm optimization, Text mining, Community question answering, Core question extraction

Paper type Research paper

1. Introduction

With the rapid development of the internet, individuals and organizations acquire increasingly more knowledge online. Community question answering (CQA) websites have become important knowledge-sharing platforms (Liu and Jansen, 2017). These sites are online question-answering websites where users can post or answer questions freely (Shah and Kitzie, 2012). Yahoo! Answers[1] and Zhihu[2] are popular CQA websites on which users can seek knowledge by posting questions and share knowledge by answering questions. Millions of questions have been posted and answered on these CQA websites. Many users benefit from these CQA websites, and increasingly more users participate on them.

Data Technologies and Applications, Vol. 53 No. 4, 2019, pp. 456-483, 2514-9288, DOI 10.1108/DTA-02-2019-0025

Received 12 February 2019; Revised 9 May 2019 and 6 August 2019; Accepted 28 August 2019

The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/2514-9288.htm

The authors declare no conflict of interest. The research is supported by the National Natural Science Foundation of China (Grant No. 71571191), the Humanity and Social Science Youth Foundation of the Ministry of Education in China (Grant No. 15YJCZH081) and the National Natural Science Foundation of China (Grant No. 91646122).


As the number of community users increases, the number of posted questions grows ever larger. Question overload makes it harder for both knowledge seekers and question answerers to find questions (Li and King, 2010). Many studies have been conducted to support question finding. For example, to help answerers find relevant unanswered questions, question routing methods have been proposed. Unanswered questions are routed to the answerers who are likely to be able to answer them, according to the answerers’ expertise (Cheng et al., 2017; Li et al., 2011; Srba et al., 2015; Zhao et al., 2015). Question search refers to finding the questions that are related to a query (Cao et al., 2008). The main goal is to bridge the gap between queries and existing questions (Cai et al., 2017; Wang et al., 2018; Wu et al., 2014; Zhang et al., 2014). Question search is often used to help knowledge seekers who are overwhelmed by the huge number of questions find appropriate ones.

Although these works have made great progress in overcoming the question overload problem, there are still some problems to be resolved. These methods focus on ranking questions according to needs that are represented by evaluation functions. However, there are many similar or even duplicate posts in CQA because of its openness (Singh et al., 2018). In particular, the questions that are ranked first are often quite similar, so users’ needs can only be partially satisfied: the top results concentrate on a small part of the content, and other important parts are missed. This gives rise to the requirement of maximally meeting one’s needs with a small set of core questions. Some methods have been proposed to extract a subset of data to represent the whole data set (Ma et al., 2011; Zhang et al., 2016). In these methods, texts are modeled by the TF-IDF method, which performs well on long formal documents. However, the performance of the TF-IDF method degrades on short texts because of their sparsity (Cheng et al., 2014). Since most questions in CQA are short, these extraction methods are not suitable for extracting questions. Moreover, these methods are based on the greedy algorithm, which extracts documents one by one. As a result, better combinations of documents may not be found, and the quality of the extraction suffers.
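The limitation of one-by-one greedy extraction can be illustrated with a toy example. The sketch below is not taken from the cited methods: the objective (number of distinct topics covered), the question names `q1` to `q3` and the topic sets are all hypothetical, chosen only to show that a greedy pick can lock in a choice that rules out the best pair.

```python
from itertools import combinations

def coverage(selected, covers):
    """Number of distinct topics covered by the selected questions."""
    topics = set()
    for q in selected:
        topics |= covers[q]
    return len(topics)

def greedy_select(covers, k):
    """Pick k questions one by one, each time taking the best marginal gain."""
    chosen = []
    for _ in range(k):
        remaining = [q for q in covers if q not in chosen]
        best = max(remaining, key=lambda q: coverage(chosen + [q], covers))
        chosen.append(best)
    return chosen

def exhaustive_select(covers, k):
    """Evaluate every k-subset and return the one with the highest coverage."""
    return list(max(combinations(covers, k),
                    key=lambda combo: coverage(combo, covers)))

# Hypothetical data: each question "covers" a set of topic ids.
covers = {
    "q1": {1, 2, 3, 4},   # broad question, attractive as a first pick
    "q2": {1, 2, 5},
    "q3": {3, 4, 6},
}

greedy = greedy_select(covers, 2)
best = exhaustive_select(covers, 2)
print("greedy:", greedy, coverage(greedy, covers))
print("exhaustive:", best, coverage(best, covers))
```

Greedy selection starts with `q1` because it covers the most topics on its own, after which no second question can reach all six topics; the pair `q2` and `q3` covers all of them. Exhaustive search is only feasible for tiny sets, which is why a population-based search such as PSO is a natural alternative.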

To resolve the above problems, this paper proposes a method named QueExt for extracting core questions on CQA websites. First, the questions are modeled using the biterm topic model (BTM), which is better suited to short text modeling. Then, questions with similar topics are automatically clustered. The clustering is modeled as a single-objective optimization problem, which is solved using particle swarm optimization (PSO). With the clustering results, the number of core questions that need to be extracted from each cluster is determined. Afterwards, the core questions are extracted from each cluster according to the cluster size. The extraction is modeled, in a novel way, as a multi-objective optimization problem, which is also solved using PSO. To avoid local optima, both PSO algorithms are integrated with operators from genetic algorithms. Finally, the experiments show the better performance of the proposed method.
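To make the clustering step more concrete, the following is a minimal sketch of PSO-based clustering, written under several assumptions that are not taken from the paper: questions are represented by toy 2-D vectors rather than BTM topic distributions, the fitness is the sum of squared distances to the nearest centroid, and a simple Gaussian mutation stands in for the genetic operators. The function name `pso_cluster` and all parameter values are illustrative.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(points, centroids):
    """Sum of squared distances to the nearest centroid (lower is better)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def pso_cluster(points, k, n_particles=12, iters=60, w=0.7, c1=1.5, c2=1.5,
                mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])

    def unflat(pos):
        return [pos[i * dim:(i + 1) * dim] for i in range(k)]

    # Each particle encodes k centroids as one flat list, seeded from data points.
    swarm = [[x for p in rng.sample(points, k) for x in p]
             for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [pos[:] for pos in swarm]
    pbest_fit = [fitness(points, unflat(pos)) for pos in swarm]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + personal pull + global pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - swarm[i][d])
                             + c2 * r2 * (gbest[d] - swarm[i][d]))
                swarm[i][d] += vel[i][d]
                # Assumed GA-style mutation: occasionally jitter a coordinate.
                if rng.random() < mutation_rate:
                    swarm[i][d] += rng.gauss(0.0, 0.1)
            f = fitness(points, unflat(swarm[i]))
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = swarm[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = swarm[i][:], f

    centroids = unflat(gbest)
    return assign(points, centroids), centroids, gbest_fit

# Toy "topic vectors": two well-separated groups of three questions each.
points = [(0.10, 0.90), (0.12, 0.88), (0.08, 0.92),
          (0.90, 0.10), (0.88, 0.12), (0.92, 0.08)]
labels, centroids, fit = pso_cluster(points, k=2)
print(labels, round(fit, 4))
```

The cluster sizes obtained this way could then drive how many core questions are taken from each cluster, for example in proportion to size; the exact allocation rule is not stated in this part of the text.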

In the following section, studies on CQA, PSO and multi-objective optimization are presented. Section 3 gives the clustering method and introduces the core question extraction method. In Section 4, the experiments are described in detail. We conclude in Section 5 with future work.

2. Related works

2.1 Community question answering

CQA systems are a typical Web 2.0 knowledge-sharing application (Srba and Bielikova, 2015). This type of application provides a community service where users post and answer questions. Knowledge seekers need their questions to be answered so that they can obtain the knowledge they seek from the corresponding answers. Experts or professional users need suitable unanswered questions through which to share their knowledge. As the number of posted questions increases rapidly over time, the massive number of questions leads to the problem of information overload. It is therefore necessary to find the required questions efficiently to improve the effectiveness of knowledge sharing.

