Mining the determinants of review helpfulness: a novel approach using intelligent feature engineering and explainable AI

DOI: https://doi.org/10.1108/DTA-12-2021-0359
Published date: 17 March 2023
Pages: 108-130
Subject Matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Authors: Jiho Kim, Hanjun Lee, Hongchul Lee
Jiho Kim
School of Industrial Management Engineering, Korea University, Seoul,
Republic of Korea
Hanjun Lee
Department of Management Information Systems, Myongji University, Seoul,
Republic of Korea, and
Hongchul Lee
School of Industrial Management Engineering, Korea University, Seoul,
Republic of Korea
Abstract
Purpose: This paper aims to find determinants that can predict the helpfulness of online customer reviews
(OCRs) with a novel approach.
Design/methodology/approach: The approach consists of feature engineering using various text
mining techniques, including BERT, and machine learning models that can classify OCRs according to
their potential helpfulness. Moreover, explainable artificial intelligence methodologies are used to identify
the determinants of helpfulness.
Findings: The important result is that the boosting-based ensemble model showed the highest prediction
performance. In addition, it was confirmed that the sentiment features of OCRs and the reputation of
reviewers are important determinants that augment review helpfulness.
Research limitations/implications: Each online community has different purposes, fields and
characteristics. Thus, the results of this study cannot be generalized. However, it is expected that this
novel approach can be integrated with any platform where online reviews are used.
Originality/value: This paper incorporates feature engineering methodologies for online reviews,
including the latest methodology. It also includes novel techniques to contribute to ongoing research on
mining the determinants of review helpfulness.
Keywords: Online customer reviews, Review helpfulness, Information extraction, Text mining, BERT,
Explainable artificial intelligence
Paper type: Research paper
1. Introduction
To reduce uncertainty before purchasing products and services, potential customers tend to
rely on the experience of those who have already purchased them (Khorsand et al., 2020;
Ye et al., 2011). A recent BrightLocal survey found that 89 per cent of potential consumers
read online customer reviews (OCRs) for decision-making and 91 per cent trust them as much as
personal recommendations (Murphy, 2019). In light of this, it is clear that information
sharing on the web can have a significant influence on future purchasing decisions.
However, as more and more user communities have begun to post reviews and opinions
on the web and as the feature of posting reviews has expanded into an ever-growing range
of business service areas, the problem of excessive information has escalated significantly.
This research was supported by Brain Korea 21 FOUR.
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2514-9288.htm
Received 2 December 2021
Revised 6 April 2022
Accepted 30 May 2022
Data Technologies and Applications
Vol. 57 No. 1, 2023
pp. 108-130
© Emerald Publishing Limited
2514-9288
DOI 10.1108/DTA-12-2021-0359
Many researchers have predicted the helpfulness of reviews generated online by users
and analyzed various features of the reviews. However, in many of these cases, only a few
features could be extracted from reviews made of unstructured data, or only a limited
number of features were applied. In addition, several studies have identified features that
augment the helpfulness of reviews mainly by applying simple regression models. This is
because models with high predictive power, such as machine learning models, are difficult
to interpret.
Ultimately, these studies failed to propose a comprehensive and convincing model for
predicting the potential helpfulness of a review and finding the determinants that make the
review helpful. To address these gaps, this research started with the
following research questions (RQs):
RQ1. What are some of the methods for extracting features from online reviews
consisting of unstructured data and text, and which features make a review
helpful?
RQ2. Does segmenting the sentiment score with a pre-trained deep learning model
have a significant impact on making reviews helpful?
RQ3. How can the potential helpfulness of a review written in real time be quantified?
RQ4. Can helpfulness determinants be identified in complex machine learning models,
which are known as black boxes?
To answer these questions, feature engineering was carried out using previous related work
and the latest natural language processing (NLP) technology. In particular, in the sentiment
analysis, BERT, a pre-trained model, was applied to subdivide the sentiment score. After
extracting various features from the reviews, the authors trained seven machine learning
models, compared their predictive power and checked the potential value of unevaluated
reviews and the degree of review helpfulness. Finally, explainable artificial intelligence
(XAI) was applied to models with high predictive power to explore the determinants of
helpful reviews.
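The idea of explaining a black-box helpfulness classifier can be illustrated with a small sketch. This section does not name the XAI technique the authors use, so the example below substitutes permutation importance, a simple model-agnostic method, purely for illustration. The feature names, the synthetic data, the labelling rule and the stand-in classifier are all hypothetical assumptions and are not taken from the paper.

```python
import random

# Hypothetical feature names, chosen only for illustration.
FEATURES = ["sentiment_score", "reviewer_reputation", "noise"]

def make_reviews(n, seed=0):
    """Generate synthetic reviews; the labelling rule is an assumption."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.random(), rng.random(), rng.random()]
        # Assumed ground truth: helpfulness depends on the first two features only.
        labels.append(1 if 0.6 * row[0] + 0.4 * row[1] > 0.5 else 0)
        rows.append(row)
    return rows, labels

def black_box_predict(row):
    # Stand-in for any trained classifier. It mirrors the labelling rule,
    # so its baseline accuracy on the synthetic data is 1.0.
    return 1 if 0.6 * row[0] + 0.4 * row[1] > 0.5 else 0

def accuracy(predict, rows, labels):
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=5, seed=1):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for j, name in enumerate(FEATURES):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild every row with feature j replaced by a shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances[name] = sum(drops) / n_repeats
    return importances

rows, labels = make_reviews(500)
importances = permutation_importance(black_box_predict, rows, labels)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Only the model's predictions are used, so the same procedure would apply to any classifier that exposes a prediction function. A feature the model ignores, such as the noise column here, receives an importance of zero, while features the model relies on show a positive drop in accuracy when shuffled.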
The paper is organized as follows. Section 2 presents related works. Because the
present study contributes to feature engineering for review helpfulness and to
the application of complex machine learning, and attempts to propose a novel method for
identifying determinant features based on XAI, that section reviews previous work in these
areas and provides a summary table. Section 3 describes the research context and
data in detail and presents the proposed methodologies. It illustrates the
core concepts of the methodologies and some results of feature engineering, and it
explains the interpretable machine learning methodologies with theory and formulas.
Section 4 reports the results of the helpfulness prediction models on the final dataset, including
the extracted features. Furthermore, it covers the determinant features through
interpretation using XAI. Section 5 summarizes the paper and its key
findings, implications and limitations. Finally, the conclusion and future research
are addressed in Section 6.
2. Related works
2.1 Online community platform and review helpfulness
In the Web 2.0 era, the scope and role of online community platforms are expanding.
E-commerce companies attract consumers to their online platform services by
