Modelling online user behavior for medical knowledge learning

Publication Date: 14 May 2018
Author: Daifeng Li, Andrew Madden, Chaochun Liu, Ying Ding, Liwei Qian, Enguo Zhou
Subject: Information & knowledge management, Information systems, Data management systems, Knowledge management, Knowledge sharing, Management science & operations, Supply chain management, Supply chain information systems, Logistics, Quality management/systems
Daifeng Li
School of Information Management,
Sun Yat-Sen University, Guangzhou, China
Andrew Madden
Sun Yat-Sen University, Guangzhou, China
Chaochun Liu
Baidu Research Big Data Lab, Sunnyvale, California, USA
Ying Ding
Department of Information and Library Science,
Indiana University Bloomington, Indiana, USA
Liwei Qian
Baidu Inc, Beijing, China, and
Enguo Zhou
School of Information Management, Sun Yat-sen University, Guangzhou, China
Purpose: Internet technology allows millions of people to find high quality medical resources online, with
the result that personal healthcare and medical services have become one of the fastest growing markets in
China. Data relating to healthcare search behavior may provide insights that could lead to better provision of
healthcare services. However, discrepancies often arise between terminologies derived from professional
medical domain knowledge and the more colloquial terms that users adopt when searching for information
about ailments. This can make it difficult to match healthcare queries with doctors' keywords in online
medical searches. The paper aims to discuss these issues.
Design/methodology/approach: To help address this problem, the authors propose transfer learning
using a latent factor graph (TLLFG), which can learn the descriptions of ailments used in internet searches and
match them to the most appropriate formal medical keywords.
Findings: Experiments show that the TLLFG outperforms competing algorithms in incorporating both
medical domain knowledge and patient-doctor Q&A data from online services into a unified latent layer capable
of bridging the gap between lay enquiries and professionally expressed information sources, and in making more
accurate analyses of online users' symptom descriptions. The authors conclude with a brief discussion of some
of the ways in which the model may support online applications and connect offline medical services.
Practical implications: The authors used an online medical searching application to verify the proposed
model. The model can bridge users' long-tailed descriptions with doctors' formal medical keywords. Online
experiments show that TLLFG can significantly improve the searching experience of both users and medical
service providers compared with traditional machine learning methods. The research provides a helpful
example of the use of domain knowledge to optimize searching or recommendation experiences.
Originality/value: The authors use transfer learning to map online users' long-tail queries onto medical
domain knowledge, significantly improving the relevance of queries and keywords in a search system reliant
on sponsored links.
Keywords: Text mining, Machine learning, Factor graph, Latent variable, Medical knowledge,
Transfer learning
Paper type: Research paper
Industrial Management & Data Systems
Vol. 118 No. 4, 2018
pp. 889-911
© Emerald Publishing Limited
DOI 10.1108/IMDS-07-2017-0309
Received 17 July 2017
Revised 8 October 2017
Accepted 15 October 2017
The current issue and full text archive of this journal is available on Emerald Insight.
1. Introduction
According to a 2012 survey by the Pew Research Center, 77 percent of people searching online
for health information start their search by using a search engine (Fox and Duggan, 2013).
In China, people are spending more and more time searching online for information about
healthcare resources. At a conference in 2016, Robin Li, the CEO of Baidu (the most widely
used search engine in China), indicated that users carry out over 60 million searches for
health-related information every day. Up to 40 percent of these searches are
related to disease symptoms, causes and treatments (Li, 2016). Searchers using generic search
sites will often be referred to medical websites. One such site that is popular in
China is the professional medical website xywy, which, since 2005, has
built up a collection of 300 million Q&A exchanges between online users and professional
doctors (Shiji Wenkang (Beijing) Technology Development Co., Ltd., 2017).
Appropriate analysis of such searches can produce insights into the online behavior of
people with medical concerns, and may help to improve healthcare services by facilitating
the development of tools that match patients' descriptions of their symptoms to standard
medical terminologies.
The analysis of terms used online for describing ailments is especially important in
search engines such as Baidu and Google, which rely on advertising income. They provide
platforms that attempt to match online user needs to the most suitable business services.
This matching relies, in part, on a set of keywords that describe the services offered.
However, users' online queries follow a long-tail pattern (Anderson, 2004).
Statistically, long-tailed distributions are a subtype of heavy-tailed distributions
(Asmussen, 2003; Verzani, 2004). In such cases, most profits are derived from a handful of
frequently used search terms, while the vast majority of searches are neglected.
Consequently, the development of resources and strategies that can sell products within
the long tail of searches is of considerable interest to marketers. Anderson (2004) suggests
that the share of the market within the long tail may exceed that made up of the relatively
few bestsellers.
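The claim that the long tail can outweigh the bestsellers can be illustrated with a minimal sketch. It assumes, purely for illustration, that query frequencies follow a Zipf law with exponent 1, that there are 100,000 distinct queries, and that the 100 most frequent queries form the "head". None of these figures, and neither of the function names `zipf_weights` and `head_share`, comes from the paper.

```python
def zipf_weights(n_terms, exponent=1.0):
    """Unnormalised Zipf weights: the term at rank r gets weight 1 / r**exponent.

    The Zipf form and the exponent are illustrative assumptions.
    """
    return [1.0 / (rank ** exponent) for rank in range(1, n_terms + 1)]


def head_share(weights, head_size):
    """Fraction of total search volume captured by the head_size most frequent terms."""
    total = sum(weights)
    return sum(weights[:head_size]) / total


# Hypothetical vocabulary of 100,000 distinct queries; the top 100 form the "head".
weights = zipf_weights(100_000)
head = head_share(weights, 100)
tail = 1.0 - head

print(f"head share: {head:.2f}, tail share: {tail:.2f}")
```

Under these assumed parameters the tail accounts for more than half of the total volume, even though each individual tail query is rare. A steeper exponent or a smaller vocabulary would shift the balance back toward the head.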
The long-tail phenomenon is especially significant in the medical domain, because patients
often use different terms to describe the same symptoms to doctors. According to a survey of
online searches, more than 70 percent of queries are long-tail (Huo, 2015).
Consequently, many queries a day (20 percent) are new to the search engine. This figure
supports a claim by Hill Web Marketing (2017) that 15 percent of the healthcare-related
queries submitted to Google each day have not previously been recorded by the search engine.
Search engines use long-tail theory to price sponsored adverts, charging a lower rate for
search terms within the long tail. Because such terms are so numerous, however, keywords in
the long tail can still generate significant revenue.
Although searches for medical information fit the long-tail pattern, the targets
of such searches are limited. Descriptions of medical disorders are far more varied than the
disorders themselves. Searchers usually want to find reliable information about diseases,
medical departments or treatment methods. This information is often linked to more
valuable keywords, for which competing medical institutes bid. Our aim is to
analyze the ways in which users express their queries, and to map those queries onto
established medical terminology. The resulting strategy has the potential to help patients to
find high quality medical resources appropriate to their symptom descriptions; to help
medical institutes attract patients whose symptoms fit within their domain of expertise; and
to help search engines obtain a higher click rate from those valuable keywords.
The practice of making advertising revenue dependent on successful matching of search
terms to advertised services increases the difficulty of identifying suitable matches because
bigger medical practices and better known doctors will pay a premium for the keywords
that attract the most traffic. Consequently, if a semantic method is used to match a
colloquially expressed query to medical keywords, users may receive misleading
recommendations, such as those in Figure 1.
Keyword matching is mainly based on the recognition of words that two sentences
have in common. For example, the query "I have a fever, what should I do?" is highly
