Characters-based sentiment identification method for short and informal Chinese text

Date	19 February 2018
Pages	57-66
DOI	https://doi.org/10.1108/IDD-05-2017-0047
Published date	19 February 2018
Author	Qiujun Lan, Haojie Ma, Gang Li
Subject Matter	Library & information science, Library & information services, Lending, Document delivery, Collection building & management, Stock revision, Consortia

Characters-based sentiment identification method for short and informal Chinese text

Qiujun Lan and Haojie Ma

Business School, Hunan University, Changsha, China, and

Gang Li

School of Information Technology, Deakin University, Melbourne, Australia

Abstract

Purpose – Sentiment identification of Chinese text faces many challenges, such as requiring complex preprocessing steps, carefully preparing various word dictionaries and dealing with many informal expressions, all of which lead to high computational complexity.

Design/methodology/approach – A method based on Chinese characters instead of words is proposed. This method represents the text as a fixed-length vector and introduces the chi-square statistic to measure the categorical sentiment score of a Chinese character. Based on these, sentiment identification can be accomplished in four main steps.

Findings – Experiments on corpora with various themes indicate that the proposed method performs slightly worse than existing Chinese words-based methods on most texts, but better on short and informal texts. In particular, the computational complexity of the proposed method is far lower than that of words-based methods.

Originality/value – The proposed method exploits the property that a Chinese character is a linguistic unit carrying semantic information. In contrast to word-based methods, the computational efficiency of this method is significantly improved at a slight loss of accuracy. It is more concise and avoids the problems that result from preparing predefined dictionaries and from various data preprocessing steps.

Keywords Information technology, Text mining, Data mining, Chinese character, Sentiment identification, Short text

Paper type Research paper
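The abstract says that a chi-square statistic is used to score the sentiment of individual Chinese characters, but this excerpt does not give the formula or the four steps. The following is only a minimal sketch of the standard chi-square statistic for a 2x2 contingency table (character present/absent versus positive/negative document). The function name `chi_square_scores`, the use of document frequency and the signing of the score by the direction of association are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive="pos"):
    """Return {character: signed chi-square score} for labeled documents.

    Assumption: each document counts a character at most once
    (document frequency). The sign is an illustrative convention:
    positive if the character leans toward the positive class.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    pos_df, neg_df = Counter(), Counter()
    for text, y in zip(docs, labels):
        target = pos_df if y == positive else neg_df
        target.update(set(text))  # count each character once per document

    scores = {}
    for ch in set(pos_df) | set(neg_df):
        a = pos_df[ch]   # positive documents containing ch
        b = neg_df[ch]   # negative documents containing ch
        c = n_pos - a    # positive documents without ch
        d = n_neg - b    # negative documents without ch
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        scores[ch] = chi2 if a * d >= b * c else -chi2
    return scores


# Toy, made-up corpus used only to exercise the function.
toy_docs = ["好好", "很好", "差", "很差"]
toy_labels = ["pos", "pos", "neg", "neg"]
toy_scores = chi_square_scores(toy_docs, toy_labels)
```

In this toy corpus a character that appears only in positive documents receives a large positive score, one that appears only in negative documents receives a large negative score, and one spread evenly across both classes scores zero.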

1. Introduction

With the rapid development of the internet and the advent of Web 2.0, text sentiment identification has become a hot research area with a range of applications in market intelligence, recommendation systems and the analysis of public sentiment (Xia et al., 2010; Zhong and Deng, 2012; Zhang et al., 2010; Tang et al., 2007; Manek et al., 2017; David et al., 2016). As a burgeoning technology, text sentiment identification can automatically analyze documents drawn from huge and expanding volumes of text, providing convenience for commodity evaluation, public opinion control and investor sentiment research.

Chinese, one of the official languages of the United Nations, is widely used and has a long history. According to a report released by the UN Broadband Commission, by 2015 the number of Chinese-language internet users had exceeded the number of English-language users. English is a phonetic, alphabetic language, while Chinese is ideographic and written in graphic characters. The steps of text sentiment identification therefore differ considerably between Chinese and English, and the biggest difference lies in segmentation. English segmentation can be divided into three parts (text splitting, stop word removal and stemming). Because English sentences are made up of words, they are easy to divide using spaces alone. Chinese, which consists of characters, needs a more complex splitting method. Furthermore, Chinese text segmentation often involves ambiguities, which have become a big challenge. Scholars from various regions/countries such as Taiwan, Singapore, Hong Kong and Japan, as well as from Mainland China, are interested in Chinese information processing technologies and the related text sentiment identification technology (Chou et al., 2015; Zagibalov and Carroll, 2008; Hu and Chen, 2016).
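The contrast between space-delimited English and character-based Chinese can be illustrated with a small sketch. The function names `split_english` and `split_chinese_chars` are hypothetical, and restricting Chinese characters to the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) is a simplifying assumption. The sketch shows only character-level splitting, not word segmentation, which would need a dictionary or a trained segmenter.

```python
def split_english(sentence):
    """Split an English sentence into words on whitespace."""
    return sentence.split()


def split_chinese_chars(sentence):
    """Split Chinese text into single characters.

    Assumption: only code points in the basic CJK Unified Ideographs
    block are kept; punctuation, digits and Latin letters are dropped.
    """
    return [ch for ch in sentence if "\u4e00" <= ch <= "\u9fff"]


english_tokens = split_english("the service is good")
chinese_chars = split_chinese_chars("服务很好!")
```

Character splitting needs no dictionary and has no segmentation ambiguity, which is the property a characters-based method relies on.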

Nowadays, e-commerce is developing rapidly and more comment text is being generated. Customers like to share their opinions on BBS, microblogs, etc. In the financial field, some researchers construct investor sentiment indexes from the reviews of financial forum users (Yi et al., 2016). As a part of behavioral finance, such indexes can be applied to high-frequency trading. Sun, Najand and Shen explore the predictive relation between high-frequency investor sentiment and stock market returns (Sun et al., 2016). They found substantial evidence that intraday S&P 500 index returns are predictable using lagged half-hour investor sentiment. However, the reviews are short, informal and filled with buzzwords, slang, typos, etc. After preprocessing, they have sparse features and scant information. The traditional sentiment classification methods do not perform well in these

The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/2398-6247.htm

Information Discovery and Delivery 46/1 (2018) 57–66

[DOI 10.1108/IDD-05-2017-0047]

The research was sponsored by the Natural Science Foundation of China (Grant No. 71171076) and the key project of the National Natural Science Fund of China (Grant No. 71431008).

Received 2 May 2017

Revised 4 December 2017

Accepted 5 December 2017
