A novel self‐organising clustering model for time‐event documents

Pages 260-272
DOI https://doi.org/10.1108/02640470810864145
Published date 11 April 2008
Author Chihli Hung, Stefan Wermter
Subject Matter Information & knowledge management, Library & information science
A novel self-organising clustering
model for time-event documents
Chihli Hung
Department of Management Information Systems,
Chung Yuan Christian University, Taiwan, and
Stefan Wermter
School of Computing and Technology, University of Sunderland, UK
Abstract
Purpose – The purpose of this paper is to examine neural document clustering techniques, e.g.
self-organising map (SOM) or growing neural gas (GNG), which usually assume that the quantity of
textual information is stationary.
Design/methodology/approach – The authors propose a novel dynamic adaptive self-organising
hybrid (DASH) model, which adapts not only its neural topological structure but also its main
parameters to time-event news collections in a non-stationary environment. Based on the features of a
time-event news collection in such an environment, they review the main current neural
clustering models. The main deficiency of these models is the need to pre-define the thresholds for
unit-growing and unit-pruning. The DASH model is therefore designed for a
non-stationary environment.
Findings – The paper compares DASH with SOM and GNG based on an artificial jumping corner
data set and a real world Reuters news collection. According to the experimental results, the DASH
model is more effective than SOM and GNG for time-event document clustering.
Practical implications – A real world environment is dynamic. This paper provides an approach to
present news clustering in a non-stationary environment.
Originality/value – Text clustering in a non-stationary environment is a novel concept. The paper
demonstrates DASH, which can deal with a real world data set in a non-stationary environment.
Keywords Cluster analysis, Knowledge engineering, Knowledge management, Information retrieval,
Neural nets
Paper type Research paper
Introduction
In the era of the internet, a vast amount of textual information can overwhelm users. By
grouping documents with similar concepts, an organised structure can quickly reduce the
search space and help users to access relevant documents (Van Rijsbergen, 1979).
Many document clustering approaches, including statistical solutions and artificial
neural networks, have been proposed for these tasks (e.g. Chang and Chen, 2006; Chen
and Chen, 2006; Hung et al., 2004; Pullwitt, 2002; Jain et al., 1999). Particularly in the
field of artificial neural networks, self-organising maps (SOMs) have been proposed for
document clustering (Kohonen, 1984). Documents containing a similar concept are
grouped into the same unit on a map and units representing a similar concept are
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/0264-0473.htm
This work was partially supported by the National Science Council of Taiwan No. NSC
93-2416-H-237-002.
Received 15 December 2006
Revised 21 February 2007
Accepted 16 April 2007
The Electronic Library
Vol. 26 No. 2, 2008
pp. 260-272
© Emerald Group Publishing Limited
0264-0473
DOI 10.1108/02640470810864145
