Exploring topics related to data mining on Wikipedia

Date	07 August 2017
Pages	667-688
DOI	https://doi.org/10.1108/EL-09-2016-0188
Published date	07 August 2017
Author	Yanyan Wang, Jin Zhang
Subject Matter	Information & knowledge management, Information & communications technology, Internet

Yanyan Wang and Jin Zhang

School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA

Abstract

Purpose – Data mining has been a popular research area in the past decades. Many researchers study data-mining theories, methods, applications and trends; however, there are very few studies on data-mining-related topics in social media. This paper aims to explore the topics related to data mining based on the data collected from Wikipedia.

Design/methodology/approach – In total, 402 data-mining-related articles were obtained from Wikipedia. These articles were manually classified into several categories by the coding method. Each category formed an article-term matrix. These matrices were analysed and visualized by the self-organizing map approach. Several clusters were observed in each category. Finally, the topics of these clusters were extracted by content analysis.

Findings – The articles obtained were classified into six categories: applications; foundation and concepts; methodologies; organizations; related fields and topics; and technology support. Business, biology and security were the three prominent topics of the applications category. The technologies supporting data mining were software, systems, databases, programming languages and so forth. The general public was more interested in data-mining organizations than researchers were. The public also focused on the applications of data mining in business more than in other fields.

Originality/value – This study will help researchers gain insight into the general public’s perceptions of data mining and discover the gap between the general public and themselves. It will assist researchers in finding new techniques and methods which will potentially provide them with new data-mining methods and research topics.

Keywords Social media, Data mining, Social Web mining, Theme discovery

Paper type Research paper

Introduction

With the development of internet technologies, information and data are produced, shared, and stored much faster than before. The volume of data grows every day as companies capture large amounts of data about markets, products, customers and suppliers. Individuals also receive large quantities of data from their daily life and the internet. Moreover, the evolution of mobile devices, social media and Web technologies boosts the growth of data and information. It is difficult, however, to deal with huge data sets using traditional data analysis approaches. Because of these circumstances, the concept of data mining was created.

To explore the internal relationships and patterns of data, data mining was proposed in the 1990s. Since then, data mining has been studied and used as a research method. As more and more people face the problems of data analysis and management, this concept has been widely accepted and the related techniques and methods have been frequently used by both researchers and general users. Research topics about data mining can be found in a large number of publications. In addition, there are introductions and discussions of data mining on the internet, especially on social media platforms. Unlike data-mining research studies, the content about data mining on social media platforms has its own features.

The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/0264-0473.htm

Received 22 September 2016
Revised 16 March 2017
Accepted 9 April 2017

The Electronic Library, Vol. 35 No. 4, 2017, pp. 667-688
© Emerald Publishing Limited, 0264-0473
DOI 10.1108/EL-09-2016-0188

Because the use of data-mining theories, methods, and technologies continues to increase, it is necessary to gain insight into data mining and related topics. Previous research papers have studied various aspects of data mining, but few have explored data-mining-related topics based on data collected from social media. Because Wikipedia is the largest online knowledge collaboration, this study aims to fill the gap by exploring the data-mining-related topics on Wikipedia. The self-organizing map (SOM) approach, a machine learning approach, was applied to this data analysis.
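
To make the pairing of an article-term matrix with a SOM more concrete, the following Python sketch counts terms per article and then trains a very small SOM on the resulting rows. It is an illustration only, not the authors' implementation: the toy articles, the 2 x 2 grid, the learning rate, the number of epochs and every function name are assumptions chosen for brevity.

```python
import math
import random
from collections import Counter


def article_term_matrix(articles):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in article i."""
    tokenized = [text.lower().split() for text in articles]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([float(counts[term]) for term in vocabulary])
    return vocabulary, matrix


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vector):
    """Return the grid position (row, col) of the node closest to vector."""
    return min(weights, key=lambda pos: squared_distance(weights[pos], vector))


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a small rectangular SOM (all hyperparameters are illustrative)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    initial_radius = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        decay = 1.0 - epoch / epochs
        rate = 0.5 * decay
        radius = max(initial_radius * decay, 0.5)
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for pos, node in weights.items():
                grid_dist_sq = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
                # Pull the node towards the input, scaled by grid proximity.
                for i in range(dim):
                    node[i] += rate * influence * (vector[i] - node[i])
    return weights


if __name__ == "__main__":
    # Hypothetical mini-corpus standing in for Wikipedia article text.
    articles = [
        "data mining software database",
        "database software systems",
        "business customer data mining",
        "customer business market",
    ]
    vocabulary, matrix = article_term_matrix(articles)
    som = train_som(matrix)
    for text, row in zip(articles, matrix):
        print(best_matching_unit(som, row), text)
```

Articles that land on the same or neighbouring grid positions have similar term profiles, which is the property that allows clusters to be read off a SOM display. A real analysis would normally add term weighting and stop-word removal before training.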

Literature review

Data mining

Data mining is a method to reveal previously unknown and reliable insights from large data sets (Elkan, 2001). Because the massive volume of data from different fields keeps growing, useful analysis methods and techniques are urgently needed. Therefore, data mining has become an increasingly important research area (Liao et al., 2012).

With the development of data mining, a variety of methods and techniques from other areas have been introduced to the data-mining area, such as classification, clustering and database technology (Liao et al., 2012). Han and Kamber (2006) pointed out the disciplines that most influence and improve the data-mining method: statistics, machine learning, database systems, warehousing and information retrieval. Meanwhile, data mining has impacted other research fields, such as chemistry, medicine, business and so forth (Aljumah et al., 2013; Borghini et al., 2010; Zhang et al., 2013).

In addition to prediction, data mining has other functions. Han and Kamber (2006) summarized the different patterns that can be mined: frequent patterns, associations and correlations; classification and regression; clustering analysis; and outlier analysis. Fu (2011) gave a similar account of time series data mining, stating that its main tasks are pattern discovery and clustering, classification, rule discovery and summarization. Different data-mining methods and techniques have been proposed and applied to accomplish different tasks. For example, k-means, fuzzy c-means and SOM are frequently used in clustering analysis. Moreover, there are specific methods to mine certain types of data, like the model-based sequence clustering methods for mining temporal data (Law and Kwok, 2000).
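
As a brief illustration of one of the clustering methods named above, the following Python sketch implements k-means in its standard form (Lloyd's algorithm). It is a minimal example under stated assumptions, not code from any of the cited works: the function names, the random initialization from the input points, the iteration limit and the toy data in the usage note are all hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, labels
```

For example, calling k_means([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], k=2) separates the three points near the origin from the three points near (10, 10). Fuzzy c-means differs in that each point receives a degree of membership in every cluster rather than a single label.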

Social Web mining

Web mining, as a branch of data mining, is gradually playing increasingly important roles in research. Social Web mining is one of the primary components in studies related to Web mining and social media. Social media is the way people generate, share and communicate information in virtual communities and networks (Ahlqvist et al., 2008). Under the big umbrella of social media, social media sites and applications vary a lot. For instance, Twitter, which is regarded as a microblog, allows users to communicate and create posts of less than 140 characters (Kwak et al., 2010), while Wikipedia provides opportunities for collaborative information and knowledge production (Bruns, 2006). With the development of mobile devices, geo-mapping tools (e.g. Google Maps) and self-tracking applications (e.g. Quantified Self) have been invented.

The methods applied in social Web mining can be classified into two groups: social network analysis methods and sentiment analysis methods. Social network analysis tries to reveal human relationships and connections (Hansen et al., 2010). In recent years, various tools have been invented to analyse and visualize social networks, such as UCINET, Pajek, NetworkX in Python and igraph in R (Borgatti et al., 2002; Kolaczyk and Csárdi, 2014; de Nooy et al., 2011). Sentiment analysis is known as opinion mining, which is related to text mining (Thelwall et al., 2011). Therefore, methods for text mining are also used in sentiment
