Optimising metadata to make high‐value content more accessible to Google users

DOI: https://doi.org/10.1108/00220410610666484
Date: 01 May 2006
Pages: 307-327
Published date: 01 May 2006
Author: Alan Dawson, Val Hamilton
Subject Matter: Information & knowledge management, Library & information science
Alan Dawson and Val Hamilton
Centre for Digital Library Research, University of Strathclyde, Glasgow, UK
Abstract
Purpose – This paper aims to show how information in digital collections that have been catalogued
using high-quality metadata can be retrieved more easily by users of search engines such as Google.
Design/methodology/approach – The research and proposals described arose from an
investigation into the observed phenomenon that pages from the Glasgow Digital Library
(gdl.cdlr.strath.ac.uk) were regularly appearing near the top of Google search results shortly after
publication, without any deliberate effort to achieve this. The reasons for this phenomenon are now
well understood and are described in the second part of the paper. The first part provides context with
a review of the impact of Google and a summary of recent initiatives by commercial publishers to
make their content more visible to search engines.
Findings – The literature research provides firm evidence of a trend amongst publishers to ensure
that their online content is indexed by Google, in recognition of its popularity with internet users. The
practical research demonstrates how search engine accessibility can be compatible with use of
established collection management principles and high-quality metadata.
Originality/value – The concept of data shoogling is introduced, involving some simple techniques
for metadata optimisation. Details of its practical application are given, to illustrate how those working
in academic, cultural and public-sector organisations could make their digital collections more easily
accessible via search engines, without compromising any existing standards and practices.
Keywords Digital libraries, Search engines, Optimization techniques
Paper type Research paper
Introduction
The mission of Google Inc. is “to organise the world’s information and make it
universally accessible and useful”. The mission of the Dublin Core Metadata Initiative
is “to make it easier to find resources using the internet”. Despite the similarity of
mission statements, the remarkable success of Google owes nothing to Dublin Core or
any other metadata scheme. This paper proposes some simple and practical measures
for bridging the gulf between the Google world and the metadata world in order to
make both more effective for information retrieval. The proposals described are based
on solid evidence of their success in practice, yet are also grounded in well-established
principles of information organisation and resource description.
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/0022-0418.htm
The authors wish to thank Simon Bains at the National Library of Scotland for information
supplied, and colleagues at the Centre for Digital Library Research for their helpful comments on
earlier drafts.
Received November 2004
Revised July 2005
Accepted July 2005
Journal of Documentation
Vol. 62 No. 3, 2006
pp. 307-327
© Emerald Group Publishing Limited
0022-0418
In some quarters, metadata has been seen as the panacea for the problems of finding
information on the internet. During the late-1990s, in particular, around the time of the
development of the Resource Description Framework and the Dublin Core, many
articles were written claiming that the chaos of the internet would soon be tamed once
web site developers started using such schemes. For example, Marchiori (1998) wrote:
The only feasible way to radically improve the situation is to add to web objects a metadata
classification, that is to say partially passing the task of classifying the content of web objects
from search engines and repositories to the users who are building and maintaining such
objects.
However, several years on, people do find information on the web, but not because of
well-structured, semantically useful metadata. As Arms and Arms (2004) have
observed:
With the benefit of hindsight, we now see that the web search engines have developed new
techniques and have adapted to a huge scale, while cross-domain metadata schemes have
made less progress.
Given the great investment of time and effort expended on discussion and development
of metadata schemes in recent years, it is not easy for the metadata community to
acknowledge the enormous impact of an information retrieval tool that does not rely on
metadata. There are many reasons for the success and public acceptance of the market
leader Google, but metadata is not one of them.
Google is great
The achievements of Google are often either taken for granted or not given due
acknowledgement in academic circles, so it is worth summarising them here. Google is
extremely fast and reliable, it works on a massive scale, it produces useful results much
of the time, it searches the full text of documents, it indexes multiple document types,
including HTML, PDF and Word documents, it generates contextual summaries that
are often useful, it is constantly updated, it has many advanced features for those who
can be bothered to use them, it is very simple to use, and it is entirely free to users
worldwide. It is an immensely valuable and widely used service that benefits from
regular extension and innovation. The development of effective search engines such as
Google has expanded easy access to information. Specialist training is no longer
required to achieve results of some sort, and many users put a premium on speed over
comprehensiveness and maybe even quality. As Wallis (2003) says:
For sufferers of “information anxiety” the simplicity of the Google search interface must
act as a calming tonic. It is not demanding of the information seeker in the formulation
of search terms and almost always produces vast numbers of hits. It even helps out with
your spelling.
Google’s current pre-eminence as a search engine is well documented. The verb
“to google” has entered the English language, and in an extensive survey
(Brandchannel, 2004), Google was rated the world’s number one brand name, above
Apple, Mini, Coca-Cola, Samsung, Ikea and Nokia. When Google was unavailable for a
few hours on 26 July 2004, a spate of newspaper articles ensued, written by shocked
journalists. Dowling (2004), in The Guardian newspaper, described “A worrying
glimpse over the lip of a murky abyss” while Mangold (2004) had his “first near-death
experience”.